Customer satisfaction with your voice response service will depend on a
number of factors, including the voice you choose, the information you provide,
and the ease with which callers can get the information they want. The sound
and feel of the service is every bit as important as the look and feel of
a visual computer application. With careful design, the caller’s experience
will be pleasurable; without attention to dialog design, the caller will
get frustrated and hang up. At best, this means your staff will handle
just as many calls as they ever did; at worst, it may mean lost business.
You’re already considering a voice response service, so you’re
aware of the strengths of voice response: telephones are ubiquitous, familiar,
and easy to use, and an automated system can provide more speed, privacy, efficiency,
availability, and accuracy than some human operators, and at lower cost.
Nevertheless, when designing the dialog, you need to be aware of some of
the limitations of the medium. Compared with people, automated systems are
inflexible, demanding, and intolerant of deviations from the expected conversation.
They are best at handling routine calls. Almost without exception, every
voice response service should offer help in some way: for example, by offering
callers the choice of transferring to an operator (if the switch provides
call transfer) or of calling another number to speak to an operator. If the
switch provides call transfer, the application can also transfer the caller
to a human agent automatically in some circumstances.
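One common circumstance for automatic transfer is a caller who repeatedly fails to give valid input. The following is a minimal Python sketch of such a policy, not a Blueworx Voice Response API: the threshold of three failures, the CallState class, and the action names are all hypothetical. When the switch does not provide call transfer, the sketch falls back to telling the caller which number to ring for an operator.

```python
from dataclasses import dataclass

# Hypothetical threshold: how many failed attempts in a row trigger a hand-off
MAX_CONSECUTIVE_FAILURES = 3


@dataclass
class CallState:
    switch_supports_transfer: bool  # whether the switch provides call transfer
    consecutive_failures: int = 0


def next_action(state: CallState, input_ok: bool) -> str:
    """Decide what to do after each caller input attempt (illustrative action names)."""
    if input_ok:
        state.consecutive_failures = 0  # a successful input resets the count
        return "continue"
    state.consecutive_failures += 1
    if state.consecutive_failures < MAX_CONSECUTIVE_FAILURES:
        return "reprompt"
    if state.switch_supports_transfer:
        return "transfer_to_agent"
    # No call transfer available: tell the caller which number to ring for an operator
    return "announce_operator_number"


state = CallState(switch_supports_transfer=True)
print([next_action(state, ok) for ok in (False, False, False)])
# ['reprompt', 'reprompt', 'transfer_to_agent']
```

Resetting the count on every successful input means only consecutive failures lead to a transfer, so a caller who stumbles once in a long session is not handed off unnecessarily.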
Give human factors and usability a high priority when you design voice
response services, and test the usability of the services later as well.
Design considerations
Compared with today’s multiwindowed screen-based computer applications,
voice response applications have a number of potential limitations. With careful
design, you can overcome the limitations and make your application a delight
to use:
- Auditory output is sequential and hard to keep in short-term memory:
you have to take great care wording the prompts.
- Auditory output can be slow. Someone (either you or the caller) is paying
for the call and wants the whole transaction to take as little time as
possible, so the prompts need to be worded carefully to convey the greatest
amount of information, unambiguously, in the shortest time.
- The very ubiquity and availability of telephones means that callers usually don’t
have access to manuals or other supporting materials: the whole point is
that the voice response service can be used from anywhere, so the application
itself must be completely self-documenting.
- Voice data takes up a lot of storage space (one second of uncompressed
voice occupies 8000 bytes of storage). Compression can reduce this by a
factor of 5, and generating synthesized speech from text reduces it even further.
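The storage figures above lend themselves to a quick estimate. The following Python sketch uses only the two numbers given in the text (8000 bytes per second uncompressed, and a factor of 5 for compression); the prompt count and average duration in the example are hypothetical. The 8000 bytes per second figure is consistent with 8-bit samples taken 8,000 times a second.

```python
# Figures from the text: uncompressed voice uses 8000 bytes per second,
# and compression reduces that by a factor of 5.
BYTES_PER_SECOND = 8000
COMPRESSION_FACTOR = 5


def voice_storage_bytes(total_seconds: float, compressed: bool = False) -> float:
    """Estimate the storage needed for a given amount of recorded voice."""
    raw = total_seconds * BYTES_PER_SECOND
    return raw / COMPRESSION_FACTOR if compressed else raw


# Hypothetical prompt set: 200 prompts averaging 6 seconds each
total = 200 * 6  # 1200 seconds of voice
print(voice_storage_bytes(total))                   # 9600000
print(voice_storage_bytes(total, compressed=True))  # 1920000.0
```

Even a modest prompt set runs to megabytes uncompressed, which is why compression, or text-to-speech for rarely played or frequently changing text, matters for larger applications.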
On the input side, from the caller’s point of view, the telephone is
an inferior device compared with the computer keyboard and mouse:
- There are two ways of communicating with a Blueworx Voice Response application: by key
or by voice. If keys are used, they must transmit dual-tone multifrequency
(DTMF) tones. If the caller is able to transmit these tones, keys are the
most accurate and efficient method of input.
- Bear in mind that many callers may be using phones with an integral keypad
and handset: they will take longer to press the keys than callers using traditional
phones with a separate keypad.
- Compared with the 100 or so keys on a computer keyboard, the standard
telephone has twelve keys (Blueworx Voice Response supports up to sixteen: 0 through 9, *, #,
A, B, C, and D).
- The standard keypad is fine for numeric input, making simple choices,
and answering yes-no questions, but what about alphabetic input? In some countries,
there are no alphabetic characters on the keys at all; even in countries where
telephones still have the alphabetic characters on the number keys, people
don’t use them enough to be familiar with their positions. (And note that
different countries use different layouts anyway.)
- Many telephones cannot transmit DTMF tones reliably or at all: rotary
dial, hybrid tone, and dial-pulse phones are still in use; cellular and portable
phones often suffer from noise interference; some equipment sends tones of
fixed duration or cannot send tones at all during a call. In these cases,
the only reasonable input method is voice, and for that, you need access
to speech recognition hardware or software.
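The keypad constraints above can be made concrete with a short Python sketch. It checks key input against the twelve standard keys, optionally allowing the A to D keys that Blueworx Voice Response also supports (the allow_abcd flag and both function names are hypothetical). The letter layout is an assumption: the ITU-T E.161 arrangement, with ABC on 2 through WXYZ on 9, which, as noted above, is not used in every country.

```python
# The twelve standard keys, plus the A-D keys that Blueworx Voice Response also supports
STANDARD_KEYS = set("0123456789*#")
EXTENDED_KEYS = STANDARD_KEYS | set("ABCD")

# Assumed letter layout: ITU-T E.161 (other countries may differ or have no letters)
E161_LETTERS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_KEY = {ch: key for key, letters in E161_LETTERS.items() for ch in letters}


def is_valid_key_input(keys: str, allow_abcd: bool = False) -> bool:
    """Check that input is non-empty and every key pressed is one the application accepts."""
    allowed = EXTENDED_KEYS if allow_abcd else STANDARD_KEYS
    return bool(keys) and all(k in allowed for k in keys)


def spell_on_keypad(word: str) -> str:
    """Translate letters (A-Z only) to the keys a caller would press under the assumed layout."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.upper())


print(is_valid_key_input("12A"))                   # False
print(is_valid_key_input("12A", allow_abcd=True))  # True
# Different words produce the same key sequence, so key input alone is ambiguous
print(spell_on_keypad("cat"), spell_on_keypad("bat"), spell_on_keypad("act"))  # 228 228 228
```

Because each key carries three or four letters, "cat", "bat", and "act" all become 228: one more reason why alphabetic input by key is awkward even for callers who know where the letters are.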
Faced with all these challenges, you have a number of design decisions
to make. First you need to determine the right dialog style or styles and
then you need to attend to the detail of each interaction, or task, within
the application. If you design your application well, callers may even prefer
it to a human agent, and the reward will be lower business costs.
Choosing a dialog style
No single dialog style is best for all applications, for all tasks within
an application, or for all callers:
- Callers may be familiar or unfamiliar with the mechanics of driving the
application.
- Callers may be familiar or unfamiliar with other voice response services.
- Callers may use the service frequently (and therefore acquire expertise
with the mechanics, or the content, over time) or infrequently (never acquiring
expertise).
- The content of the dialog may be familiar and predictable (for example,
days of the week) or relatively unfamiliar and unpredictable (for example,
toppings available for pizzas).
- It may be appropriate to provide documentation
or training sessions, for example, if callers are employees or students, but
it is unreasonable to base your design on the assumption that documentation
will be read or training sessions attended.
There is a far greater choice of dialog designs for voice response services
than you might imagine. Some less common designs overcome the problems encountered
by callers in using traditional voice response services. Different styles
of dialog may be appropriate in different parts of the service, depending
on the task being performed: after all, these days one expects to see pull-down
menus, radio buttons, check boxes, and so on, used in visual applications,
so it’s not surprising that there might be a variety of techniques you
can use in voice response dialogs.
There are three basic styles of voice response dialog, suited to different
types of task:
- Menu: suited to selecting one of a small number of options
- List: suited to choosing multiple items, perhaps from a large number
- Form: suited to providing input such as addresses and telephone numbers
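Form-style input, the third style in the list above, usually means collecting one field at a time. The following Python sketch collects digit fields from key presses under assumed conventions that are hypothetical rather than product defaults: # ends an entry early, * clears what has been entered so far, and entry stops automatically once a field is full. The form definition and the caller's key presses are illustrative data.

```python
from typing import Iterable


def collect_digits(key_presses: Iterable[str], max_digits: int) -> str:
    """Collect one form field, such as a telephone number, from key presses.

    Assumed conventions (hypothetical): '#' ends the entry early, '*' clears
    what has been entered so far, and entry stops once max_digits are collected.
    """
    digits: list[str] = []
    for key in key_presses:
        if key == "#":
            break  # caller signals the end of the field
        if key == "*":
            digits.clear()  # caller wants to start the field again
            continue
        if key.isdigit():
            digits.append(key)
            if len(digits) == max_digits:
                break  # field is full: no need to wait for '#'
    return "".join(digits)


# A hypothetical form: each field has a prompt and a maximum length
FORM = [("Please key your account number, then press hash.", 8),
        ("Please key your telephone number, then press hash.", 11)]

caller_keys = ["1234#", "555*01632960000"]
answers = [collect_digits(keys, limit) for (_, limit), keys in zip(FORM, caller_keys)]
print(answers)  # ['1234', '01632960000']
```

Stopping as soon as a fixed-length field is full saves the caller a keystroke, while the explicit # terminator handles fields of variable length.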
Within these basic styles, there are numerous variations. It’s hard
to provide rules for good dialog design, but you need to classify your application
and your callers in these terms before considering the following questions:
- Composite or separate actions? Composite actions may be simpler
for the caller, but separate actions give more flexibility. If your callers
are unlikely to use the application often, give them composite actions that
accomplish the most with the fewest keystrokes. If callers are likely to use
the application often, however, they may want to combine actions in different
ways.
- Keys or speech recognition? The application is using speech to
communicate with the caller, but should the caller be using speech or keys?
Remember it takes longer to speak a command and have it recognized than it
does to press a key. Also, people are better at pressing keys accurately than
speaking clearly enough to be recognized accurately.
- A mixture of key and speech input? You can mix key input and
speech input in a single application (although allowing both at the same
time is not recommended). For example, entering a Personal Identification Number
(PIN) is easier using keys, whereas recording a street address is easier with
speech.
- Command-driven or prompted? You need to make a decision about
whether to let callers interrupt the prompts or not. In general, you should
let them interrupt. Once they learn the choices at each point, they can key
ahead (or speak ahead), without waiting for the prompt to finish. There may,
however, be some prompts that you want to force play to the end; there
may even be whole applications in which you want all prompts to be force played.
The type of dialog that allows key-ahead or speak-ahead is sometimes known
as a command-driven dialog, but you still need to play the prompts in a voice
application, because of the absence of documentation.
- A single selection key or option-specific selection keys? With
a single selection key, you play each choice to the caller and give them a
few seconds to press the nominated key (for example, 1); if they do not press
a key, you play the next choice, and so on. The caller always presses 1 to
select the current option. There is less for callers to learn if, at any point,
they have only two choices: to select the current option or to proceed to
the next; it can be useful for long lists of options (for example, film titles).
On the other hand, this kind of design tends to restrict the caller’s
ability to key ahead and bypass menus.
With option-specific selection keys,
you allocate a different key to each option. This has the advantage of allowing
callers to key ahead, but limits the options available at any one time. It
can also be difficult to ensure consistency of key-allocation throughout the
application (for example, on one menu, 3 may be “Delete message” while
on another menu, it is something less destructive). Again, infrequent use
suggests a single selection key, and frequent use suggests option-specific
selection keys.
- Passive or active advance through menus? Should each option “drop
through” to the next option, or should the caller have to press a key
to proceed? This becomes an issue if you provide a single selection key. Again,
passive advance is probably most suitable for callers who use the application
infrequently. However, to avoid callers selecting an option just after the
menu has moved on to the next one, you need to include a few seconds of
silence between options.
Passive advance might seem easiest
for the caller, but only at first. Later, callers may become frustrated at
having to sit through whole menus. Providing a “Next” key would seem
to prevent this, but the “drop-through” behavior often results in
callers never learning about the “Next” key, unless the menus mention
it.
- Long or short prompts? The more information you provide, the
more likely the caller is to learn how to drive the application. However,
long prompts slow down callers, particularly experienced ones. A choice of
novice and expert prompts can help. The difference between them is not necessarily
verbose versus terse: you could actually leave some information out of the
expert prompts altogether.
Always provide essential information before information
that is merely helpful. Provide “Next” and “Previous” keys,
so that callers can skip over information they don’t want to hear.
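The single selection key and passive advance described above can be modeled in a few lines. The following Python sketch is a simulation, not a Blueworx Voice Response API: the # "Next" key and the three-second pause are hypothetical values, and for simplicity it assumes the caller presses a key just as an option finishes playing. It returns the option selected and the time elapsed, which shows what drop-through costs a caller who does not know about the Next key.

```python
from typing import Iterable, Optional, Sequence, Tuple

SELECT_KEY = "1"      # the single selection key, as in the example above
NEXT_KEY = "#"        # hypothetical "Next" key that skips the pause
PAUSE_SECONDS = 3.0   # hypothetical silence after each option


def single_key_menu(
    options: Sequence[Tuple[str, float]],
    responses: Iterable[Optional[str]],
) -> Tuple[Optional[str], float]:
    """Passive-advance menu with one selection key.

    options: (option text, seconds to play it).
    responses: stands in for the caller; one entry per option, giving the key
    pressed as that option finishes, or None if the caller just waits.
    Returns the selected option (or None) and the elapsed time in seconds.
    """
    replies = iter(responses)
    elapsed = 0.0
    for text, seconds in options:
        elapsed += seconds            # play the option
        key = next(replies, None)
        if key == SELECT_KEY:
            return text, elapsed
        if key != NEXT_KEY:
            elapsed += PAUSE_SECONDS  # caller sits through the silence, then drops through
    return None, elapsed              # caller reached the end without choosing


MENU = [("Sports", 2.0), ("News", 2.0), ("Weather", 2.0)]
print(single_key_menu(MENU, [None, None, "1"]))  # ('Weather', 12.0)
print(single_key_menu(MENU, ["#", "#", "1"]))    # ('Weather', 6.0)
```

The second call reaches the same option in half the time, which is why it helps for the menus themselves to mention the Next key.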
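Option-specific selection keys combined with interruptible prompts allow callers to key ahead through several menu levels. The following Python sketch shows one way to consume keys that a caller has already typed, playing a menu only when no buffered key remains. The menu tree, its action names, and the use of 0 for the operator are hypothetical examples, not product conventions.

```python
from collections import deque
from typing import Dict, Tuple, Union

# Hypothetical menu tree: each key leads either to an action name or to a submenu
MenuNode = Union[str, Dict[str, "MenuNode"]]

MAIN_MENU: Dict[str, MenuNode] = {
    "1": "play new messages",
    "2": {"1": "record greeting", "2": "change PIN"},
    "0": "transfer to operator",
}


def navigate(menu: Dict[str, MenuNode], typed_ahead: str) -> Tuple[str, str]:
    """Follow keys the caller has already pressed, skipping the prompts they bypass.

    Returns ("action", name) when the keys reach an action, ("prompt", path) when
    the menu reached via `path` must be played because no keys remain buffered,
    or ("invalid", key) when a key has no meaning in the current menu.
    """
    buffer = deque(typed_ahead)
    node: MenuNode = menu
    path = ""
    while isinstance(node, dict):
        if not buffer:
            return ("prompt", path)  # nothing typed ahead: play this menu and wait
        key = buffer.popleft()
        if key not in node:
            return ("invalid", key)
        node = node[key]
        path += key
    return ("action", node)


print(navigate(MAIN_MENU, "22"))  # ('action', 'change PIN')
print(navigate(MAIN_MENU, "2"))   # ('prompt', '2')
```

A caller who knows the application can reach "change PIN" with two key presses and hear no prompts at all, whereas a single selection key design would make them listen to each option in turn.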
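The advice on prompt length can be sketched as data plus two small functions. In the following Python example, each prompt holds its essential information separately from merely helpful guidance; novices hear both, essential first, and experts hear only the essential part. A second function simulates a caller moving through a list of information with Next and Previous keys. The key assignments (6 and 4), the class, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

NEXT_KEY = "6"      # hypothetical key assignments
PREVIOUS_KEY = "4"


@dataclass
class Prompt:
    essential: str     # always played, and always first
    helpful: str = ""  # extra guidance, played to novices only


def render(prompt: Prompt, expert: bool) -> str:
    """Build the prompt text: experts hear only the essential information."""
    if expert or not prompt.helpful:
        return prompt.essential
    return f"{prompt.essential} {prompt.helpful}"


def browse(items: Sequence[str], keys: str) -> List[str]:
    """Return the items the caller hears while moving with Next and Previous."""
    if not items:
        return []
    position = 0
    heard = [items[0]]
    for key in keys:
        if key == NEXT_KEY and position < len(items) - 1:
            position += 1
        elif key == PREVIOUS_KEY and position > 0:
            position -= 1
        else:
            continue  # ignore other keys and moves past either end of the list
        heard.append(items[position])
    return heard


main = Prompt("For new messages, press 1. For saved messages, press 2.",
              "You can press a key at any time without waiting for the prompt to finish.")
print(render(main, expert=True))  # For new messages, press 1. For saved messages, press 2.
print(browse(["item 1", "item 2", "item 3"], "664"))  # ['item 1', 'item 2', 'item 3', 'item 2']
```

Keeping the two parts of each prompt separate makes the expert version an omission rather than a rewrite, so both versions stay consistent when the essential wording changes.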