Designing the dialog

Customer satisfaction with your voice response service will depend on a number of factors, including the voice you choose, the information you provide, and the ease with which callers can get the information they want. The sound and feel of the service is every bit as important as the look and feel of a visual computer application. With careful design, the caller’s experience will be pleasurable; with lack of attention to dialog design, the caller will get frustrated and hang up. At best this will result in your people handling just as many calls as they ever did; at worst it may mean loss of business.

You’re already considering a voice response service, so you’re aware of the strengths of voice response: telephones are ubiquitous, familiar, and easy to use, and an automated system can provide more speed, privacy, efficiency, availability, and accuracy than some human operators, at lower cost.

Nevertheless, when designing the dialog, you need to be aware of some of the limitations of the medium. Compared with people, automated systems are inflexible, demanding, and intolerant of deviations from the expected conversation. They are best at handling very routine calls. Almost without exception, voice response services should offer help in some way, for example by offering the choice of transferring to an operator (if call transfer is provided by the switch) or calling another number to speak to an operator. Where the switch provides call transfer, the application can also transfer the caller to a human agent automatically in some circumstances.

You should give human factors and usability a high priority when designing voice response services, in addition to testing the usability of the services later.

Design considerations

Compared with today’s multiwindowed screen-based computer applications, voice response applications have a number of potential limitations. With careful design, you can overcome the limitations and make your application a delight to use:

On the input side, from the caller’s point of view, the telephone is an inferior device compared with the computer keyboard and mouse:

Faced with all these challenges, you have a number of design decisions to make. First you need to determine the right dialog style or styles and then you need to attend to the detail of each interaction, or task, within the application. If you design your application well, callers will prefer it to the human agent, and the reward will be savings in business costs.

Choosing a dialog style

No single dialog style is best for all applications, for all tasks within an application, or for all callers:

There is a far greater choice of dialog designs for voice response services than you might imagine. Some less common designs overcome the problems encountered by callers in using traditional voice response services. Different styles of dialog may be appropriate in different parts of the service, depending on the task being performed: after all, these days one expects to see pull-down menus, radio buttons, check boxes, and so on, used in visual applications, so it’s not surprising that there might be a variety of techniques you can use in voice response dialogs.

There are three basic styles of voice response dialog, suited to different types of task:

Menu:
suited to selecting one of a small number of options
List:
suited to choosing multiple items, perhaps from a large number
Form:
suited to providing input such as addresses and telephone numbers
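
As a rough illustration of how the three styles map to types of task, the following Python sketch classifies a task by two questions. The names DialogStyle and suggest_style, and the two parameters, are hypothetical and exist only for this example; a real design would weigh many more factors, including the callers themselves.

```python
from enum import Enum


class DialogStyle(Enum):
    MENU = "menu"  # select one of a small number of options
    LIST = "list"  # choose multiple items, perhaps from a large number
    FORM = "form"  # provide input such as addresses and telephone numbers


def suggest_style(selects_from_options, allows_multiple=False):
    """Return a first-pass dialog style for a task (illustrative only).

    selects_from_options: True if the caller picks from choices the
        application offers, False if the caller supplies free input.
    allows_multiple: True if the caller may pick more than one item.
    """
    if not selects_from_options:
        return DialogStyle.FORM
    return DialogStyle.LIST if allows_multiple else DialogStyle.MENU
```

For example, suggest_style(True) suggests a menu, while suggest_style(False) suggests a form.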

Within these basic styles, there are numerous variations. It’s hard to provide rules for good dialog design, but you need to classify your application and your callers in these terms before considering the following questions:

  1. Composite or separate actions? Composite actions may be simpler for the caller, but separate actions give more flexibility. If your callers are unlikely to use the application often, give them composite actions that accomplish the most with the fewest keystrokes. If callers are likely to use the application often, however, they may want to combine actions in different ways.
  2. Keys or speech recognition? The application is using speech to communicate with the caller, but should the caller be using speech or keys? Remember it takes longer to speak a command and have it recognized than it does to press a key. Also, people are better at pressing keys accurately than speaking clearly enough to be recognized accurately.
  3. A mixture of key and speech input? You can mix key input and speech input in a single application (though allowing both at the same time is not recommended). For example, entering a Personal Identification Number (PIN) is easier using keys, whereas recording a street address is easier with speech.
  4. Command-driven or prompted? You need to decide whether to let callers interrupt the prompts. In general, you should let them interrupt. Once they learn the choices at each point, they can key ahead (or speak ahead), without waiting for the prompt to finish. There may, however, be some prompts that you want to force play to the end; there may even be whole applications in which you want all prompts to be force played.

    The type of dialog that allows key-ahead or speak-ahead is sometimes known as a command-driven dialog, but you still need to play the prompts in a voice application, because of the absence of documentation.

  5. A single selection key or option-specific selection keys? With a single selection key, you play each choice to the caller and give them a few seconds to press the nominated key (for example, 1); if they do not press a key, you play the next choice, and so on. The caller always presses 1 to select the current option. There is less for callers to learn if, at any point, they have only two choices: to select the current option or to proceed to the next. This style can be useful for long lists of options (for example, film titles). On the other hand, this kind of design tends to restrict the caller’s ability to key ahead and bypass menus.

    With option-specific selection keys, you allocate a different key to each option. This has the advantage of allowing callers to key ahead, but limits the options available at any one time. It can also be difficult to ensure consistency of key allocation throughout the application (for example, on one menu, 3 may be “Delete message” while on another menu, it is something less destructive). Again, infrequent use may indicate the single selection key and frequent use the option-specific selection keys.

  6. Passive or active advance through menus? Should each option “drop through” to the next option, or should the caller have to press a key to proceed? This becomes an issue if you provide a single selection key. Again, passive advance is probably most suitable for callers who use the application infrequently but, to avoid the problem of callers selecting an option just after the menu has moved on to the next option, you need to include a few seconds of silence between each option.

    Passive advance might seem easiest for the caller, but only at first. Later, callers may become frustrated at having to sit through whole menus. Providing a “Next” key would seem to prevent this, but the “drop-through” behavior often results in callers never learning about the “Next” key, unless the menus mention it.

  7. Long or short prompts? The more information you provide, the more likely the caller is to learn how to drive the application. However, long prompts slow down callers, particularly experienced ones. A choice of novice and expert prompts can help. The difference between them is not necessarily verbose versus terse: you could actually leave some information out of the expert prompts altogether.

    Always provide essential information before information that is merely helpful. Provide “Next” and “Previous” keys, so that callers can skip over information they don’t want to hear.
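
The difference between a single selection key with passive advance and option-specific keys with key-ahead can be sketched as a small simulation. This is a minimal Python sketch, not the behavior of any particular voice response product: the function names, the representation of caller input as a list of events (a key, or None for a pause that elapses in silence), and the sample film titles and mailbox options are all assumptions made for illustration.

```python
from collections import deque


def single_key_menu(options, events, select_key="1"):
    """Offer options one at a time; the same key always selects the current one.

    `events` holds one entry per pause: the key the caller pressed, or None
    if the pause elapsed in silence (passive advance to the next option).
    Returns (selected option or None, list of options played).
    """
    events = deque(events)
    played = []
    for option in options:
        played.append(option)  # stand-in for playing the prompt, then pausing
        event = events.popleft() if events else None
        if event == select_key:
            return option, played
        # Silence or any other key: drop through to the next option.
    return None, played


def option_key_menu(options, events):
    """Offer all options in one prompt, each with its own key.

    A leading None in `events` means the caller listened to the prompt first;
    a valid key with no None before it is key-ahead, so the prompt is skipped.
    Returns (selected option or None, whether the prompt was heard).
    """
    prompt_played = False
    for event in events:
        if event is None:
            prompt_played = True
        elif event in options:
            return options[event], prompt_played
        # Keys that match no option are ignored in this sketch.
    return None, prompt_played


# Hypothetical sample data.
films = ["Casablanca", "Vertigo", "Alien"]
mailbox = {"1": "Play message", "2": "Save message", "3": "Delete message"}

# Caller lets the first title drop through, then presses 1 on the second.
choice, heard = single_key_menu(films, [None, "1"])

# Experienced caller presses 3 before the mailbox prompt starts.
action, prompt_heard = option_key_menu(mailbox, ["3"])
```

In the first call the caller hears two titles before selecting the second, and cannot reach the third any faster than the menu plays. In the second call the experienced caller reaches “Delete message” without hearing the prompt at all, which is why consistent key allocation matters when keys are option-specific.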