Applications that mix speech and DTMF are called mixed-mode applications. Because speech applications and DTMF applications do not typically have the same “sound and feel,” it can be tricky to mix the two. For this reason, it is generally better not to explicitly design mixed-mode applications, unless you are migrating from a legacy DTMF application to a speech-enabled version.
Your system prompts should generally not attempt to mention both speech and DTMF. The application interface will be simpler if your prompts are focused primarily on speech.
If you must mix modes, one of your first design decisions must be to choose the fundamental architecture of the mix. There are four choices, as described in Table 1:
Type of mix | Advantages | Disadvantages |
---|---|---|
Speech as the primary interface, with DTMF support when beneficial or required | Seamless mixed-mode user interface. Can code to move user immediately back to speech (bounce style) or stay in DTMF mode (sticky style). Makes best use of both technologies. | More time-consuming to develop than a speech-only interface. |
Completely separate speech and DTMF interfaces. Users make the choice as their first interaction with the system. | Most straightforward approach. | Requires you to maintain two separate applications. Both applications require full functionality. Users can't switch from one mode to the other. |
A unified system that allows users to freely switch between speech and DTMF modes, perhaps with the use of a DTMF key or key sequence. | Almost as straightforward as separate interfaces. Users can switch between modes. | Users must deal with two different interfaces. Users might experience mode errors: thinking they are in one mode when they are in the other. |
A single application with DTMF-style prompts | Users only experience one interface. | Speech is not used to its full advantage. |
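The bounce and sticky styles mentioned for the speech-primary mix can be pictured as a small mode-selection rule. The following is a minimal sketch only, not part of any voice platform API: the names `Mode`, `MixStyle`, and `next_mode` are hypothetical, and the reading of "bounce" (always return to speech) and "sticky" (stay in DTMF once a key is pressed) is an assumption based on the table's wording.

```python
from enum import Enum


class Mode(Enum):
    SPEECH = "speech"
    DTMF = "dtmf"


class MixStyle(Enum):
    BOUNCE = "bounce"  # assumed: the next prompt always returns to speech
    STICKY = "sticky"  # assumed: once a key is pressed, stay in DTMF mode


def next_mode(current: Mode, input_kind: Mode, style: MixStyle) -> Mode:
    """Choose the mode for the next prompt.

    current    -- mode the application was in for this turn
    input_kind -- how the user actually answered (spoke or pressed keys)
    style      -- bounce or sticky behavior chosen at design time
    """
    if style is MixStyle.BOUNCE:
        # Bounce: a DTMF turn is a one-off; go straight back to speech.
        return Mode.SPEECH
    # Sticky: DTMF persists once entered, whether now or on an earlier turn.
    if current is Mode.DTMF or input_kind is Mode.DTMF:
        return Mode.DTMF
    return Mode.SPEECH
```

With this rule, a key press under the bounce style still yields a speech prompt on the next turn, while the same key press under the sticky style keeps the caller in DTMF mode.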
You should certainly use DTMF whenever your customer demands it, and it's a good idea to continue to provide DTMF when you have an existing base of power DTMF users. You may want to consider using DTMF for confirmation of sensitive transactions that users would not want overheard. However, you should keep in mind the difficulty this can pose for users of mobile and rotary pulse telephones, as well as telephones where the keypad is on the handset.
If your application supports both speech and DTMF modes, you may want to switch to DTMF mode automatically if the user is experiencing consistent speech recognition errors. With good error recovery techniques (see Error recovery), it is not generally necessary or desirable to switch to DTMF mode after a single recognition error.
You may also want to consider using a DTMF key sequence to allow the user to switch to DTMF mode; if severe recognition problems are the reason the user wants to switch modes, a spoken command alone may prove ineffective. If you plan to use DTMF as a backup system for excessive recognition errors or excessive false stopping of prompts, be sure to disable speech for the rest of the call once the switch occurs.
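The fallback behavior described in the two preceding paragraphs can be sketched as a small controller. This is an illustration under stated assumptions, not a platform API: the class `ModeController`, the threshold of three consecutive errors, and the `**` key sequence are all hypothetical values chosen for the example. The sketch switches only after repeated errors (never after a single one), accepts a key sequence as an alternative trigger, and leaves speech disabled once the switch has happened.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_ERRORS = 3  # hypothetical threshold; tune for your application
SWITCH_SEQUENCE = "**"      # hypothetical key sequence for switching modes


@dataclass
class ModeController:
    max_errors: int = MAX_CONSECUTIVE_ERRORS
    switch_sequence: str = SWITCH_SEQUENCE
    consecutive_errors: int = 0
    speech_enabled: bool = True

    def on_recognition_success(self) -> None:
        # A successful turn resets the count, so only consistent
        # (consecutive) errors trigger the fallback.
        self.consecutive_errors = 0

    def on_recognition_error(self) -> None:
        if not self.speech_enabled:
            return
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.max_errors:
            self._fall_back_to_dtmf()

    def on_dtmf(self, keys: str) -> bool:
        """Return True if the keys were the mode-switch sequence."""
        if self.speech_enabled and keys == self.switch_sequence:
            self._fall_back_to_dtmf()
            return True
        return False

    def _fall_back_to_dtmf(self) -> None:
        # Speech stays off from here on; nothing in this class re-enables it.
        self.speech_enabled = False
```

A single error followed by a successful turn leaves the caller in speech mode; three errors in a row, or the key sequence, moves the caller to DTMF for the remainder of the call.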
When feasible, you may want to move any interactions requiring DTMF input to the beginning of the application. This is especially important for interactions involving secure information; once users start talking to the application, they might continue to do so even when explicitly instructed to press keys. In some cases, this could compromise the information.
You should consider using one word to introduce prompts for a single DTMF digit or command sequence and a different word to prompt for a string of digits. For instance, you might use “Press” to indicate that the user should input a single DTMF digit or command, and “Enter” to indicate that the user should input a string of DTMF digits. Unless there is a reason to do otherwise, provide the functional description before the number to press, as in “To check your balance, press 1” rather than “Press 1 to check your balance.”
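The wording convention above ("Press" for a single key, "Enter" for a string of digits, with the functional description ahead of the key) can be enforced with small prompt-building helpers. The sketch below is illustrative only; the function names, the exact sentence templates, and the spoken key names "star" and "pound" are assumptions, not wording prescribed by this guide.

```python
# Spoken names for the non-digit keys (assumed wording).
KEY_NAMES = {"*": "star", "#": "pound"}
VALID_KEYS = set("0123456789*#")


def press_prompt(action: str, key: str) -> str:
    """Prompt for a single DTMF key; the description comes before the key."""
    if len(key) != 1 or key not in VALID_KEYS:
        raise ValueError(f"expected a single DTMF key, got {key!r}")
    return f"To {action}, press {KEY_NAMES.get(key, key)}."


def enter_prompt(item: str) -> str:
    """Prompt for a string of DTMF digits, introduced with 'Enter'."""
    return f"Enter your {item}."


def menu_prompt(options: list[tuple[str, str]]) -> str:
    """Join (action, key) pairs into one menu prompt, in the order given."""
    return " ".join(press_prompt(action, key) for action, key in options)
```

For example, `menu_prompt([("check your balance", "1"), ("transfer funds", "2")])` produces "To check your balance, press 1. To transfer funds, press 2.", and `enter_prompt("account number")` produces "Enter your account number."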