Wording prompts for speech recognition applications

As with DTMF applications, keep your instructions to the caller brief. The caller cannot be expected to remember more than two or three instructions before responding. Here are some hints and tips.

Tell the caller what to say to avoid unrecognizable words; that is, words not in your vocabulary. For example, you want your caller to answer “yes,” not “OK” or “Sure.” So your prompt should be worded “If you want to order a pizza, say YES now” rather than “Do you want a pizza?”
To encourage the caller to speak digits rather than whole numbers, provide an example in the prompt itself: the recorded voice segment must speak in digits. “How many bottles of item one seven five do you want?” is better than “How many bottles of item one hundred and seventy five do you want?”
It is useful to include directions such as “say YES or NO now” in the prompt. This helps callers understand that they must wait for the prompt to be over before they can speak.
Your prompts are models for the caller. That is, the caller is likely to mimic the volume, pace, enunciation, and terseness of the recorded speech.
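
The digit-by-digit wording recommended above can be generated rather than written by hand. The following is a minimal sketch in Python, not part of any voice response product: the names speak_as_digits and quantity_prompt are hypothetical, and the output is only prompt text that you would still need to record or synthesize.

```python
# Hypothetical helpers for wording a prompt digit by digit.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def speak_as_digits(number: str) -> str:
    """Spell a number one digit at a time, for example "175" -> "one seven five"."""
    if not number or any(ch not in "0123456789" for ch in number):
        raise ValueError("expected a non-empty string of digits 0-9")
    return " ".join(DIGIT_WORDS[int(ch)] for ch in number)


def quantity_prompt(item_number: str) -> str:
    """Build the prompt text so that the prompt itself models spoken digits."""
    return f"How many bottles of item {speak_as_digits(item_number)} do you want?"


print(quantity_prompt("175"))
```

Keeping the item number as a string, rather than an integer, preserves any leading zeros that the caller would be expected to speak.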

Handling all responses

When designing a speech-recognition application, you must decide when to repeat a prompt, request verification, or transfer to a human operator. Your decisions are based on confidence factors.

Ask for the same information no more than twice, to avoid irritating the caller and jeopardizing the business transaction. The second prompt should apologize to the caller, accept responsibility for the communication error, and repeat the request. You might change the wording in the second request to give additional clues or information.
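
The decisions described in this section can be sketched as a small routing function. This is an illustration in Python under stated assumptions, not the behavior of any particular recognizer: it assumes the confidence factor is a number between 0 and 1, and the two thresholds, the Action names, and the prompt wording are all hypothetical values that you would tune for your own application.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"        # use the recognized response
    VERIFY = "verify"        # ask the caller to confirm the response
    REPROMPT = "reprompt"    # apologize and ask again
    TRANSFER = "transfer"    # hand the call to a human operator


# Assumed values for illustration only.
ACCEPT_THRESHOLD = 0.80
VERIFY_THRESHOLD = 0.50
MAX_ATTEMPTS = 2  # ask for the same information at most twice


def next_action(confidence: float, attempt: int) -> Action:
    """Choose what to do after the caller's response to attempt 1 or 2."""
    if confidence >= ACCEPT_THRESHOLD:
        return Action.ACCEPT
    if confidence >= VERIFY_THRESHOLD:
        return Action.VERIFY
    if attempt < MAX_ATTEMPTS:
        return Action.REPROMPT
    return Action.TRANSFER


def prompt_text(attempt: int) -> str:
    """The second prompt apologizes and adds a clue about valid answers."""
    if attempt == 1:
        return "If you want to order a pizza, say YES now."
    return ("Sorry, I did not understand. "
            "If you want to order a pizza, say YES or NO now.")
```

For example, next_action(0.2, 1) returns Action.REPROMPT, while the same low confidence on the second attempt returns Action.TRANSFER, so the caller is never asked a third time.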

Speech recognition while a prompt is playing

Full-duplex barge-in allows the caller to start speaking while the prompt is still playing. See Barge-in.

Interrupting prompts

To enable callers without DTMF phones to interrupt a prompt with a short speech utterance (for example “stop!”), you need to enable voice interrupt detection. See Voice interrupt detection.

Limitations

Speech-recognition applications work best when there is a high signal-to-noise ratio. The speech recognition process can fail when background noise, such as traffic, is louder than the caller’s speech.
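
To make the term concrete, signal-to-noise ratio is commonly expressed in decibels as 20 times the base-10 logarithm of the ratio of the speech amplitude to the noise amplitude. The Python sketch below only illustrates that definition: the function names are hypothetical, and it assumes you have separate sample lists for speech and for background noise, which a voice response system may not expose.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty list of audio samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech_samples, noise_samples):
    """Signal-to-noise ratio in decibels, from amplitude samples."""
    speech_level = rms(speech_samples)
    noise_level = rms(noise_samples)
    if speech_level == 0 or noise_level == 0:
        raise ValueError("speech and noise levels must both be non-zero")
    return 20 * math.log10(speech_level / noise_level)
```

A result below 0 dB corresponds to the failure case described above, where the background noise is louder than the caller's speech.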

If you expect many of your customers to be calling from a noisy environment, your application should include a test that branches to a human operator when recognition fails. For example, a first prompt asks callers if they have their account number ready. If the external voice services system fails to find a good match for the response, the state table could branch to a TransferCall action to transfer the call.
