| Situation | Example | Recommended strategy | Sample |
|---|---|---|---|
| The system determines that recognized user input is invalid. | Incorrect number of digits in a social security number. | State the problem (without blaming the user, because the problem might be in the user's utterance or in the system's interpretation of that utterance) and reprompt. Consider directing the user to make the second entry with the keypad. | System: I didn't get nine digits. Please use your keypad to enter your social security number. |
| A recognition error occurs while the user is making choices along a menu path or completing items in a form. | The user is paying a telephone bill. | Feed the recognized input forward into the next prompt. Be sure to include information about the Go Back command in first-level help. | System: Pay how much? User: $43.15 System: Paying $53.50 with electronic check or credit card? User: No, that's not right. System: To change the previous entry, say Go Back. To continue, select electronic check or credit card. |
| The user did not hear all of the information presented. | The user is distracted during the presentation of a menu. | Include the Repeat command as an always-active command in the introduction. Following a 1500 to 2500 ms pause, play the always-active commands after primary menu prompts. This is especially useful at task terminal points. Provide the options again in first-level help. In second-level help, repeat the set of always-active options, then repeat the menu options. | System: Say Make Payment, Account Balance, Become New Customer, Make Purchase, or Customer Service. User: [distracted, says nothing] System: <2000 ms pause> At any time you can say Help, Repeat, Go Back, Main Menu, or Exit. User: Repeat. |
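To make the first row's strategy concrete, here is a minimal sketch in Python, assuming a hypothetical `collect(prompt, mode)` callback that plays a prompt and returns the caller's input as a digit string; the function name and prompt wording are illustrative, not taken from any particular platform:

```python
# Hypothetical sketch of the first row's strategy: validate the recognized
# digits, state the problem without blame, and direct the second attempt
# to the keypad.

def collect_ssn(collect):
    """collect(prompt, mode) returns the caller's input as a digit string;
    mode is 'voice' for speech or 'dtmf' for keypad entry."""
    digits = collect("Please say your social security number.", mode="voice")
    if len(digits) == 9 and digits.isdigit():
        return digits
    # State the problem, not the blame, and fall back to keypad entry,
    # which is more reliable for long digit strings.
    return collect(
        "I didn't get nine digits. "
        "Please use your keypad to enter your social security number.",
        mode="dtmf",
    )
```

The same shape applies to the other rows: detect the problem, state it neutrally, and route the retry through the most reliable input mode available.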
In applications that have disabled barge-in, the user might cause an error by speaking before hearing the tone that indicates the system is ready for recognition. If the user continues speaking over the tone and into the recognition timeframe, this is called a spoke-too-soon (STS) incident (illustrated in Figure 1). Because a portion of the user's utterance was spoken outside of the recognition timeframe, the input that the speech recognition engine actually receives often does not match anything in the active grammars, so the engine treats it as an out-of-grammar utterance (triggering a nomatch event).
It is also possible for a user to finish speaking before the tone sounds; this is called a spoke-way-too-soon (SWTS) incident (illustrated in Figure 2). Because the entire user utterance occurs outside of the recognition timeframe, the speech recognition engine does not actually receive any input and the system will generally time out (triggering a noinput event) as it waits for the input that the user already gave.
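The distinction between the two incidents comes down to where the utterance falls relative to the recognition timeframe. The following sketch (a hypothetical illustration; the function and timing values are assumptions, not an engine API) classifies an utterance by its start and end times:

```python
# Hypothetical sketch: classifying an utterance against the recognition
# timeframe, using the definitions above. Times are in milliseconds;
# window_open is when the tone sounds and the recognizer starts listening.

def classify(utterance_start, utterance_end, window_open, window_close):
    if utterance_end <= window_open:
        # Entire utterance before the tone: the recognizer hears nothing
        # and will eventually time out (noinput).
        return "spoke-way-too-soon (expect noinput)"
    if utterance_start < window_open:
        # Utterance straddles the tone: the recognizer hears only a
        # fragment, which rarely matches an active grammar (nomatch).
        return "spoke-too-soon (expect nomatch)"
    if utterance_start <= window_close:
        return "within recognition timeframe"
    return "after recognition timeframe (expect noinput)"

# Example: the user starts speaking 400 ms before the tone and keeps going.
print(classify(utterance_start=600, utterance_end=1900,
               window_open=1000, window_close=6000))  # spoke-too-soon
```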
Spoke-too-soon and spoke-way-too-soon incidents are a common source of speech recognition errors, especially in barge-in-disabled applications with excessively long prompts. Here are some things you can do to minimize recognition errors associated with these incidents:
These incidents can become more frequent as users become expert with a half-duplex system and rush to provide input. A spoke-too-soon or spoke-way-too-soon incident almost never indicates a problem with the user's understanding of the system; therefore, the appropriate system response is to let the user get back to the application as quickly as possible, with a reminder to wait for the tone. For example:

System: Sorry, please wait for the tone. <tone>
If an application in which barge-in has been disabled returns a nomatch or noinput event, you cannot be sure whether there was truly an out-of-grammar utterance or a timeout, or whether the event was caused by a spoke-too-soon or spoke-way-too-soon incident. Consequently, you should include a reminder to wait for the tone early in the self-revealing help sequence, along with whatever other help you want to provide for these return codes. This “wait for the tone” reminder is not necessary when the user said a phrase from the Help grammar, because a successful match to the Help grammar shows that the user spoke within the recognition timeframe.
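Putting that together, the help-selection logic might look like this sketch (the prompt wording, function name, and event labels are illustrative; nomatch and noinput correspond to the return codes discussed above):

```python
# Hypothetical sketch: choosing the next help prompt in a barge-in-disabled
# application. Because a nomatch or noinput might really be a spoke-too-soon
# or spoke-way-too-soon incident, the first levels of self-revealing help
# lead with a reminder to wait for the tone; an explicit request for Help
# skips that reminder.

def next_help_prompt(event, attempt, menu_options):
    """event is 'nomatch', 'noinput', or 'help'; attempt counts retries."""
    tone_reminder = "Please wait for the tone before speaking. "
    options = f"You can say {', '.join(menu_options)}."
    if event == "help":
        # The user deliberately asked for help, so the recognition
        # succeeded and the tone reminder is unnecessary.
        return options
    if attempt == 1:
        return tone_reminder + options
    # Later help levels add the always-active commands.
    return (tone_reminder + options +
            " At any time you can say Help, Repeat, Go Back, Main Menu, or Exit.")
```

Leading with the tone reminder costs an expert user only a second or two, while omitting it can trap a spoke-too-soon user in a loop of unexplained errors.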
It is possible to use information about confidence levels and n-best lists to refine confirmation and error-correction strategies. Confidence levels are scores produced by speech recognition engines; utterances that closely match an item in the active grammars receive higher scores. An n-best list contains the top n matches produced by the recognizer, ranked by confidence level. The match with the highest confidence level is the one that the engine returns to the application.
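As a hypothetical illustration, the following sketch applies two confidence thresholds to decide whether to accept, confirm, or reprompt, and falls back to the next n-best candidate when the user rejects a confirmation. The threshold values and data shapes are assumptions; real engines expose confidence scores and n-best results through their own APIs.

```python
# Hypothetical sketch: using confidence levels and an n-best list to choose
# a confirmation strategy. The thresholds (0.85, 0.45) are illustrative and
# would need tuning for a real application.

HIGH, LOW = 0.85, 0.45

def handle_recognition(nbest):
    """nbest: list of (utterance, confidence) pairs, sorted best-first."""
    if not nbest:
        return ("reprompt", None)
    top, score = nbest[0]
    if score >= HIGH:
        return ("accept", top)    # use without explicit confirmation
    if score >= LOW:
        return ("confirm", top)   # ask "Did you say ...?"
    return ("reprompt", None)     # too uncertain to confirm

def on_rejected_confirmation(nbest):
    # If the user rejects the top candidate, offer the next-best match
    # instead of reprompting from scratch.
    return nbest[1][0] if len(nbest) > 1 else None
```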