Error recovery and confirming user input

Error recovery

Error recovery in speech applications is very important because speech is transient and speech applications rarely have written user documentation. Self-revealing help can be used to address many error recovery situations (see Choosing help mode or self-revealing help). Additional strategies are shown in Table 1.
Table 1. Error-recovery techniques
Situation: System determines that recognized user input is invalid.
Example: Incorrect number of digits in a social security number.
Recommended strategy: State the problem (without blaming the user, because the problem might be in the user's utterance or in the system's interpretation of that utterance) and reprompt. Consider directing the user to make the second entry with the keypad.
Sample:
System: I didn't get nine digits. Please use your keypad to enter your social security number.

Situation: A recognition error occurs while the user is making choices along a menu path or completing items in a form.
Example: User is paying a telephone bill.
Recommended strategy: Feed recognized input forward into the next prompt. Be sure to include information about the Go Back command in first-level help.
Sample:
System: Pay how much?
User: $43.15
System: Paying $53.50 with electronic check or credit card?
User: No, that's not right.
System: To change the previous entry, say Go Back. To continue, select electronic check or credit card.

Situation: User did not hear all of the information presented.
Example: User distracted during the presentation of a menu.
Recommended strategy: Include the Repeat command as an always-active command in the introduction. Following a 1500 to 2500 ms pause, play the always-active commands after primary menu prompts. This is especially useful at task terminal points. Provide the options again in the first-level help. Repeat the set of always-active options in the second-level help, then repeat the options again.
Sample:
System: Say Make Payment, Account Balance, Become New Customer, Make Purchase or Customer Service.
User: [the user was distracted and says nothing]
System: <2000 ms pause> At any time you can say Help, Repeat, Go Back, Main Menu or Exit.
User: Repeat.
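The first strategy in Table 1 (state the problem without blaming the user, then reprompt toward the keypad) can be sketched in a few lines. This is a minimal illustration only: the function name `check_digit_entry` and its parameters are hypothetical and are not part of any speech platform API.

```python
import re


def check_digit_entry(recognized: str, expected_digits: int, field_name: str):
    """Validate a recognized digit string.

    Returns (digits, None) when the entry has the expected length,
    otherwise (None, reprompt). The reprompt states what the system
    received rather than what the user did wrong.
    """
    # Keep only digit characters; the recognizer may return spaces or dashes.
    digits = re.sub(r"\D", "", recognized)
    if len(digits) == expected_digits:
        return digits, None
    reprompt = (
        f"I didn't get {expected_digits} digits. "
        f"Please use your keypad to enter your {field_name}."
    )
    return None, reprompt
```

The reprompt wording follows the sample in Table 1; a production prompt would normally be a recording or a TTS string reviewed by the dialog designer.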

Understanding spoke-too-soon and spoke-way-too-soon incidents

In applications that have disabled barge-in, the user might cause an error by speaking before hearing the tone that indicates the system is ready for recognition. If the user continues speaking over the tone and into the recognition timeframe, this is called a spoke-too-soon (STS) incident (illustrated in Figure 1). Because a portion of the user utterance was spoken outside of the recognition timeframe, the input that the speech recognition engine actually received often does not match anything in the active grammars, so the speech recognition engine treats it as an out-of-grammar utterance (triggering a nomatch event).

Figure 1. Spoke-too-soon (STS) incident
This image shows a time line beginning with the playing of an audio prompt and continuing past the point when the caller or user stops speaking. This occurs when the user starts speaking before the turn-taking tone, which signals to start speaking, has been played.

It is also possible for a user to finish speaking before the tone sounds; this is called a spoke-way-too-soon (SWTS) incident (illustrated in Figure 2). Because the entire user utterance occurs outside of the recognition timeframe, the speech recognition engine does not actually receive any input and the system will generally time out (triggering a noinput event) as it waits for the input that the user already gave.

Figure 2. Spoke-way-too-soon (SWTS) incident
This image shows a time line beginning with the playing of an audio prompt and continuing past the point when the caller or user stops speaking. This occurs when the user starts and finishes speaking before the turn-taking tone.
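The difference between the two incident types comes down to where the utterance falls relative to the start of the recognition timeframe. The following sketch classifies an utterance from timestamps, for example when reviewing call recordings or logs. It assumes that the start and end of the user's speech and the moment the recognition timeframe opens are all known in milliseconds; the names `Timing` and `classify_timing` are hypothetical.

```python
from enum import Enum


class Timing(Enum):
    OK = "spoke within the recognition timeframe"
    STS = "spoke-too-soon"
    SWTS = "spoke-way-too-soon"


def classify_timing(speech_start_ms: int, speech_end_ms: int,
                    recognition_start_ms: int) -> Timing:
    """Classify an utterance relative to the start of recognition."""
    # Entire utterance finished before recognition began: the engine
    # receives nothing and will typically report noinput.
    if speech_end_ms <= recognition_start_ms:
        return Timing.SWTS
    # Utterance began early but ran into the recognition timeframe: the
    # engine receives only the tail, which typically produces nomatch.
    if speech_start_ms < recognition_start_ms:
        return Timing.STS
    return Timing.OK
```

A live application usually cannot observe these timestamps directly, because the engine is not listening before the tone; that is why the recovery guidance later in this section treats every nomatch or noinput event as a possible timing incident.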
Minimizing STS and SWTS incidents:

Spoke-too-soon and spoke-way-too-soon incidents can be a common source of speech recognition errors, especially in barge-in disabled applications with excessively long prompts. Here are some things you can do to minimize recognition errors associated with these incidents:

  • Control the length of your system prompts to prevent them from being excessively long.
  • Aggressively trim silence from the end of prerecorded audio files to prevent time lapses between the end of the prompt and the presentation of the tone.
  • Place the “Please wait for the tone” prompt at the end of the introductory message, just before you request the initial input from the user.
Recovering from STS and SWTS incidents:

These incidents can become more frequent as users become expert with a half-duplex system and rush to provide input. A spoke-too-soon or spoke-way-too-soon incident almost never indicates a problem with the user understanding the system; therefore, the appropriate system response is to let the user get back to the application as quickly as possible, with a reminder to wait for the tone. For example:

System: Do you want to do another transaction?
User: (interrupting the tone) Yes.
System: Please say yes or no.
User: (speaking completely over the prompt, resulting in silence timeout) Yes.
System: Remember to speak after the tone. Please say “yes” or “no.”

Implications for self-revealing help when barge-in has been disabled:

If an application in which barge-in has been disabled returns a nomatch or noinput event, you cannot be sure whether it was truly an out-of-grammar utterance or timeout, or whether it was caused by a spoke-too-soon or spoke-way-too-soon incident. Consequently, you should include early in the self-revealing help sequence a reminder to wait for the tone, along with whatever other help you want to provide for these return codes; this “wait for the tone” reminder is not necessary when the user said a phrase from the Help grammar.
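The rule above can be expressed as a small prompt-selection function. This is a sketch under stated assumptions: the event names `nomatch`, `noinput`, and `help` are passed in as plain strings, and the function name `help_prompt` and the reminder wording are illustrative rather than taken from any platform.

```python
WAIT_FOR_TONE = "Remember to speak after the tone."


def help_prompt(event: str, barge_in_enabled: bool, help_text: str) -> str:
    """Build the self-revealing help prompt for a recognition event."""
    # With barge-in disabled, a nomatch or noinput event might really be a
    # spoke-too-soon or spoke-way-too-soon incident, so lead with the reminder.
    if not barge_in_enabled and event in ("nomatch", "noinput"):
        return f"{WAIT_FOR_TONE} {help_text}"
    # An explicit help request was recognized correctly, so timing was fine.
    return help_text
```

For example, `help_prompt("noinput", False, "Please say yes or no.")` produces the reminder followed by the reprompt, while the same call with the `help` event returns only the help text.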

Confirming user input

You should always ask for user confirmation if the system is about to perform an action it cannot undo. For example:
System: Preparing to transfer $500 from checking to savings. Please say Transfer or Cancel.

For other data entry situations, there are a number of confirmation strategies available. These strategies fall into the general categories of immediate and delayed (or batch) confirmation.
Immediate confirmation: separate dialog turns:
The simplest strategy is to provide a separate dialog turn for each item. The advantage of this approach is that there is no ambiguity and users enjoy a very high success rate. The disadvantage is that the additional dialog turns make the call longer and can lead to an interaction that feels sluggish. The more items there are, the more sluggish the interaction feels. For example:
System: Which fund?
User: Fidelity Magellan
System: Was that Fidelity Magellan?
User: yes
Immediate confirmation: feed forward:
Another approach to immediate confirmation is to feed the data forward into the next prompt. The advantage of this approach is that it uses implicit confirmation to save a dialog step. The disadvantages are that the user action to disconfirm is not evident and it is trickier to develop this type of prompt when you are using recordings rather than TTS. For example:
System: Which fund?
User: Fidelity Magellan
System: Buy how many shares of Fidelity Magellan?
User: 100 shares
Delayed (batch) confirmation: basic:
The basic delayed confirmation strategy is to play all collected data in a single step, correct any incorrect items, then repeat the confirmation. The advantage of this approach is that if there are no errors, then all confirmation is done in a single step. The disadvantage is that if there are two or more errors, the correction process is very lengthy. The following example illustrates the resulting interaction if there are no errors:
System: Which fund?
User: Fidelity Magellan
System: Buy how many shares?
User: 100
System: Was that 100 shares of Fidelity Magellan?
However, if there are two errors, the resulting dialog is:
System: Which fund?
User: Fidelity Magellan
System: Buy how many shares?
User: 100
System: Was that 200 shares of Fidelity No Load?
User: No
System: Which item do you want to change?
User: The number of shares.
System: To how many shares?
User: 100
System: Was that 100 shares of Fidelity No Load?
User: No
System: Which item do you want to change?
User: The fund
System: Change to which fund?
User: Fidelity Magellan
System: Was that 100 shares of Fidelity Magellan?
User: Yes
The more items there are in the confirmation message, the more pronounced this problem becomes.
Delayed (batch) confirmation: improved:
To achieve the advantages of delayed confirmation while eliminating the disadvantages of the basic strategy, there are several alternative delayed confirmation methods. The most important aspect of the improved methods is the elimination of the repeated playing of the full confirmation message. If there are no errors, the interaction is the same as that described for the basic delayed confirmation strategy. If there are two errors, the interaction could be:
System: Which fund?
User: Fidelity Magellan
System: Buy how many shares?
User: 100
System: Was that 200 shares of Fidelity No Load?
User: No
System: What do you want to change first?
User: The number of shares.
System: To how many shares?
User: 100
System: Was that 100 shares?
User: Yes
System: Change anything else?
User: Yes, the fund
System: Change to which fund?
User: Fidelity Magellan
System: Was that Fidelity Magellan?
User: Yes
If full confirmation (all data) is a customer requirement, do not engage in the immediate confirmation steps. Instead, the interaction would be:
System: Which fund?
User: Fidelity Magellan
System: Buy how many shares?
User: 100
System: Was that 200 shares of Fidelity No Load?
User: No
System: What do you want to change first?
User: The number of shares.
System: To how many shares?
User: 100
System: Change anything else?
User: Yes, the fund.
System: Change to which fund?
User: Fidelity Magellan
System: Was that 100 shares of Fidelity Magellan?
User: Yes
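The two improved variants differ only in where confirmation happens: after each corrected item, or once at the end with the full data. The sketch below generates the system prompts for a given series of corrections so the two variants can be compared side by side. The function name `improved_batch_prompts` and its parameters are hypothetical, and the sketch assumes the corrections are already known; a real dialog would loop on the user's answers instead.

```python
def improved_batch_prompts(corrections, full_confirmation, summary=None):
    """List the system prompts played while correcting a rejected batch.

    corrections: (question, corrected value as spoken) pairs, in the order
        the user chooses to fix them.
    full_confirmation: if True, skip per-item confirmation and confirm all
        data once at the end using `summary`.
    """
    if full_confirmation and summary is None:
        raise ValueError("summary is required for full confirmation")
    prompts = []
    for index, (question, spoken_value) in enumerate(corrections):
        opener = ("What do you want to change first?" if index == 0
                  else "Change anything else?")
        prompts.append(opener)
        prompts.append(question)
        if not full_confirmation:
            # Confirm only the item that was just corrected.
            prompts.append(f"Was that {spoken_value}?")
    if full_confirmation:
        # Replay the complete data exactly once, after all corrections.
        prompts.append(f"Was that {summary}?")
    return prompts
```

With the two corrections from the examples above, the per-item variant plays six system prompts and the full-confirmation variant plays five, and neither replays the full confirmation message between corrections.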
Refining confirmation and error correction with confidence levels and n-best lists:

It is possible to use information about confidence levels and n-best lists to refine confirmation and error correction strategies. Confidence levels are scores that a speech recognition engine assigns to its results; the more closely an utterance matches the recognized words, the higher the score. An n-best list contains the top n matches produced by a speech recognizer, ranked by confidence level. The match with the highest confidence level is the one that the engine returns to the application.

If you are using an immediate confirmation strategy with feed forward, do not put the feedback in the following prompt unless the recognition confidence for the item feeding forward is high. For example, if confidence is high:
System: Which stock?
User: Texaco (recognized with high confidence)
System: How many shares of Texaco?
User: 500
System: Sell 500 shares of Texaco?
User: Yes
But if the recognition confidence is low:
System: Which stock?
User: Texaco (recognized as PepsiCo but with low confidence)
System: How many shares?
User: 500
System: What was that stock?
User: Texaco (recognized this time with high confidence)
System: Sell 500 shares of Texaco?
User: Yes
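The two dialogs above differ only in whether the recognized stock name is fed forward. A minimal sketch of that decision follows. The threshold value of 0.8, the 0-to-1 confidence scale, and the function name `shares_prompt` are assumptions made for illustration; actual confidence scales and suitable thresholds depend on the recognition engine.

```python
HIGH_CONFIDENCE = 0.8  # assumed threshold; tune for the engine in use


def shares_prompt(stock: str, confidence: float,
                  threshold: float = HIGH_CONFIDENCE):
    """Return (prompt, must_reask_stock) for the next dialog turn."""
    if confidence >= threshold:
        # High confidence: feed the recognized value forward as an
        # implicit confirmation.
        return f"How many shares of {stock}?", False
    # Low confidence: do not echo a value that is probably wrong; flag the
    # item so the dialog asks for it again before the final confirmation.
    return "How many shares?", True
```

In the low-confidence dialog above, the returned flag is what would lead the system to ask "What was that stock?" before the final confirmation.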
Another refinement is to use n-best lists for disambiguation when the top candidates in the n-best list have close confidence scores. If you can apply back-end logic to disambiguate the candidates, then do so. If this isn't possible, provide a disambiguation dialog turn. For example:
System: Do you want to buy Texaco?
User: No.
System: Buy PepsiCo?
User: Yes
You can use the same approach for disambiguating homophones (such as Cisco and Sysco). Another refinement is to program the system so it will not make repeated recognition errors. One way to do this is to use an n-best query instead of asking for repetition. This works well when the top candidates in an n-best list have high recognition confidence, but will not work well if the utterance was out of grammar.
To avoid repeated recognition errors:
  • Reject a first choice that the user has already refused and present the second item in the n-best list instead.
  • Use n-best or conditional probabilities to support correction for data entered with voice spelling (see Tips for voice spelling).
  • If there's no match at the back end, create likely combinations and check to see if one or more of them matches.
  • If no back-end comparisons are possible, use lists to guide error recovery.
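Two of the n-best techniques described above, detecting a close call between the top candidates and skipping a candidate the user has already refused, can be sketched as follows. The sketch assumes the n-best list is a list of (text, confidence) pairs sorted from highest to lowest confidence on a 0-to-1 scale. The function names and the 0.1 margin are hypothetical.

```python
def needs_disambiguation(n_best, margin: float = 0.1) -> bool:
    """True when the top two candidates have close confidence scores."""
    if len(n_best) < 2:
        return False
    return n_best[0][1] - n_best[1][1] <= margin


def next_candidate(n_best, rejected):
    """Return the best candidate the user has not already refused."""
    for text, _confidence in n_best:
        if text not in rejected:
            return text
    # Every candidate was refused, or the list is empty; the utterance may
    # have been out of grammar, so fall back to asking again.
    return None
```

For example, if the list is `[("Texaco", 0.52), ("PepsiCo", 0.49)]` and the user answers "No" to "Do you want to buy Texaco?", `next_candidate` returns "PepsiCo", which matches the disambiguation dialog shown earlier.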