Deciding whether to use audio formatting

Audio formatting is a useful technique that increases the bandwidth of spoken communication by using speech and non-speech cues to overlay structural and contextual information on spoken output. Audio formatting is analogous to the well-understood notion of visual formatting, where designers use font changes, indenting, and other aspects of visual layout to cue the reader to the underlying structure of the information being presented.

Designing audio tones

A disadvantage of audio formatting is that there are not yet standard sounds for specific purposes. If you plan to use audio formatting, you may want to work with an audio designer (analogous to a graphic designer for graphical user interfaces) to establish a set of pleasing and easily discriminated sounds for these purposes.

Keep audio formatting tones short: typically no longer than 0.5–1.0 seconds, and possibly as short as 75 ms. Shorter tones are generally less obtrusive, so users are more likely to perceive them as useful rather than distracting.
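The duration guideline above can be checked mechanically when reviewing candidate tones. The following is a minimal sketch, assuming the tones are stored as uncompressed WAV files; the names tone_duration_seconds, is_short_enough, and MAX_TONE_SECONDS are hypothetical and are not part of any voice platform.

```python
import wave

# Upper limit taken from the guideline above (1.0 second); adjust as needed.
MAX_TONE_SECONDS = 1.0


def tone_duration_seconds(wav_file):
    """Return the playing time of a WAV file (a path or a binary file object)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_enough(wav_file, max_seconds=MAX_TONE_SECONDS):
    """Return True if the tone is no longer than max_seconds."""
    return tone_duration_seconds(wav_file) <= max_seconds
```

A check like this only measures length. Whether a tone is pleasing and easy to tell apart from the others still needs listening tests.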

Applying audio formatting

You can use non-speech cues to indicate dialog state, exceptions to normal system behavior, and content formatting, as described in Table 1.

Table 1. Audio formatting
Purpose | Recommendations
Turn-taking tone (barge-in disabled only) | If you disable barge-in, you might want to use audio formatting to indicate when it is the user's turn to speak. An effective turn-taking tone will generally have the following characteristics:
  • duration of 75–150 ms
  • pitch of 750–1250 Hz
  • not too loud
  • gentle on the ear (a complex wave rather than a pure sinusoid)
Barge-in temporarily disabled | When barge-in is temporarily disabled (for example, when legal notices are read), you may want to play a unique background sound or use a special tone or prompt as an indicator.
  • For recorded audio prompts, you will need to prerecord the speech mixed with the background sound.
  • For synthesized (TTS) prompts, you can use an introductory tone or prompt when you disable barge-in, and another tone or prompt to let the user know when you have re-enabled barge-in.
Audio cue for bulleted list | Consider using a short sound snippet as an auditory icon.
Audio cue for emphasis (akin to visual bold and italics) | Consider using an auditory inflection technique, such as changing volume or pitch.
Audio cue for secure transactions | For secure transactions, you may want to play a unique background sound or use a special tone or prompt. See the recommendations for Barge-in temporarily disabled above.
Audio cue for “system busy” (akin to visual hourglass) | You can use the fetchaudio attribute to play an audio file when the system is busy fetching documents. The audio file stops playing as soon as the document is retrieved. If you use a ticking tone for “system busy,” use a fairly slow ticking rate (about 1–2 seconds between ticks). Avoid rates faster than one tick per second. Alternatively, consider playing music when the system is busy.
Note: Research has shown that users who are asked to wait will follow the instruction for at least 7 seconds of silence.
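The turn-taking and “system busy” recommendations in Table 1 can be prototyped before an audio designer is involved. The following is a minimal sketch under stated assumptions: an 8000 Hz sample rate, a 1000 Hz fundamental (an audible pitch chosen for illustration), arbitrary harmonic weights to make the wave complex rather than sinusoidal, and a tick spacing of 1.5 seconds measured from the start of one tick to the start of the next. The function names and file names are hypothetical.

```python
import math
import struct
import wave

SAMPLE_RATE = 8000  # Hz; an assumed telephony-style rate, not given in the text


def complex_tone(freq_hz=1000.0, duration_s=0.1, peak=0.3, fade_s=0.01):
    """Return 16-bit samples for a short tone: a fundamental plus two harmonics."""
    n = round(SAMPLE_RATE * duration_s)
    fade_n = max(1, round(SAMPLE_RATE * fade_s))
    # Illustrative harmonic weights; they sum to 1, so `peak` bounds the level.
    partials = [(1, 0.6), (2, 0.3), (3, 0.1)]
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        value = sum(w * math.sin(2 * math.pi * freq_hz * k * t) for k, w in partials)
        # Linear fade-in and fade-out to avoid clicks at the edges.
        envelope = min(1.0, i / fade_n, (n - 1 - i) / fade_n)
        samples.append(int(32767 * peak * envelope * value))
    return samples


def ticking_track(total_s=6.0, gap_s=1.5, tick_s=0.075):
    """Return samples with one short tick every gap_s seconds and silence between."""
    tick = complex_tone(duration_s=tick_s)
    period_n = round(SAMPLE_RATE * gap_s)
    samples = [0] * round(SAMPLE_RATE * total_s)
    for start in range(0, len(samples) - len(tick) + 1, period_n):
        samples[start:start + len(tick)] = tick
    return samples


def write_wav(path, samples):
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


if __name__ == "__main__":
    write_wav("turn_taking_tone.wav", complex_tone())  # hypothetical file names
    write_wav("system_busy_ticks.wav", ticking_track())
```

A file such as the ticking track is the kind of audio that the fetchaudio attribute could point to. The low peak level and the fade envelope are one way to keep the tone "not too loud" and "gentle on the ear"; the right values depend on the playback channel and should be confirmed by listening.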