Using simple or natural command grammars

The VoiceXML browser uses grammar-based speech recognition, as explained in Grammars.

Grammars range from very simple to extremely complex, as summarized in Table 1. The appropriate type depends on your application and on the characteristics of your users.

Table 1. Simple versus natural command grammars
Type of grammar: Simple grammar

Description: A grammar that includes basic words and phrases that a user might reasonably be expected to say in response to a directed prompt. Many applications will not require complex grammars to be effective, as long as the system prompts constrain likely user input to the valid responses specified by the grammar.

Advantages: Easier to code and maintain. When used with properly worded prompts, can be as effective as much more complex grammars.

Disadvantages: When taken to the extreme, may impose excessive restrictions on what users can say. For example, although you can code an application that requires users to respond to every prompt with a “YES” or a “NO,” this results in a cumbersome interface for all but the simplest of applications.

Type of grammar: Natural command grammar

Description: A very complex statistical grammar that approaches natural language understanding (NLU) in its lexical and syntactic flexibility.

Advantages: Can enhance the ability of the system to recognize what users are saying. Can increase dialog efficiency. See Using menu flattening (multiple tokens in a single user utterance).

Disadvantages: Time-consuming to build and more difficult to maintain. Uses more system resources, possibly impacting performance.

Designing simple grammars

Simple grammars do not attempt to cover every possible way that a user could respond. Instead, you word your prompts to guide users toward one of a reasonably sized list of responses, and you code your grammars to accept those responses.

The number and types of responses you will want to support (and therefore, the size and complexity of your grammar) depend on your application and your users.
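The pairing of a directed prompt with a small grammar can be illustrated with a toy model. The following Python sketch is not VoiceXML and does not reflect how the VoiceXML browser performs recognition; it only shows the idea of a fixed list of accepted phrases, each mapped to a value the application acts on. The prompt text, the phrase list, and the name match_simple are all hypothetical.

```python
# Hypothetical directed prompt: it names the valid responses outright,
# so a short phrase list is likely to cover what users actually say.
PROMPT = "Would you like to check your balance, transfer funds, or speak to an agent?"

# Toy "simple grammar": every accepted phrase maps to an application value.
# (Illustrative only; a real grammar would be written in a grammar format.)
SIMPLE_GRAMMAR = {
    "check my balance": "balance",
    "balance": "balance",
    "transfer funds": "transfer",
    "transfer": "transfer",
    "speak to an agent": "agent",
    "agent": "agent",
}


def match_simple(utterance: str):
    """Return the grammar value for an utterance, or None if it is out of grammar."""
    # Lowercase and collapse runs of whitespace before the lookup.
    normalized = " ".join(utterance.lower().split())
    return SIMPLE_GRAMMAR.get(normalized)
```

In this sketch, anything outside the phrase list returns None, which corresponds to a no-match condition. The narrower the prompt, the shorter the list can be without producing many such failures.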

Evaluating the need for natural command grammars or natural language understanding (NLU)

New speech user interface (SUI) designers often start out by assuming that a system will not be usable unless it has true statistical NLU or NLU-like capability. This assumption is incorrect. Well-designed prompts can focus user input so that a fairly small grammar has an excellent chance of matching it.

NLU call routing

An emerging area in which NLU applications have seen considerable success is NLU call routing. Callers respond to prompts such as "How may I help you?" and the application uses a number of statistical models to interpret the caller's response and to route the call appropriately. This approach is especially effective if there are many potential destinations and no way to efficiently or clearly group them into categories.
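As a rough illustration of the routing idea, the following Python sketch scores an open-ended response against several destinations and picks the best one. It is a deliberately simplified stand-in: the keyword weights are assigned by hand rather than learned from data, so it is not a statistical model, and the destinations, weights, threshold, and the name route_call are all hypothetical.

```python
import re

# Hypothetical destinations with hand-picked keyword weights.
# A real NLU router would use trained statistical models instead.
ROUTE_WEIGHTS = {
    "billing": {"bill": 2.0, "charge": 2.0, "payment": 1.5, "refund": 1.5},
    "technical_support": {"internet": 2.0, "broken": 1.5, "working": 1.0, "error": 1.5},
    "sales": {"upgrade": 2.0, "buy": 2.0, "plan": 1.0, "new": 0.5},
}


def route_call(utterance: str, threshold: float = 1.0) -> str:
    """Return the best-scoring destination, or "operator" if nothing scores well."""
    words = re.findall(r"[a-z']+", utterance.lower())
    # Sum the weights of every word the destination knows about.
    scores = {
        destination: sum(weights.get(word, 0.0) for word in words)
        for destination, weights in ROUTE_WEIGHTS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a human operator when the evidence is too weak.
    return best if scores[best] >= threshold else "operator"
```

For example, route_call("I have a question about a charge on my bill") selects the billing destination, while an utterance with no recognized keywords falls back to the operator.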

Using menu flattening (multiple tokens in a single user utterance)

NLU applications also have the potential for substantial menu flattening because the system can parse the user input and extract multiple tokens (where a token is the smallest unit of meaningful linguistic input), rather than requiring the user to provide these tokens one at a time. For example, if a user says to a travel reservations application, “Tell me all the flights from Miami to Atlanta for tomorrow before noon,” there are at least six tokens: all (rather than one or two), flights (rather than bus trips or trains), Miami (departure point), Atlanta (destination), tomorrow (date), before noon (time).
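The travel example can be sketched in Python to show what extracting several tokens from one utterance looks like. This is only a pattern-matching illustration under stated assumptions, not NLU and not the VoiceXML browser's mechanism: the city list, the accepted wordings, the slot names, and the function extract_tokens are all hypothetical.

```python
import re

# Hypothetical vocabulary; a real application would have far larger lists.
CITIES = ("miami", "atlanta", "boston")
CITY_ALT = "|".join(CITIES)

# One pattern with a named group per token, so a single utterance can
# fill every slot at once instead of one slot per prompt.
PATTERN = re.compile(
    rf"(?:tell me|show me|list)\s+(?P<quantity>all|one|two)\s+(?:the\s+)?"
    rf"(?P<mode>flights|trains)\s+"
    rf"from\s+(?P<origin>{CITY_ALT})\s+to\s+(?P<destination>{CITY_ALT})\s+"
    rf"for\s+(?P<date>today|tomorrow)"
    rf"(?:\s+(?P<time>(?:before|after)\s+noon))?"
)


def extract_tokens(utterance: str):
    """Return a dict of the tokens found, or None if the utterance does not match."""
    # Lowercase, drop punctuation, and collapse whitespace.
    normalized = re.sub(r"[^\w\s]", "", utterance.lower())
    normalized = " ".join(normalized.split())
    match = PATTERN.fullmatch(normalized)
    if match is None:
        return None
    # Omit optional slots (such as time) that the user did not supply.
    return {name: value for name, value in match.groupdict().items() if value is not None}
```

Applied to the sentence above, this sketch fills six slots (quantity, mode, origin, destination, date, and time) in one pass. A menu-driven dialog would need a separate prompt for each.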

To take full advantage of the increased efficiency that menu flattening provides, the system prompts must encourage users to provide input with multiple tokens. One way to encourage such input is to provide a nondirective prompt with self-revealing help messages that contain examples of valid multiple-token commands. Appropriate prompts for a natural command or NLU application are therefore quite different from appropriate prompts for simple grammar systems. For details, see Managing nondirective prompts.