The VoiceXML browser uses grammar-based speech recognition, as explained in Grammars.
Grammars range from very simple to extremely complex, as summarized in Table 1. The appropriate type depends on your application and on the characteristics of your users.
Type of grammar | Description | Advantages | Disadvantages |
---|---|---|---|
Simple Grammar | A grammar that includes basic words and phrases that a user might reasonably be expected to say in response to a directed prompt. Many applications will not require complex grammars to be effective, as long as the system prompts constrain likely user input to the valid responses specified by the grammar. | Easier to code and maintain. When used with properly worded prompts, can be as effective as much more complex grammars. | When taken to the extreme, may impose excessive restrictions on what users can say. For example, although you can code an application that requires users to respond to every prompt with a “YES” or a “NO,” this results in a cumbersome interface for all but the simplest of applications. |
Natural Command Grammar | A very complex statistical grammar that approaches natural language understanding (NLU) in its lexical and syntactic flexibility. | Can enhance the ability of the system to recognize what users are saying. Can increase dialog efficiency. See Using menu flattening (multiple tokens in a single user utterance). | Time-consuming to build and more difficult to maintain. Uses more system resources, possibly impacting performance. |
Simple grammars do not attempt to cover every possible way that a user could respond. Instead, you word your prompts to guide users toward one of a reasonably sized list of responses, and you code your grammars to accept those responses.
The number and types of responses you will want to support (and therefore, the size and complexity of your grammar) depend on your application and your users.
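As an illustration of pairing a directed prompt with a simple grammar, the following is a minimal sketch, assuming a VoiceXML 2.0 browser that accepts inline SRGS XML grammars. The form name `order_drink`, the field name `drink`, the rule name, and the three menu words are hypothetical and chosen only for this example; check your browser's documentation for the grammar formats it actually supports.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Hypothetical form: all names and menu items are illustrative assumptions. -->
  <form id="order_drink">
    <field name="drink">
      <!-- Directed prompt: naming the choices steers callers toward in-grammar responses. -->
      <prompt>Would you like coffee, tea, or juice?</prompt>

      <!-- Simple inline grammar: accepts only the three words named in the prompt. -->
      <grammar type="application/srgs+xml" version="1.0" root="drink" mode="voice">
        <rule id="drink" scope="public">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>juice</item>
          </one-of>
        </rule>
      </grammar>

      <!-- When the caller says something outside the grammar, restate the valid choices. -->
      <nomatch>
        <prompt>Sorry, I didn't understand. Please say coffee, tea, or juice.</prompt>
      </nomatch>

      <!-- Runs once the field has been filled by a successful recognition. -->
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The grammar here is deliberately no larger than the prompt: the prompt lists exactly the responses the grammar accepts, and the `nomatch` handler repeats them. Supporting a few likely variations (for example, "a coffee, please") would mean adding items or optional words to the rule, which is the size-versus-coverage trade-off described above.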
New speech user interface (SUI) designers often start out by assuming that a system will not be usable unless it has true statistical NLU or NLU-like capability. This assumption is simply not correct. Well-designed prompts can focus user input so that a fairly small grammar has an excellent chance of matching what the user says.
An emerging area in which NLU applications have seen considerable success is NLU call routing. Callers respond to prompts such as "How may I help you?" and the application uses a number of statistical models to interpret the caller's response and to route the call appropriately. This approach is especially effective if there are many potential destinations and no way to efficiently or clearly group them into categories.