Test phase

After you have incorporated the results of the “Wizard of Oz” testing, you will want to code and test a working prototype of the application. During this phase, be sure to analyze the behavior of both new and, if applicable, expert users.

Identifying recognition problems

As you proceed with the Test phase, note any recognition problems that occur repeatedly.
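One way to find repeated problems is to tally, from test-session logs, how often each expected phrase was recognized as something else. The sketch below assumes a hypothetical log format in which each entry is an `(expected, recognized)` pair of strings. The function name `recurring_confusions` and the default threshold of two occurrences are illustrative choices, not part of any toolkit.

```python
from collections import Counter

def recurring_confusions(log, min_count=2):
    """Return ((expected, recognized), count) entries for misrecognitions
    that occurred at least min_count times, most frequent first.

    `log` is assumed to be an iterable of (expected, recognized) string
    pairs taken from test-session transcripts (hypothetical format).
    """
    errors = Counter(
        (expected, recognized)
        for expected, recognized in log
        if expected != recognized  # correct recognitions are ignored
    )
    return [(pair, n) for pair, n in errors.most_common() if n >= min_count]


# Illustrative input only, not real test data.
log = [
    ("Madison", "Addison"),
    ("Madison", "Madison"),
    ("Madison", "Addison"),
    ("no", "new"),
    ("Boston", "Boston"),
]
print(recurring_confusions(log))  # [(('Madison', 'Addison'), 2)]
```

A single misrecognition may be noise. A pair that keeps appearing across callers is a candidate for the fixes described in this section.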

The most common cause of recognition problems is acoustic confusability among the currently active phrases. For example, Madison and Addison are both US airports, and their names differ only by the initial “m” sound. These potential user inputs to a travel application are therefore highly confusable:

`User:`	`Flying from Madison`
`User:`	`Flying from Addison`
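Confusable pairs can also be flagged before testing by comparing the pronunciations of the active phrases. The sketch below uses Levenshtein (edit) distance over phoneme sequences as a crude stand-in for acoustic similarity. This is an assumption made for illustration and is not how a recognizer itself measures confusability. The ARPAbet-style transcriptions, the `LEXICON` table, and the 0.25 threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical phoneme transcriptions (ARPAbet-style, stress marks omitted).
# A real application would take these from its pronunciation dictionary.
LEXICON = {
    "Madison": ["M", "AE", "D", "IH", "S", "AH", "N"],
    "Addison": ["AE", "D", "IH", "S", "AH", "N"],
    "Denver": ["D", "EH", "N", "V", "ER"],
}

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a phoneme from a
                curr[j - 1] + 1,           # insert a phoneme from b
                prev[j - 1] + (pa != pb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def confusable_pairs(lexicon, threshold=0.25):
    """Flag phrase pairs whose pronunciations differ by at most `threshold`
    edits per phoneme of the longer pronunciation."""
    flagged = []
    for a, b in combinations(sorted(lexicon), 2):
        pa, pb = lexicon[a], lexicon[b]
        score = edit_distance(pa, pb) / max(len(pa), len(pb))
        if score <= threshold:
            flagged.append((a, b, round(score, 3)))
    return flagged

print(confusable_pairs(LEXICON))  # [('Addison', 'Madison', 0.143)]
```

Normalizing by length keeps short words from being flagged too readily. Any threshold should be tuned against the misrecognitions actually observed in testing.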

Sometimes there is nothing you can do when this happens. Other times you can try to correct the problem by:

Using a synonym for one of the terms. For example, if the system is confusing “no” and “new,” you might be able to replace “new” with “recent,” depending on the application's context.
Adding a word to one or more of the choices. For the Madison/Addison confusion, you could make the state optional in the grammar for most cities but require it for low-traffic airports that are acoustically confusable with higher-traffic airports.
Planning for disambiguation by writing code that includes or accesses data about typical acoustic confusions. For example:

`System:`	`Flying from?`
`User:`	`Los Angeles <not flagged as confusable>`
`System:`	`Flying to?`
`User:`	`Newark <flagged as confusable with New York>`
`System:`	`Newark, New Jersey or New York, New York?`
`User:`	`Newark, New Jersey`
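The disambiguation step in the dialog above could be driven by a small table of known confusions. In the sketch below, `CONFUSIONS`, `confirm_prompt`, and `resolve` are hypothetical names. The plain string match in `resolve` stands in for whatever grammar would actually parse the caller's reply.

```python
# Hypothetical confusion data: recognized phrase -> candidates the caller may
# have meant. It could be built from test logs or a pronunciation check.
CONFUSIONS = {
    "Newark": ["Newark, New Jersey", "New York, New York"],
}

def confirm_prompt(recognized):
    """Return a disambiguation prompt if `recognized` is flagged as
    confusable; return None to accept the result and move on."""
    candidates = CONFUSIONS.get(recognized)
    if not candidates:
        return None
    return " or ".join(candidates) + "?"

def resolve(recognized, reply):
    """Map the caller's answer onto one of the candidates, or return None
    (in which case the application should re-prompt)."""
    for candidate in CONFUSIONS.get(recognized, []):
        if reply.strip().lower() == candidate.lower():
            return candidate
    return None


print(confirm_prompt("Los Angeles"))  # None
print(confirm_prompt("Newark"))       # Newark, New Jersey or New York, New York?
print(resolve("Newark", "Newark, New Jersey"))  # Newark, New Jersey
```

Only flagged phrases trigger the extra question, so callers who say an unambiguous city such as Los Angeles are not slowed down.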

Identifying user interface breakdowns

The Test phase is also where you will identify potential user interface breakdowns. Some factors you may want to analyze include:

Percentage of users who did not successfully complete your test scenarios
Percentage of users who were transferred to a human operator when this was not the desired outcome
Points in the application where users experienced the most difficulty
Unexpected user behaviors
Effectiveness of error recovery mechanisms
Time to complete typical transactions
Self-reported level of user satisfaction
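Several of these factors can be computed directly from per-session test records. The sketch below assumes a hypothetical `Session` record with the fields shown. It covers only the quantitative measures: completion, unwanted transfers, transaction time, and satisfaction. Points of difficulty and unexpected behaviors still require reviewing the session transcripts.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Session:
    """One test caller's session (hypothetical record format)."""
    completed: bool        # finished the test scenario
    transferred: bool      # ended up with a human operator
    transfer_wanted: bool  # was a transfer the desired outcome?
    seconds: float         # time spent on the transaction
    satisfaction: int      # self-reported rating, e.g. 1 (poor) to 5 (good)

def summarize(sessions):
    """Compute summary breakdown measures over a non-empty list of sessions."""
    n = len(sessions)
    done = [s for s in sessions if s.completed]
    return {
        "pct_not_completed": 100 * (n - len(done)) / n,
        "pct_unwanted_transfer": 100 * sum(
            s.transferred and not s.transfer_wanted for s in sessions) / n,
        # Time is averaged over completed sessions only.
        "mean_seconds_completed": mean(s.seconds for s in done) if done else None,
        "mean_satisfaction": mean(s.satisfaction for s in sessions),
    }


# Illustrative records only, not real test results.
sessions = [
    Session(True, False, False, 95.0, 4),
    Session(False, True, False, 140.0, 2),
    Session(True, False, False, 105.0, 5),
    Session(True, True, True, 60.0, 4),
]
print(summarize(sessions))
```

Tracking the same measures across rounds of testing shows whether a change to the interface actually helped.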

The first round of user testing typically reveals places where the system's responses need to be rephrased to improve usability. For this reason, keep system prompts and other messages easy to change for as long as possible, and at least until the first round of user testing is complete.
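One way to keep prompts flexible is to store their wording as data, separate from the dialog logic, so that it can be rephrased after each test round without changing code. In the sketch below, the prompt IDs, the JSON layout, and the `prompt` helper are hypothetical. The `{first}` and `{second}` placeholders use Python's `str.format` syntax.

```python
import json

# Prompt wording kept as data. In practice this might live in a separate file
# that designers edit between test rounds (hypothetical layout).
PROMPTS_JSON = """
{
  "ask_origin": "Flying from?",
  "ask_destination": "Flying to?",
  "disambiguate": "{first} or {second}?"
}
"""

prompts = json.loads(PROMPTS_JSON)

def prompt(prompts, prompt_id, **values):
    """Look up a prompt by ID and fill in any {placeholders}."""
    return prompts[prompt_id].format(**values)


print(prompt(prompts, "ask_origin"))  # Flying from?
print(prompt(prompts, "disambiguate",
             first="Newark, New Jersey", second="New York, New York"))
# Newark, New Jersey or New York, New York?
```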