This section describes how to use multiple result grammars in Blueworx Voice Response VoiceXML applications.
Advanced natural language solutions can be developed using speech technologies (such as Nuance Recognizer 9) that return a list of likely recognition results when a single utterance contains multiple recognition contexts. For example, in response to a travel timetable request, a caller might say “Austin Minnesota on Friday at 9pm” as a single statement. VoiceXML can hold the pertinent information in the statement as separate items (“Austin”, “Minnesota”, “Friday”, “9pm”), and this information can then be passed to the application using VoiceXML session variables.

This allows the VoiceXML application developer to tailor any following questions to the caller's previous response. For example, if the confidence of the recognition result for the destination was low but the confidence for the other parameters was high, the system could prompt the caller for the destination only. It also allows multiple interpretations of an utterance containing more than one recognition context to be handled. Confidence values for the individual parts of a result are retained and can be made available to a VoiceXML application.
Blueworx Voice Response supports the use of the following VoiceXML context variables and shadow variables to handle slot confidence scores.
Variable | Description |
---|---|
answer$.interpretation | The variable holding the speech recognition result of the VoiceXML <field name="answer">. It contains slot values, confidence values, and an array of interpretations. |
answer$.confidence | The variable holding the confidence values of the speech recognition result of a VoiceXML <field name="answer">. |
answer$.utterance | The variable holding the utterance values of the speech recognition result of a VoiceXML <field name="answer">. |
answer$.inputmode | The variable holding the input mode value of the speech recognition result of a VoiceXML <field name="answer">. |
lastresult$.interpretation | The variable holding the last speech recognition result of the VoiceXML <field>. It can have multiple entries (if the maxnbest property is set) and contains both slot values and confidence values and an array of interpretations. |
lastresult$.confidence | The variable holding the confidence values of the last speech recognition result. |
lastresult$.utterance | The variable holding the utterance values of the last speech recognition result. |
lastresult$.inputmode | The variable holding the input mode values of the last speech recognition result. |
WebSphere Voice Server does not support the use of multiple result grammars with Blueworx Voice Response.
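As a rough illustration of how an application might walk these variables when the maxnbest property allows more than one result, the following ECMAScript sketch flattens an n-best list into one row per interpretation. The helper name collectInterpretations, the sample data, and the confidence figures are invented for illustration. The object shape (results[i].interpretation[j].slot.value and .confidence) is an assumption and should be checked against the values your recognizer actually returns.

```javascript
// Hypothetical helper, not part of Blueworx Voice Response.
// Assumed shape: results[i].interpretation[j][slotName] = { value, confidence }.
function collectInterpretations(results) {
  var rows = [];
  for (var i = 0; i < results.length; i++) {
    // A result without interpretations contributes no rows.
    var interpretations = results[i].interpretation || [];
    for (var j = 0; j < interpretations.length; j++) {
      var slots = {};
      for (var name in interpretations[j]) {
        if (!Object.prototype.hasOwnProperty.call(interpretations[j], name)) continue;
        var slot = interpretations[j][name];
        slots[name] = { value: slot.value, confidence: slot.confidence };
      }
      rows.push({
        result: i,
        interpretation: j,
        utterance: results[i].utterance,
        slots: slots
      });
    }
  }
  return rows;
}

// Invented sample data standing in for application.lastresult$.
var sampleResults = [
  {
    utterance: "Austin",
    confidence: 0.8,
    inputmode: "voice",
    interpretation: [
      { city: { value: "Austin", confidence: 0.8 }, state: { value: "TX", confidence: 0.6 } },
      { city: { value: "Austin", confidence: 0.8 }, state: { value: "MN", confidence: 0.4 } }
    ]
  }
];

var sampleRows = collectInterpretations(sampleResults);
```

Inside a VoiceXML document, a function like this could be placed in a script element and called with application.lastresult$, assuming the interpreter exposes that variable as an array-like object.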
The following VoiceXML document illustrates how a simple travel timetable or ticketing system could make use of multiple recognition contexts. In this case the caller is asked for a destination. Some of the destinations available also happen to sound similar, so the ability to process separate confidence scores for the name of the city and the name of the state makes it easier for the system to differentiate between the possible results.
<?xml version="1.0"?>
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.0//EN" "vxml20-1115.dtd">
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <meta http-equiv="Cache-Control" content="no-cache"/>
  <property name="confidencelevel" value="0.2"/>
  <property name="maxnbest" value="20"/>
  <script>
    function print_all(toPrint) {
      var output = "";
      for (var i in toPrint) {
        if (toPrint[i].toString() == "[object Object]") {
          output += i + ' = {' + print_all(toPrint[i]) + '}\n';
        } else {
          output += i + ' = ' + toPrint[i] + '\n';
        }
      }
      return output;
    }
  </script>
  <form id="start">
    <field name="answer">
      <prompt>Say the name of a city.</prompt>
      <grammar type="application/srgs+xml" version="1.0" root="_root">
        <rule id="_root" scope="public">
          <one-of>
            <item> Austin<tag>city='Austin';state='TX'</tag> </item>
            <item> Austin<tag>city='Austin';state='MN'</tag> </item>
            <item> Boston<tag>city='Boston';state='MA'</tag> </item>
            <item> Washington<tag>city='Washington';state='DC'</tag> </item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <log> Answer: <value expr="answer"/> , utterance : <value expr="answer.utterance"/> </log>
        <log> Answer expanded: <value expr="print_all(answer)"/> </log>
        <log> Answer$: <value expr="answer$"/> ,
              utterance : <value expr="answer$.utterance"/>,
              confidence = <value expr="answer$.confidence"/>,
              inputmode = <value expr="answer$.inputmode"/> ,
              interpretation = <value expr="answer$.interpretation"/> </log>
        <log> Answer$.interpretation expanded: <value expr="print_all(answer$.interpretation)"/> </log>
        <log> Last result: <value expr="application.lastresult$"/></log>
        <log> Length: <value expr="application.lastresult$.length"/> </log>
        <log> lastresult$.utterance = <value expr="application.lastresult$[0].utterance"/> or <value expr="application.lastresult$.utterance"/> </log>
        <log> lastresult$.confidence = <value expr="application.lastresult$[0].confidence"/> or <value expr="application.lastresult$.confidence"/> </log>
        <log> lastresult$.inputmode = <value expr="application.lastresult$[0].inputmode"/> or <value expr="application.lastresult$.inputmode"/> </log>
        <log> lastresult$.interpretation = <value expr="application.lastresult$.interpretation"/></log>
        <log> lastresult$.interpretation expanded = <value expr="print_all(application.lastresult$.interpretation)"/></log>
        <log> Interpretation Length: <value expr="application.lastresult$[0].interpretation.length"/> </log>
        <log> Interpretation 0 : <value expr="print_all(application.lastresult$[0].interpretation[0])"/> </log>
        <log> Interpretation 0: city.value <value expr="application.lastresult$[0].interpretation[0].city.value"/></log>
        <log> Interpretation 0: city.confidence <value expr="application.lastresult$[0].interpretation[0].city.confidence"/></log>
        <log> Interpretation 0: state.value <value expr="application.lastresult$[0].interpretation[0].state.value"/></log>
        <log> Interpretation 0: state.confidence <value expr="application.lastresult$[0].interpretation[0].state.confidence"/></log>
        <if cond="application.lastresult$[0].interpretation.length > 1">
          <log> Interpretation 1: city.value <value expr="application.lastresult$[0].interpretation[1].city.value"/></log>
          <log> Interpretation 1: city.confidence <value expr="application.lastresult$[0].interpretation[1].city.confidence"/></log>
          <log> Interpretation 1: state.value <value expr="application.lastresult$[0].interpretation[1].state.value"/></log>
          <log> Interpretation 1: state.confidence <value expr="application.lastresult$[0].interpretation[1].state.confidence"/></log>
        <else/>
          <log> There are no more interpretations to list </log>
        </if>
        <if cond="application.lastresult$.length > 1">
          <log> ANOTHER RESULT! </log>
          <log> lastresult$.utterance = <value expr="application.lastresult$[1].utterance"/> </log>
          <log> lastresult$.confidence = <value expr="application.lastresult$[1].confidence"/> </log>
          <log> lastresult$.inputmode = <value expr="application.lastresult$[1].inputmode"/> </log>
          <log> lastresult$.interpretation = <value expr="application.lastresult$[1].interpretation"/></log>
          <log> lastresult$.interpretation expanded = <value expr="print_all(application.lastresult$[1].interpretation)"/></log>
          <if cond="application.lastresult$.length > 2">
            <log> ANOTHER RESULT! </log>
            <log> lastresult$.utterance = <value expr="application.lastresult$[2].utterance"/> </log>
            <log> lastresult$.confidence = <value expr="application.lastresult$[2].confidence"/> </log>
            <log> lastresult$.inputmode = <value expr="application.lastresult$[2].inputmode"/> </log>
            <log> lastresult$.interpretation = <value expr="application.lastresult$[2].interpretation"/></log>
            <log> lastresult$.interpretation expanded = <value expr="print_all(application.lastresult$[2].interpretation)"/></log>
          <else/>
            <log> There are no more results to list </log>
          </if>
        <else/>
          <log>There are no more results to list </log>
        </if>
        <goto next="#end_form"/>
      </filled>
    </field>
  </form>
  <form id="end_form">
    <block>
      <prompt>Goodbye</prompt>
    </block>
  </form>
</vxml>
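The per-slot confidence values logged by the document above could also drive a follow-up question, as in the introductory example where only the low-confidence destination is asked for again. The following ECMAScript function is a minimal sketch of that decision. The name slotsToReprompt and the threshold value are hypothetical, and the confidence scale (0.0 to 1.0 here) is an assumption that should be matched to what the recognizer reports.

```javascript
// Hypothetical helper, not part of Blueworx Voice Response.
// Returns the names of slots whose confidence falls below the threshold.
// Assumed shape: interpretation[slotName] = { value, confidence }.
function slotsToReprompt(interpretation, threshold) {
  var low = [];
  for (var name in interpretation) {
    if (!Object.prototype.hasOwnProperty.call(interpretation, name)) continue;
    if (interpretation[name].confidence < threshold) {
      low.push(name);
    }
  }
  return low;
}

// Invented values: the city was heard clearly, the state was not.
var exampleInterpretation = {
  city: { value: "Austin", confidence: 0.9 },
  state: { value: "MN", confidence: 0.3 }
};
var toAskAgain = slotsToReprompt(exampleInterpretation, 0.5);
```

With these invented values, only the state slot falls below the threshold, so an application could ask a narrower question about the state rather than repeating the whole prompt.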
Blueworx Voice Response can also return the unprocessed (‘raw’) NLSML that it receives from the speech recognition server.
It is accessible using the following VoiceXML shadow variables:
Variable | Description |
---|---|
application.lastresult$.nlsml | The variable holding a string that includes the latest speech recognition result information. |
field_name$.nlsml | The variable holding a string that includes the previous speech recognition result information for the field_name input element. |
Both variables hold similar information, for example (for Nuance speech server):
<?xml version='1.0'?>
<result>
  <interpretation grammar="session:grammar6@grammar.store" confidence="98">
    <input mode="speech">America</input>
    <instance>
      <name confidence="98">America</name>
      <SWI_literal>America</SWI_literal>
      <SWI_grammarName>session:grammar6@grammar.store</SWI_grammarName>
      <SWI_meaning>{name:America}</SWI_meaning>
    </instance>
  </interpretation>
</result>

<?xml version='1.0'?>
<result>
  <interpretation grammar="session:grammar4@grammar.store" confidence="100">
    <input mode="dtmf">1 2 3 4</input>
    <instance>
      <SWI_literal>1 2 3 4</SWI_literal>
      <SWI_grammarName>session:grammar4@grammar.store</SWI_grammarName>
      <SWI_meaning>{SWI_literal:1 2 3 4}</SWI_meaning>
    </instance>
  </interpretation>
</result>

When a .nlsml variable is accessed that is not populated, as in the case of internal DTMF detection, it returns "undefined".
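
Because the .nlsml variables hold a plain string, an application that wants individual fields has to extract them itself. The following ECMAScript sketch pulls the overall confidence, the input mode, and the input text out of a string shaped like the Nuance samples above, and returns null when the variable is not populated. The function name summarizeNlsml is hypothetical, and the regular expressions are a shortcut that assumes this exact layout; they are not a general XML parser.

```javascript
// Hypothetical helper, not part of Blueworx Voice Response.
// Extracts a few fields from a raw NLSML string of the shape shown above.
function summarizeNlsml(nlsml) {
  // Covers both an undefined value and the literal string "undefined".
  if (typeof nlsml !== "string" || nlsml === "undefined") {
    return null;
  }
  // Confidence attribute on the first <interpretation> element.
  var interp = /<interpretation\b[^>]*\bconfidence="([^"]*)"/.exec(nlsml);
  // Mode attribute and text content of the first <input> element.
  var input = /<input\b[^>]*\bmode="([^"]*)"[^>]*>([^<]*)<\/input>/.exec(nlsml);
  if (!interp || !input) {
    return null;
  }
  return {
    confidence: Number(interp[1]),
    mode: input[1],
    text: input[2]
  };
}

// Shortened copy of the DTMF sample above.
var dtmfSample =
  "<?xml version='1.0'?><result>" +
  "<interpretation grammar=\"session:grammar4@grammar.store\" confidence=\"100\">" +
  "<input mode=\"dtmf\">1 2 3 4</input>" +
  "<instance><SWI_literal>1 2 3 4</SWI_literal></instance>" +
  "</interpretation></result>";
var dtmfSummary = summarizeNlsml(dtmfSample);
```

Note that the confidence values in these Nuance samples appear on a 0 to 100 scale, so any threshold applied to the extracted number should use the same scale.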