Recording user input during speech recognition

You can use the <property> element to capture spoken user input as an audio file in addition to the recognized text string. Use the <property> extensions shown in Table 1 to set audio recording at the application, document, dialog, form, or menu level. VoiceXML 2.1 also defines the properties recordutterance and recordutterancetype, which perform some of the same functions as the settings listed in the table. For more information on these properties, see the VoiceXML 2.1 specification.

Table 1. Properties for capturing audio during speech recognition
Name	Description	Implementation
com.ibm.speech.asr.saveaudio	Specifies whether or not to save spoken input as an audio file.	Used with value="true" (to capture the audio) or value="false".
com.ibm.speech.asr.saveaudiotype	Specifies the format in which to save the audio file.	Used with value="type", where type is a media type as documented in appendix E of the VoiceXML 2.1 specification.
com.ibm.speech.asr.endpointed	Specifies whether or not to remove leading and trailing silence from the audio file.	Used with value="true" (to remove silence) or value="false".

The audio file is captured in the shadow variable x$.asraudio, where x is the name of the form item, and is stored in the application variable application.lastresult$[i].asraudio, where i is the index of the recognition result.
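The following VoiceXML fragment is a minimal sketch of how the properties in Table 1 might be set at the document level. The form and field names, the prompt wording, the use of the built-in boolean field type, and the audio/x-wav media type are illustrative assumptions. Playing the captured audio back with <audio expr="..."/> is also an assumption about how the asraudio value can be used; check your platform documentation before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">

  <!-- Document-level settings: they apply to every dialog in this document. -->
  <property name="com.ibm.speech.asr.saveaudio" value="true"/>
  <!-- Assumed media type; confirm which types your platform supports. -->
  <property name="com.ibm.speech.asr.saveaudiotype" value="audio/x-wav"/>
  <!-- Remove leading and trailing silence from the saved audio. -->
  <property name="com.ibm.speech.asr.endpointed" value="true"/>

  <!-- Hypothetical form and field names, for illustration only. -->
  <form id="confirm_order">
    <field name="answer" type="boolean">
      <prompt>Do you want to place this order?</prompt>
      <filled>
        <!-- answer$.asraudio is the shadow variable for this field. -->
        <var name="utterance" expr="answer$.asraudio"/>
        <!-- Assumption: the captured audio can be played like a recording. -->
        <prompt>You said <audio expr="utterance"/></prompt>
      </filled>
    </field>
  </form>

</vxml>
```

Setting the same <property> elements inside a single <form> or <field> instead would limit the recording to that scope.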

End-pointed audio recording with MRCP V1.0 speech engines

With some MRCP V1.0 speech engines, such as Nuance Speech Server version 5.1.2 (Recognizer 9.0.13), you can also request end-pointed audio recording of utterances, in which the speech engine trims any leading or trailing audio, such as silence or background noise, around the spoken utterance. To do this, set the appropriate vendor-specific property or properties in the VoiceXML document. For example, for Nuance speech servers, use <property name="swirec.exposeplayable" value="true"/>. Blueworx Voice Response for AIX passes such vendor-specific properties through from the VoiceXML document and sends them to the speech server in an MRCP SET-PARAMS message.

With MRCP V1.0 speech engines, the audio file is stored in the application variable application.lastresult$[i].recording. However, when the swirec.exposeplayable property is set to true, only the end-pointed audio is saved; the original (non-end-pointed) audio is no longer available.
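As a rough sketch, a document that requests end-pointed recording from a Nuance speech server and then reads the result might look like the following. The form and field names and the prompt wording are hypothetical. Enabling utterance recording with the VoiceXML 2.1 recordutterance property alongside swirec.exposeplayable is an assumption, as is playing the recording back with <audio expr="..."/>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">

  <!-- Assumption: the VoiceXML 2.1 property is what turns utterance recording on. -->
  <property name="recordutterance" value="true"/>
  <!-- Vendor-specific property; passed to the speech server in MRCP SET-PARAMS. -->
  <property name="swirec.exposeplayable" value="true"/>

  <!-- Hypothetical form and field names, for illustration only. -->
  <form id="confirm_transfer">
    <field name="answer" type="boolean">
      <prompt>Do you want to transfer the funds?</prompt>
      <filled>
        <!-- With swirec.exposeplayable set to true, this is the end-pointed audio. -->
        <var name="utterance" expr="application.lastresult$[0].recording"/>
        <prompt>You said <audio expr="utterance"/></prompt>
      </filled>
    </field>
  </form>

</vxml>
```

Because the original audio is not kept when swirec.exposeplayable is true, set the property only in documents or dialogs where the trimmed recording is sufficient.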