VoiceXML elements and attributes

Table 1. Summary of VoiceXML elements and attributes
Element Description Implementation details
<assign> Assigns a value to a variable. Supported as documented in VoiceXML 2.0.
<audio> Plays an audio file within a prompt or synthesizes speech from a text string if the audio file cannot be found.

Supported as documented in VoiceXML 2.0.

You can specify an audio file using a URI or you can use the expr attribute to play audio recorded using the <record> element.

The supported audio formats are:
  • an 8 kHz 8-bit mu-law .au file
  • an 8 kHz 8-bit mu-law .wav file
  • an 8 kHz 8-bit a-law .au file
  • an 8 kHz 8-bit a-law .wav file
  • an 8 kHz 16-bit linear .au file
  • an 8 kHz 16-bit linear .wav file
Use an 8-bit format whenever possible; 16-bit linear files take twice as much storage and require twice as long to download.
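
As an illustration of the two ways of supplying audio described above, the following sketch plays a file by URI, with fallback text for synthesis, and then plays back a recording captured by a <record> item. The URI, form id and variable name are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="audio_demo">
    <!-- Capture a short recording; the variable name "greeting" is an assumption. -->
    <record name="greeting" beep="true" maxtime="10s" type="audio/basic">
      <prompt>Please say a short greeting after the beep.</prompt>
    </record>
    <block>
      <prompt>
        <!-- Plays the file; the enclosed text is synthesized if the file cannot be found. -->
        <audio src="http://example.com/audio/welcome.wav">Welcome.</audio>
        <!-- Plays back the audio held in the "greeting" variable. -->
        <audio expr="greeting"/>
      </prompt>
    </block>
  </form>
</vxml>
```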
<block> Specifies a form item containing non-interactive executable content. Supported as documented in VoiceXML 2.0.
<break> Inserts a pause in TTS output.

VoiceXML 2.0 inherits this element from SSML.

VoiceXML 2.1 added the strength attribute. The time attribute now specifies the duration of a pause to be inserted in the output in seconds or milliseconds, for example, time="250ms" or time="3s".
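
A minimal sketch of both attributes follows. It is a fragment intended to sit inside a form item such as <block>; the prompt wording is illustrative only, and the strength value shown is one of the values defined by SSML.

```xml
<prompt>
  Your balance is fifty dollars.
  <!-- Pause for a quarter of a second. -->
  <break time="250ms"/>
  Your next payment is due on Friday.
  <!-- Pause for three seconds. -->
  <break time="3s"/>
  Thank you for calling.
  <!-- Let the TTS engine choose the pause length for a medium break. -->
  <break strength="medium"/>
  Goodbye.
</prompt>
```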

<catch> Catches an event. Supported as documented in VoiceXML 2.0.
<choice> Specifies a menu item. Supported as documented in VoiceXML 2.0.
<clear> Clears one or more form item variables. Supported as documented in VoiceXML 2.0.
<data> Allows a VoiceXML application to fetch arbitrary XML data from a document server without transitioning to a new VoiceXML document. Supported as documented in VoiceXML 2.1.
<disconnect> Causes the VoiceXML browser to disconnect from a user telephone session. VoiceXML 2.1 added the namelist attribute to this element. Otherwise, supported as documented in VoiceXML 2.0.
<dtmf> Specifies a DTMF grammar.

This element is not part of VoiceXML 2.0 and is not supported by Blueworx Voice Response.

You can, however, specify a DTMF grammar using the <grammar> element with the mode="dtmf" attribute.

Note: DTMF grammars used with Blueworx Voice Response support hotword barge-in only when Blueworx Voice Response is configured to pass DTMF detection to an MRCP server. By default, Blueworx Voice Response detects DTMF keys itself.

With the default configuration, if a prompt is played with bargeintype="hotword" and a DTMF grammar is active, the prompt stops as soon as any DTMF key is detected, as though it had been set to bargeintype="speech". This occurs regardless of whether any speech grammars are also active.
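
The following fragment sketches a DTMF grammar declared with the <grammar> element and mode="dtmf", as described above. The field name and the four-digit rule are assumptions chosen for illustration.

```xml
<field name="pin">
  <prompt>Please enter your four digit PIN.</prompt>
  <!-- Inline SRGS XML grammar that accepts exactly four DTMF digits. -->
  <grammar mode="dtmf" version="1.0" root="pin" type="application/srgs+xml">
    <rule id="pin" scope="public">
      <item repeat="4">
        <one-of>
          <item>0</item>
          <item>1</item>
          <item>2</item>
          <item>3</item>
          <item>4</item>
          <item>5</item>
          <item>6</item>
          <item>7</item>
          <item>8</item>
          <item>9</item>
        </one-of>
      </item>
    </rule>
  </grammar>
</field>
```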

<else> Conditional statement used with the <if> element. Supported as documented in VoiceXML 2.0.
<elseif> Conditional statement used with the <if> element. Supported as documented in VoiceXML 2.0.
<emp> Specifies the emphasis for TTS output.

This element has been replaced by the SSML element <emphasis>.

<enumerate> Shorthand construct that causes the VoiceXML browser to speak the text of each <choice> element when presenting the list of menu selections to the user. When using the <enumerate> element to play menu choices with the TTS engine, you do not have to add punctuation to control the length of the pauses between <choice> elements. The VoiceXML browser will automatically add the appropriate pauses and intonations when speaking the prompts.
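
For example, a menu along the following lines speaks each choice in turn without any punctuation in the prompt. The menu id, choice text and target anchors are hypothetical.

```xml
<menu id="main">
  <!-- <enumerate/> expands to the text of each <choice> below. -->
  <prompt>Please choose one of the following: <enumerate/></prompt>
  <choice next="#sales">sales</choice>
  <choice next="#support">support</choice>
  <choice next="#billing">billing</choice>
</menu>
```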
<error> Catches an error event. Supported as documented in VoiceXML 2.0.
<exit> Exits a VoiceXML browser session. As described in VoiceXML 2.0, the <exit> element returns control to the interpreter, which determines what action to take. Use expr to return a single value or namelist to return one or more variables. If CCXML created this VoiceXML session, these values are available in event$.values of the dialog.exit event.
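
A minimal sketch of returning variables with namelist is shown below. The variable names and values are hypothetical. To return a single value instead, specify expr in place of namelist; VoiceXML 2.0 does not allow both on the same <exit>.

```xml
<form id="finish">
  <var name="status" expr="'completed'"/>
  <var name="accountId" expr="'12345'"/>
  <block>
    <!-- Returns both variables to the interpreter (or to CCXML, if it started the session). -->
    <exit namelist="status accountId"/>
  </block>
</form>
```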
<field> Defines an input field in a form. Supported as documented in VoiceXML 2.0.
<filled> Specifies an action to execute when a field is filled. Supported as documented in VoiceXML 2.0.
<foreach> Allows a VoiceXML application to iterate through an ECMAScript array and to execute the content contained within the element for each item in the array. Supported as documented in VoiceXML 2.1.
<form> Specifies a dialog for presenting and gathering information. Supported as documented in VoiceXML 2.0. The VoiceXML browser supports mixed-initiative dialogs using SISR.
<goto> Specifies a transition to another dialog or document. Supported as documented in VoiceXML 2.0.
<grammar> Defines a speech recognition grammar.

VoiceXML 2.1 added the srcexpr attribute to this element.

Supported values for the type attribute are:
Grammar format    Type
SRGS ABNF         application/srgs
SRGS XML          application/srgs+xml
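
The fragment below sketches both type values, and uses srcexpr to compute a grammar URI at run time. The grammar URIs, field names and the grammarUri variable are assumptions.

```xml
<form id="travel">
  <!-- Hypothetical variable holding a grammar URI that is evaluated at run time. -->
  <var name="grammarUri" expr="'http://example.com/grammars/cities.grxml'"/>
  <field name="city">
    <prompt>Which city?</prompt>
    <!-- XML form grammar, located through srcexpr (VoiceXML 2.1). -->
    <grammar type="application/srgs+xml" srcexpr="grammarUri"/>
  </field>
  <field name="airline">
    <prompt>Which airline?</prompt>
    <!-- ABNF form grammar, located through a fixed URI. -->
    <grammar type="application/srgs" src="http://example.com/grammars/airlines.gram"/>
  </field>
</form>
```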
<help> Catches a help event. Supported as documented in VoiceXML 2.0.
<if> Defines a conditional statement. Supported as documented in VoiceXML 2.0.
<initial> Prompts for form-wide information in a mixed-initiative form. Supported as documented in VoiceXML 2.0.
<link> Specifies a transition to a new document or throws an event, when activated. Supported as documented in VoiceXML 2.0.
<log> Generates a debug message. Supported as documented in VoiceXML 2.0.
<mark> Places a marker into the text/tag sequence. Supported as documented in VoiceXML 2.1.
<menu> Specifies a dialog for selecting from a list of choices. Supported as documented in VoiceXML 2.0.
<meta> Specifies metadata about the document. The http-equiv attribute is supported for the values date, expires, and lastModified. However, <meta> declarations in Speech Recognition Grammars are ignored.

The VoiceXML browser ignores the name attribute and any content specified with it. You can use these attributes to identify and assign values to the properties of a document, as defined by the HTML 4.0 specification (http://www.w3.org/TR/REC-html40/) and the HTTP 1.1 specification (http://www.ietf.org/rfc/rfc2616.txt).

<metadata> Defines metadata information using a metadata schema. This element is new for VoiceXML 2.0, but the VoiceXML browser ignores it if its content is CDATA; otherwise an error is generated.
<noinput> Catches a noinput event. Supported as documented in VoiceXML 2.0.
<nomatch> Catches a nomatch event. Supported as documented in VoiceXML 2.0.
<object> Specifies platform-specific objects. For Blueworx Voice Response, some attributes are supported in specific ways:
classid
The following URI is supported: method://<fully qualified java classname>/<java method name>
codetype
The value javacode is supported for the codetype attribute.
archive
The archives can be arbitrary Jar files containing Java classes referenced by the classid attribute. The archives must be specified in a fully qualified URL format. If archives are not supplied, the system class loader (local class path) is used to resolve classid references.

Fetching and caching attributes are not supported.

All ECMAScript integers are passed to Java objects as java.lang.Double. If the VoiceXML application needs to pass an integer value to a Java object, it must do so through a method that accepts the value as a java.lang.Double parameter.
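
The following sketch shows how the classid, codetype and archive attributes might be combined. The Java class, method, Jar URL and parameter name are all hypothetical, and how <param> values and the return value map onto the Java method is an assumption not covered in this table.

```xml
<form id="lookup">
  <!-- An ECMAScript integer; it reaches the Java method as a java.lang.Double. -->
  <var name="accountNumber" expr="12345"/>
  <object name="balance"
          classid="method://com.example.ivr.AccountLookup/getBalance"
          codetype="javacode"
          archive="http://example.com/jars/account-lookup.jar">
    <!-- Hypothetical parameter; the target method must accept a java.lang.Double. -->
    <param name="account" expr="accountNumber"/>
  </object>
  <block>
    <log>Object call completed.</log>
  </block>
</form>
```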

<option> Specifies a field option. Supported as documented in VoiceXML 2.0.
<param> Specifies a parameter in an object or subdialog. Supported as documented in VoiceXML 2.0.
<prompt> Plays TTS or recorded audio output. Supported as documented in VoiceXML 2.0.
<property> Controls aspects of the behavior of the implementation platform.
The attributes:
  • objectfetchhint
  • objectmaxage
  • objectmaxstale
are ignored by the VoiceXML browser.
The names:
  • sensitivity
  • speedvsaccuracy
are supported in this release.
The names:
  • com.ibm.speech.asr.saveaudio
  • com.ibm.speech.asr.saveaudiotype
  • com.ibm.speech.asr.endpointed
  • audiouriredacted - Refer to Logging and Tracing for usage
are supported in addition to those documented in VoiceXML 2.0.

In addition to the names listed above, VoiceXML 2.1 adds the recordutterance and recordutterancetype names to this element. If recordutterance is set to "true", three shadow variables may be used. Use recordutterancetype to specify the media format for utterance recordings. For more details on both names, see the VoiceXML 2.1 specification.

The confidencelevel property specifies a threshold for determining whether recognition results, or scores, should be accepted by the VoiceXML application. When scores are above the confidence-level threshold, the VoiceXML browser considers the recognized words acceptable; the appropriate handlers are called, and the array of the shadow variable application.lastresult$ is filled with the scores up to the maxnbest value. For a <field>, for example, the <filled> handler would be called. If no scores are above the confidence level, appropriate <nomatch> handlers are executed.

maxnbest returns a maximum of 100 n-best results to the application.

Some vendor-specific VoiceXML properties are also supported. Properties that match a given pattern can be passed through from a VoiceXML document and then sent to a speech server in an MRCP SET-PARAMS message. Currently Nuance speech recognition property names that begin with swiep or swirec are supported.
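
As a sketch of how these properties fit together, the fragment below sets confidencelevel, maxnbest and recordutterance for one form and logs the top recognition result. The threshold, grammar URI and field name are assumptions; recordingduration is one of the shadow variables defined by VoiceXML 2.1 for recordutterance.

```xml
<form id="order">
  <!-- Results scoring at or below this threshold lead to a nomatch event. -->
  <property name="confidencelevel" value="0.6"/>
  <!-- Ask for up to three n-best results in application.lastresult$. -->
  <property name="maxnbest" value="3"/>
  <!-- Keep a recording of each utterance (VoiceXML 2.1). -->
  <property name="recordutterance" value="true"/>
  <field name="drink">
    <prompt>What would you like to drink?</prompt>
    <grammar type="application/srgs+xml" src="http://example.com/grammars/drinks.grxml"/>
    <filled>
      <log>
        utterance=<value expr="application.lastresult$[0].utterance"/>
        confidence=<value expr="application.lastresult$[0].confidence"/>
        duration=<value expr="application.lastresult$.recordingduration"/>
      </log>
    </filled>
    <nomatch>
      <prompt>Sorry, I did not understand.</prompt>
      <reprompt/>
    </nomatch>
  </field>
</form>
```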

<pros> Controls the prosody of TTS output.

This element has been replaced by the SSML element <prosody>.

<record> Records spoken user input.

Any grammars active during <record> are ignored.

The timeout attribute specified for a prompt is ignored during <record>; no noinput event is generated.

The type attribute may take the following values:
audio/basic
Creates a .au file with 8 kHz, 8-bit µ-law encoding.
audio/x-alaw-basic
Creates a .au file with 8 kHz, 8-bit a-law encoding.
audio/x-wav
Creates a Microsoft .wav file with 8 kHz, 16-bit linear PCM encoding.

The beep attribute defaults to false (a beep is not played). The maxtime attribute defaults to 60 seconds.

Only the duration, size and termchar shadow variables have been implemented for <record>. The size variable contains the internal size of the raw audio data (8-bit headerless, µ-law or a-law encoding).
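
A minimal sketch using the attributes and shadow variables described above follows. The form id, variable name and prompt wording are hypothetical.

```xml
<form id="leave_message">
  <!-- beep and maxtime are set explicitly rather than relying on the defaults. -->
  <record name="msg" type="audio/x-wav" beep="true" maxtime="30s" dtmfterm="true">
    <prompt>Please leave your message after the beep. Press any key when you have finished.</prompt>
    <filled>
      <!-- Only these three shadow variables are implemented. -->
      <log>
        duration=<value expr="msg$.duration"/>
        size=<value expr="msg$.size"/>
        termchar=<value expr="msg$.termchar"/>
      </log>
    </filled>
  </record>
</form>
```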

<reprompt> Causes the form interpretation algorithm to queue and play a prompt when entering a form after an event. Supported as documented in VoiceXML 2.0.
<return> Returns from a subdialog. Supported as documented in VoiceXML 2.0.
<sayas> Controls pronunciation of words or phrases in TTS output.

This element has been replaced by the SSML element <say-as>.

<script> Specifies ECMAScript code.

VoiceXML 2.1 adds the srcexpr attribute to this element. Otherwise, supported as documented in VoiceXML 2.0.

<subdialog> Invokes a new dialog as a subdialog of the current one, in a new execution context. Supported as documented in VoiceXML 2.0.
<submit> Submits a list of variables to the document server. Supported as documented in VoiceXML 2.0.

To submit the results of a <record> element, you must use enctype="multipart/form-data".
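
For instance, a recording could be posted to a server along these lines. The target URL and variable name are assumptions.

```xml
<form id="upload">
  <record name="msg" type="audio/basic" beep="true" maxtime="30s">
    <prompt>Please record your message after the beep.</prompt>
  </record>
  <block>
    <!-- Recorded audio must be sent as multipart/form-data using the post method. -->
    <submit next="http://example.com/upload" method="post"
            enctype="multipart/form-data" namelist="msg"/>
  </block>
</form>
```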

<throw> Throws an event. Supported as documented in VoiceXML 2.0.
<transfer> Connects the telephone caller to a third party.

VoiceXML 2.1 adds the type attribute to this element.

You can use the <transfer> element to transfer VoIP/SIP telephone calls, for example:

<transfer
 name="blindTransfer"
 dest="sip:444@9.20.49.62:5067"
 type="blind" connecttimeout="3000ms"/>
The following attributes are not supported in this release:
  • aaiexpr
  • connecttimeout
  • maxtime
  • transferaudio

The attribute aai is ignored by the VoiceXML browser.

The attribute settings bridge="true" and type="bridge" are supported only if the VoiceXML dialog has been invoked from a CCXML application and the CCXML application has been coded to handle the dialog.transfer event with type="bridge".

<value> Embeds a variable in a prompt. Supported as documented in VoiceXML 2.0.
<var> Declares a variable. Refer to VoiceXML declaring variables for information about variable scope.
<vxml> Top-level container for all other VoiceXML elements in a document.

All VoiceXML documents must specify either version="2.0" or version="2.1", and xmlns="http://www.w3.org/2001/vxml"; if these attributes are missing, the VoiceXML browser will throw an error.badfetch event.

The xml:lang attribute defaults to the language specified by the locale of the JVM in which the VoiceXML browser runs.
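
A minimal document skeleton that satisfies these requirements might look like the following. The xml:lang value, form id and prompt text are illustrative; omit xml:lang to use the JVM locale default.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- version and xmlns are mandatory; xml:lang is optional. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="hello">
    <block>
      <prompt>Hello, world.</prompt>
    </block>
  </form>
</vxml>
```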

Check the language appendixes for language-specific limitations and considerations.