SSML elements and attributes

Table 1. Summary of SSML elements and attributes
Element Description Implementation details
<audio> Plays an audio file within a prompt or synthesizes speech from a text string if the audio file cannot be found. Supported as documented in VoiceXML 2.0, except that the <audio> element cannot be contained within other SSML elements, such as in: <emphasis level="strong"> <audio .../> </emphasis>
<break> Inserts a pause in TTS output. Supported as documented in VoiceXML 2.1.
<desc> Describes the content of an audio source. Supported as documented in VoiceXML 2.0.
<emphasis> Specifies the emphasis for TTS output. Supported as documented in VoiceXML 2.0.
<lexicon> References external pronunciation definitions. This element is not supported in Blueworx Voice Response.
<mark> Inserts a reference point in a document. This element is not supported in VoiceXML 2.0, but is supported in VoiceXML 2.1 as documented.
<metadata> Specifies general information about the document. This element is ignored by the VoiceXML browser.
<p> Specifies text structure in the absence of an end-of-sentence punctuation character. Supported as documented in VoiceXML 2.1.
<paragraph> Specifies text structure in the absence of an end-of-sentence punctuation character. Supported as documented in VoiceXML 2.0.
<phoneme> Specifies the pronunciation and phonology for TTS output. Supported as documented in VoiceXML 2.0. WebSphere Voice Server supports the IPA and IBM alphabets.

The IBM alphabet used in SSML refers to the phonology used by IBM TTS. The following example shows the US English phonetic pronunciation of “tomato” using the IBM TTS phonetic alphabet:

<phoneme alphabet="ibm" ph=".0tx.1me.0Fo"> tomato </phoneme>

For more information on IBM symbolic phonetic representations (SPRs), see the IBM Text-To-Speech SSML Programming Guide.

<prosody> Controls the pitch, range, rate and volume of TTS output. The contour and duration attributes are not supported.
<say-as> Specifies the type of text, for example a date, telephone number, or currency. The following attribute values are not supported:
  • interpret-as="letters" with details="strict"
  • details="dictate"
  • interpret-as="words"
Note: The details="punctuation" attribute value is not supported in Simplified Chinese.
The <say-as> element can be used with built-in grammar types.
<s> Specifies text structure in the absence of an end-of-sentence punctuation character. Supported as documented in VoiceXML 2.1.
<sentence> Specifies text structure in the absence of an end-of-sentence punctuation character. Supported as documented in VoiceXML 2.0.
<sub> Specifies substitute text for TTS output in place of a given input string. Supported as documented in VoiceXML 2.0.
<voice> Specifies the speaking voice used for TTS output. Supported as documented in VoiceXML 2.0.
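
The <audio> and <prosody> restrictions in Table 1 can be illustrated with a short prompt. The following is a minimal sketch, not a definitive implementation: the file name welcome.wav and the prompt wording are hypothetical, and the attribute values are taken from the SSML value sets rather than from any product-specific list. The <audio> element appears as a direct child of <prompt> instead of inside <emphasis>, and <prosody> uses only rate and volume, avoiding the unsupported contour and duration attributes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <prompt>
        <!-- <audio> is a direct child of <prompt>, not nested in <emphasis>.
             welcome.wav is a hypothetical file; the text inside <audio> is
             synthesized if the file cannot be found. -->
        <audio src="welcome.wav">Welcome to the service.</audio>
        <break time="500ms"/>
        <emphasis level="strong">Please listen carefully.</emphasis>
        <!-- Only rate and volume are set; contour and duration are not used. -->
        <prosody rate="slow" volume="loud">Your options have changed.</prosody>
      </prompt>
    </block>
  </form>
</vxml>
```

If emphasis is needed around recorded audio, one option under these restrictions is to apply <emphasis> only to the synthesized text that precedes or follows the <audio> element.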

Language-specific limitations and considerations are given in the language appendixes.
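
The <say-as> and <sub> rows in Table 1 can be sketched in the same way. This example assumes that the built-in grammar types are selected with the vxml: prefix on interpret-as, and that values use the built-in formats described in the VoiceXML 2.0 specification (yyyymmdd for dates, a currency code followed by the amount for currency); check the product documentation before relying on either. The date, amount, and abbreviation are invented for illustration. None of the unsupported values (interpret-as="words", details="dictate", or interpret-as="letters" with details="strict") is used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <prompt>
        <!-- Assumption: built-in types are referenced as vxml:<type>. -->
        Your appointment is on
        <say-as interpret-as="vxml:date">20240315</say-as>.
        Your balance is
        <say-as interpret-as="vxml:currency">USD25.40</say-as>.
        <!-- <sub> speaks the alias text in place of the written form. -->
        This service follows
        <sub alias="World Wide Web Consortium">W3C</sub> recommendations.
      </prompt>
    </block>
  </form>
</vxml>
```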