Text to Speech

Some information, such as news items, stock prices, or electronic mail, is difficult or impractical to prerecord before it is made available over a telephony system. Instead, your application can have text converted into synthesized speech as it is needed.

Text-to-speech provides the necessary flexibility for applications that have a large number of varying spoken responses. Information can be read to the caller without the need to prerecord voice segments. Applications can use synthesized speech, prerecorded speech, or a mixture of both.

Blueworx Voice Response uses the open standard communication protocol MRCP Version 1 to communicate with speech technologies such as WebSphere Voice Server 5.1, Nuance Speech Server version 5.1.2 (Recognizer 9.0.13, Vocalizer 5.0.2), and Loquendo Speech Server V7.

Text-to-speech using WebSphere Voice Server

When using WebSphere Voice Server or a third-party speech technology product that is compatible with MRCP V1, the application uses the Blueworx Voice Response MRCP client interface to send text to a text-to-speech server. The text-to-speech server returns the synthesized speech through the Blueworx Voice Response MRCP client interface, and the application then sends it over the voice channel to the caller.
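To make the exchange above more concrete, the following minimal sketch formats the kind of MRCP Version 1 SPEAK request that a client could send to a text-to-speech server. It is not the Blueworx Voice Response MRCP client interface: the function name build_speak_request and the choice of text/plain as the content type are assumptions made for illustration. In MRCP Version 1 such messages are carried inside RTSP, and that transport is omitted here.

```python
def build_speak_request(request_id: int, text: str,
                        content_type: str = "text/plain") -> bytes:
    """Format an MRCP Version 1 SPEAK request (hypothetical helper).

    The request line has the form "SPEAK <request-id> MRCP/1.0".
    The text to synthesize travels in the message body, and
    Content-Length counts the bytes of that body.
    """
    body = text.encode("utf-8")
    header_lines = [
        f"SPEAK {request_id} MRCP/1.0",
        f"Content-Type: {content_type}",   # assumed; SSML is another option
        f"Content-Length: {len(body)}",
    ]
    # Headers are separated by CRLF and end with a blank line.
    head = ("\r\n".join(header_lines) + "\r\n\r\n").encode("ascii")
    return head + body


if __name__ == "__main__":
    message = build_speak_request(1, "Your order has been shipped.")
    print(message.decode("utf-8"))
```

Because the length is computed from the encoded bytes rather than the character count, text that contains non-ASCII characters is still framed correctly.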

Speech synthesis technology depends heavily on the language that is being synthesized. The WebSphere Voice Server Text-To-Speech function is available in a number of languages.

The WebSphere Voice Server Text-To-Speech function allows your voice applications to adapt to specific circumstances by synthesizing a prompt from a text string and playing it dynamically in real time. For example, in response to a caller's request for which no prerecorded prompt is available, a voice application can select an appropriate text string and convert it to speech to play back to the caller. This enables your application developers to develop voice applications more quickly and cheaply. The Text-To-Speech function also supports barge-in (or cut through) so that synthesized speech can be interrupted by a caller in the same way that a prerecorded prompt can. See Speech Recognition for more information on using text-to-speech together with speech recognition in a voice application and also on the different ways in which text-to-speech engines can be allocated or assigned.
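The choice between a prerecorded prompt and synthesized speech can be sketched as a simple lookup with a fallback. This is an illustrative sketch only: the Prompt class, the PRERECORDED table, the segment names, and choose_prompt are all hypothetical and are not part of any Blueworx Voice Response or WebSphere Voice Server API.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    kind: str              # "prerecorded" or "synthesized"
    payload: str           # voice segment name, or the text to synthesize
    barge_in: bool = True  # the caller may interrupt either kind of prompt


# Hypothetical table of prompts that were recorded in advance.
PRERECORDED = {
    "welcome": "welcome_segment",
    "goodbye": "goodbye_segment",
}


def choose_prompt(key: str, fallback_text: str) -> Prompt:
    """Use a prerecorded segment when one exists, otherwise synthesize."""
    segment = PRERECORDED.get(key)
    if segment is not None:
        return Prompt("prerecorded", segment)
    return Prompt("synthesized", fallback_text)


if __name__ == "__main__":
    print(choose_prompt("welcome", "Welcome."))
    print(choose_prompt("account_balance", "Your balance is available."))
```

Keeping barge-in as a property of the prompt, rather than of its source, reflects the point above that synthesized speech can be interrupted in the same way as a prerecorded prompt.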

Text-to-speech servers can be on separate systems that are shared between multiple Blueworx Voice Response systems, allowing more callers to be handled at the same time. This type of arrangement is scalable, and also provides redundancy, so that if the need arises to close down part of the system, the remaining machines can continue to provide a service.
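The redundancy described above can be illustrated with a short failover sketch, in which a request is tried against each shared server in turn until one responds. The server names, the synthesize callable, and the use of ConnectionError to signal an unavailable server are assumptions for illustration; the actual allocation of text-to-speech engines is handled by the product and is not shown here.

```python
from typing import Callable, Optional, Sequence


def synthesize_with_failover(
    text: str,
    servers: Sequence[str],
    synthesize: Callable[[str, str], bytes],
) -> bytes:
    """Try each server in order and return the first successful result.

    `synthesize(server, text)` is a hypothetical callable that returns
    audio bytes, or raises ConnectionError if the server is unavailable.
    """
    last_error: Optional[Exception] = None
    for server in servers:
        try:
            return synthesize(server, text)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next server
    raise RuntimeError("no text-to-speech server available") from last_error


if __name__ == "__main__":
    def fake_synthesize(server: str, text: str) -> bytes:
        # Stand-in for a real client: pretend the first server is down.
        if server == "tts-a":
            raise ConnectionError("tts-a is shut down for maintenance")
        return f"audio from {server}".encode("utf-8")

    print(synthesize_with_failover("Hello", ["tts-a", "tts-b"], fake_synthesize))
```

A real deployment would probably also spread load across servers rather than always starting with the first one; the fixed order here keeps the sketch short.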

For detailed information about the text-to-speech functions provided in WebSphere Voice Server, refer to the WebSphere Voice Server infocenter at:

http://publib.boulder.ibm.com/infocenter/pvcvoice/51x/index.jsp