BVR includes functionality to allow a VoiceXML application to connect to Virtual Assistants as an alternative to traditional MRCP ASR and TTS speech providers. This is achieved using standard VoiceXML form processing and grammar tags with a custom grammar format.
In order to use a Virtual Assistant in a VoiceXML application, the application must have a Virtual Assistant Call Feature attached that matches the locale of the application. If the VoiceXML application uses the automatic STT step, an STT Call Feature must also be attached to the application. For further information on Call Features, please refer to the BAM Command Line Utility Call Features Panel.
A Virtual Assistant interaction is triggered by defining a grammar tag in a field with a mode of "voice" and a type of "application/x-blueworx-virtual-assistant+json". This is a custom Blueworx grammar format specifically for interacting with Virtual Assistants and is supplied as a JSON string, either inline or using the srcexpr parameter of a grammar.
Responses from the Virtual Assistant are returned to the VoiceXML application when the field is filled using standard VoiceXML form item variables.
The application can directly invoke the STT step using the "application/x-blueworx-stt+json" grammar as documented in Using Speech To Text (STT) engines. The application can modify the STT result, if required, then fill out the transcript parameter.
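A sketch of this two-step flow is shown below. The field and variable names are illustrative, the empty STT grammar body assumes default engine settings, and it is assumed that the STT result is available via the field's utterance shadow variable:

```xml
<!-- Step 1: run STT directly; "stt_field" and "va_request" are example names -->
<var name="va_request" expr="new Object()"/>
<field name="stt_field">
  <grammar mode="voice" version="1.0" type="application/x-blueworx-stt+json">
    { }
  </grammar>
  <filled>
    <!-- Modify the STT result here if required, then store it as the transcript -->
    <assign name="va_request.transcript" expr="stt_field$.utterance"/>
  </filled>
</field>
<!-- Step 2: send the (possibly modified) transcript to the Virtual Assistant -->
<field name="va_field">
  <grammar mode="voice" version="1.0"
           type="application/x-blueworx-virtual-assistant+json"
           srcexpr="objectToJson(va_request)"/>
</field>
```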
The contents and format of the custom grammar and the format of the response data are described in the following sections, and examples are given below.
An object structure is used to send data to and receive data from Virtual Assistants. Data is sent to the Virtual Assistant in the grammar and returned via the interpretation parameter of the shadow variables. This normalised data structure is sent out to the attached Virtual Assistant in the format it is expecting, and the Virtual Assistant's response is normalised back to the Blueworx format when it is returned in the filled block.
Note that all of these parameters are optional on an outbound message.
Variable | Direction | Type | Description |
---|---|---|---|
transcript | Outbound | String | The text to send to the Virtual Assistant as if it were spoken. If this variable exists and is not empty, the Virtual Assistant will skip the STT step and treat the transcript as if it were spoken by the caller. This can be used to separate the STT step from the Virtual Assistant step so that the application can modify the STT result. Note that if the request structure object is set up so that it can be reused between turns, it is important to clear the transcript variable, otherwise the application will loop, continually sending the same transcript to the Virtual Assistant. |
message_text | Inbound | String | A single string representing the text that the Virtual Assistant wants to say to the user in response to their input. This is an easy way to access the data without having to parse through nested structures in the messages array. |
messages | Inbound | Array of Message | An array of messages returned by the Virtual Assistant in response to what the user said. An example of its usage would be to render these as TTS so that the user can hold a conversation with the Virtual Assistant using the Virtual Assistant's messages. |
intents | Both | Array of Intent | Outbound: sends a list of intents to the Virtual Assistant, which sets the intent of the current request rather than having the Virtual Assistant itself determine the intent from the user input. Inbound: a list of intents that matched the user input. If no intents were matched, this list will be empty. |
intent | Both | Intent | Outbound: same as "intents" but sets only a single intent. Inbound: the intent from the response with the highest confidence value. |
entities | Both | Array of Entity | Outbound: sends a list of entities to the Virtual Assistant. Inbound: a list of entities set by the Virtual Assistant this turn. |
entity | Both | Entity | Outbound: same as "entities" but sets only a single entity. Inbound: the entity from the response with the highest confidence value. |
custom_vars | Both | Object (key/value pair map) | Outbound: sets custom variables to send to the Virtual Assistant. Inbound: any custom variables from the Virtual Assistant's response. |
context | Both | Object (key/value pair map) | A freeform object containing key/value pairs, which may mean different things to different Virtual Assistants. Outbound: sets contexts. Inbound: the context returned by the Virtual Assistant. |
options | Outbound | Object (key/value pair map) | Sets options on the Virtual Assistant. Options are platform specific and may vary between Virtual Assistants. |
raw_json | Inbound | String | The raw JSON string returned by the Virtual Assistant service. |
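As an illustration of consuming this structure, a minimal field that plays back the message_text variable from the interpretation shadow variable might look like this (the field name va_field is illustrative):

```xml
<field name="va_field">
  <grammar mode="voice" version="1.0"
           type="application/x-blueworx-virtual-assistant+json">
    { }
  </grammar>
  <filled>
    <!-- Play back the Virtual Assistant's reply as a single string -->
    <prompt><value expr="va_field$.interpretation.message_text"/></prompt>
  </filled>
</field>
```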
The Message object contains the following fields:

Field | Type | Description |
---|---|---|
text | String | A message that the Virtual Assistant has sent back as a conversational response to the user's input |
The Intent object contains the following fields:

Field | Type | Description |
---|---|---|
intent | String | The name of the intent |
confidence | float | The confidence value of the intent (between 0.0 and 1.0) |
The Entity object contains the following fields:

Field | Type | Description |
---|---|---|
entity | String | The name of the entity |
value | String | The value of the entity |
confidence | float | The confidence value of the entity (between 0.0 and 1.0) |
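To consume the messages array rather than the single message_text string, the individual Message objects can be iterated with the VoiceXML 2.1 foreach element, for example (the field name va_field is assumed):

```xml
<filled>
  <!-- Speak each message returned by the Virtual Assistant in turn -->
  <foreach item="msg" array="va_field$.interpretation.messages">
    <prompt><value expr="msg.text"/></prompt>
  </foreach>
</filled>
```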
To use inline grammars, supply the Virtual Assistant data structure format as JSON in the grammar body.
Example:
This sends a request to the Virtual Assistant with the custom variables "name" and "item":
```xml
<grammar mode="voice" version="1.0"
         type="application/x-blueworx-virtual-assistant+json">
  {
    "custom_vars": {
      "name": "Bob",
      "item": "Burger"
    }
  }
</grammar>
```
Note that using an inline grammar is the simplest way to send a request to a Virtual Assistant without any parameters. The body of the grammar can be an empty JSON structure:
```xml
<grammar mode="voice" version="1.0"
         type="application/x-blueworx-virtual-assistant+json">
  { }
</grammar>
```
The Virtual Assistant data structure can also be set via the srcexpr parameter of the grammar tag. The value of the srcexpr must be a JSON representation of the Virtual Assistant data structure. To make the JSON easier to generate, BVR includes a function, objectToJson, that converts an Object structure to JSON. It is therefore possible to build up the Virtual Assistant data structure as an Object in the VXML document and then pass it to this function to get the JSON representation.
Example usage:
1. Define the variable as a new Object
```xml
<var name="virtual_assistant_request_structure" expr="new Object()"/>
```
2. Add any variables for the request. For example, the following sets up two custom variables, "name" and "item"
```xml
<assign name="virtual_assistant_request_structure.custom_vars" expr="new Object()"/>
<assign name="virtual_assistant_request_structure.custom_vars.name" expr="'Bob'"/>
<assign name="virtual_assistant_request_structure.custom_vars.item" expr="'Burger'"/>
```
3. This structure can be converted to JSON and assigned to the srcexpr parameter of the grammar tag
```xml
<grammar mode="voice" version="1.0"
         type="application/x-blueworx-virtual-assistant+json"
         srcexpr="objectToJson(virtual_assistant_request_structure)"/>
```
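If the request structure is reused across turns, for example in a form that loops, remember the warning above about the transcript variable: it should be cleared after each turn so that the same text is not resent to the Virtual Assistant. One way to do this in the filled block might be:

```xml
<!-- Clear the transcript so the same text is not resent on the next turn -->
<assign name="virtual_assistant_request_structure.transcript" expr="''"/>
```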
If no input was detected, a "noinput" event is thrown. If no intents or entities were detected in the response, a "nomatch" event is thrown; however, it is still possible to retrieve the Virtual Assistant's response from the application.lastresult$ shadow variable.
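For instance, a nomatch handler can still play the Virtual Assistant's reply, assuming a message_text value was returned in the response:

```xml
<catch event="nomatch">
  <!-- The normalised response is still available via the shadow variable -->
  <prompt>
    <value expr="application.lastresult$.interpretation.message_text"/>
  </prompt>
</catch>
```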
The shadow variables in the filled block will contain the usual fields.
Field | Description |
---|---|
utterance | This is what the user said to the Virtual Assistant. |
confidence | The highest confidence value returned for all intents and entities in the response |
interpretation | This is the object that the Virtual Assistant data structure is mapped to and contains the entirety of the normalised data structure in the Object key/value pair format |
The confidencelevel VXML property determines the confidence threshold below which a result is deemed a nomatch. For Virtual Assistants, this is applied in two stages. First, if the Virtual Assistant uses an STT engine, the confidence result from the STT is checked; if it is below the threshold, a nomatch is returned, otherwise the output of the STT is sent to the Virtual Assistant. The confidence returned from the Virtual Assistant is then checked and, again, if it is below the threshold, a nomatch is generated.
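For example, to treat any result below 0.5 confidence at either stage as a nomatch:

```xml
<!-- Results below 0.5 confidence (from STT or the Virtual Assistant)
     are treated as a nomatch -->
<property name="confidencelevel" value="0.5"/>
```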
It is possible to set parameters for an STT engine by using the stt_parms parameter of the options structure. As an example, if using the IBM Cloud STT engine, all parameters available in the WebSockets API can be set in stt_parms and will be sent to the STT engine. Note that URL parameters and message body parameters can both be used in the stt_parms structure for IBM STT. If the model parameter is set in stt_parms it will override the model in the call feature definition. Below is an example demonstrating how to set the end_of_phrase_silence_time parameter to 0.5 seconds.
```xml
<!-- Construct the options structure if it doesn't already exist -->
<assign name="virtual_assistant_request_structure.options" expr="new Object()"/>
<!-- Construct the stt_parms structure -->
<assign name="virtual_assistant_request_structure.options.stt_parms" expr="new Object()"/>
<!-- Set the end_of_phrase_silence_time STT parameter to 0.5 -->
<assign name="virtual_assistant_request_structure.options.stt_parms.end_of_phrase_silence_time" expr="'0.5'"/>
```
The supported Google STT engine parameters are documented at Google Speech To Text (STT) supported parameters
The supported IBM Cloud STT engine parameters are documented at IBM Cloud Speech To Text (STT) supported parameters