Integration with Virtual Assistants

Overview

BVR includes functionality to allow a VoiceXML application to connect to Virtual Assistants as an alternative to traditional MRCP ASR and TTS speech providers. This is achieved using standard VoiceXML form processing and grammar tags with a custom grammar format.

To use a Virtual Assistant in a VoiceXML application, the application must have a Virtual Assistant Call Feature attached to it that matches the locale of the application. If the VoiceXML application uses the automatic STT step, an STT Call Feature must also be attached to the application. For further information on Call Features, please refer to BAM Command Line Utility Call Features Panel.

A Virtual Assistant interaction is triggered by defining a grammar tag in a field with a mode of "voice" and a type of "application/x-blueworx-virtual-assistant+json". This is a custom Blueworx grammar format specifically for interacting with Virtual Assistants and is supplied as a JSON string, either inline or using the srcexpr parameter of a grammar.

Responses from the Virtual Assistant are returned to the VoiceXML application when the field is filled using standard VoiceXML form item variables.

The application can also invoke the STT step directly by using the "application/x-blueworx-stt+json" grammar, as documented in Using Speech To Text (STT) engines. The application can then modify the STT result, if required, and pass it to the Virtual Assistant in the transcript parameter.

The contents and format of the custom grammar and the format of the response data are described in the following sections, together with examples.

Data Structures

Format of data structures passed to and from Virtual Assistants

An object structure is used to send data to and receive data from Virtual Assistants. Outbound data is supplied in the grammar, and inbound data is returned via the interpretation property of the shadow variables. The normalised structure is converted to the format that the attached Virtual Assistant expects before it is sent, and the Virtual Assistant's response is normalised to the Blueworx format before it is returned in the filled block.

Note that all of these parameters are optional on an outbound message.

Variable	Direction	Type	Description
transcript	Outbound	String	The text to send to the Virtual Assistant as if it were spoken. If this variable exists and is not empty, the Virtual Assistant will skip the STT step and treat the transcript as if it were spoken by the caller. This can be used to separate the STT step from the Virtual Assistant step to allow the application to modify the STT result. Note that if the request structure object is set up so that it is reused between turns, the transcript variable must be cleared after each turn; otherwise the same transcript is sent to the Virtual Assistant on every turn.
message_text	Inbound	String	A single string representing the text that the Virtual Assistant wants to say to the user in response to their input. This is an easy way to access the data without having to parse through nested structures in the messages array.
messages	Inbound	Array of Message	This is an array of messages that have been returned by the Virtual Assistant as a response to what the user said. An example of its usage would be to render this as TTS so that the user can have a conversation with the Virtual Assistant using the Virtual Assistant's messages.
intents	Both	Array of Intent	Outbound: Sends a list of intents to the Virtual Assistant which will set the intent of the current request rather than having the Virtual Assistant itself determine the intent from the user input. Inbound: A list of intents that matched the user input. If no intents were matched, this list will be empty.
intent	Both	Intent	Outbound: Same as "intents" but sets only a single intent. Inbound: The intent from the response with the highest confidence value.
entities	Both	Array of Entity	Outbound: Sends a list of entities to the Virtual Assistant. Inbound: A list of entities set by the Virtual Assistant this turn.
entity	Both	Entity	Outbound: Same as "entities" but sets only a single entity. Inbound: The entity from the response with the highest confidence value.
custom_vars	Both	Object (key/value pair map)	Outbound: Sets custom variables to send to the Virtual Assistant. Inbound: Gets any custom variables from the Virtual Assistant's response.
context	Both	Object (key/value pair map)	This is a freeform Object containing key/value pairs and may mean different things to different Virtual Assistants. Outbound: Sets contexts. Inbound: The context returned by the Virtual Assistant.
options	Outbound	Object (key/value pair map)	Sets options on the Virtual Assistant. Options are platform specific and may vary between Virtual Assistants.
raw_json	Inbound	String	The raw JSON string returned by the Virtual Assistant service
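
The note on clearing the transcript can be sketched as follows. This is an illustrative fragment, not a tested application: the variable caller_text and its value are hypothetical, and the request structure is assumed to be reused across turns.

```xml
<!-- Hypothetical variable holding text that the application wants to submit -->
<var name="caller_text" expr="'I would like a burger'"/>
<var name="virtual_assistant_request_structure" expr="new Object()"/>

<!-- Before the turn: a non-empty transcript makes the Virtual Assistant skip the STT step -->
<assign name="virtual_assistant_request_structure.transcript" expr="caller_text"/>

<!-- After the turn (for example, in the filled block): clear the transcript
     so that the next turn does not resend the same text -->
<assign name="virtual_assistant_request_structure.transcript" expr="''"/>
```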

Type definitions

Message

Field	Type	Description
text	String	A message that the Virtual Assistant has sent back as a conversational response to the user's input

Intent

Field	Type	Description
intent	String	The name of the intent
confidence	float	The confidence value of the intent (between 0.0 and 1.0)

Entity

Field	Type	Description
entity	String	The name of the entity
value	String	The value of the entity
confidence	float	The confidence value of the entity (between 0.0 and 1.0)

Using Inline Grammars

To use inline grammars, supply the Virtual Assistant data structure format as JSON in the grammar body.

Example:

This sends a request to the Virtual Assistant with the custom variables "name" and "item".


    <grammar mode="voice" version="1.0" type="application/x-blueworx-virtual-assistant+json">
        {
            "custom_vars": {
                "name": "Bob",
                "item": "Burger"
            }
        }
    </grammar>

Note that using an inline grammar is the simplest way to send a request to a Virtual Assistant without any parameters. The body of the grammar can be an empty JSON structure:


    <grammar mode="voice" version="1.0" type="application/x-blueworx-virtual-assistant+json">
        {
        }
    </grammar>

Using srcexpr

The data structure for Virtual Assistants can also be supplied through the srcexpr parameter of the grammar tag. The value of srcexpr must be a JSON representation of the Virtual Assistant data structure. To make the JSON easier to generate, BVR includes a function, "objectToJson", that converts an Object structure to JSON. It is therefore possible to build up the Virtual Assistant data structure as an Object in the VXML document and then pass it to this function to get the JSON representation.

Example usage:

1. Define the variable as a new Object


    <var name="virtual_assistant_request_structure" expr="new Object()"/>

2. Add any variables for the request. For example, the following sets up 2 custom variables, "name" and "item"


    <assign name="virtual_assistant_request_structure.custom_vars" expr="new Object()"/>
    <assign name="virtual_assistant_request_structure.custom_vars.name" expr="'Bob'"/>
    <assign name="virtual_assistant_request_structure.custom_vars.item" expr="'Burger'"/>

3. This structure can be converted to JSON and assigned to the srcexpr parameter of the grammar tag


    <grammar mode="voice" version="1.0" type="application/x-blueworx-virtual-assistant+json" srcexpr="objectToJson(virtual_assistant_request_structure)"/>
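
Put together, the three steps might look like the following minimal document. This is a sketch rather than a tested application: the document skeleton, the form and field names, the prompt text and the xml:lang value are illustrative assumptions. Only the grammar type, the request structure and the objectToJson function come from the description above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">

    <!-- Step 1: define the request structure at document scope -->
    <var name="virtual_assistant_request_structure" expr="new Object()"/>

    <form id="order">
        <block>
            <!-- Step 2: add the custom variables for this request -->
            <assign name="virtual_assistant_request_structure.custom_vars" expr="new Object()"/>
            <assign name="virtual_assistant_request_structure.custom_vars.name" expr="'Bob'"/>
            <assign name="virtual_assistant_request_structure.custom_vars.item" expr="'Burger'"/>
        </block>

        <!-- Hypothetical field name; the block above runs before this field is visited -->
        <field name="assistant_turn">
            <prompt>What would you like to order?</prompt>

            <!-- Step 3: convert the structure to JSON when the grammar is activated -->
            <grammar mode="voice" version="1.0" type="application/x-blueworx-virtual-assistant+json" srcexpr="objectToJson(virtual_assistant_request_structure)"/>
        </field>
    </form>
</vxml>
```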

Handling data returned from a Virtual Assistant

If no input was detected, a "noinput" event will be thrown. If no intents or entities were detected in the response, a "nomatch" event will be thrown; however, it is still possible to retrieve the Virtual Assistant's response using the application.lastresult$ shadow variable.
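
As an illustration of these two events, the field below catches both and, on a nomatch, speaks the Virtual Assistant's reply if one was returned. The field name and prompt wording are hypothetical, and the guard assumes that message_text may be missing or empty in a nomatch response.

```xml
<field name="assistant_turn">
    <grammar mode="voice" version="1.0" type="application/x-blueworx-virtual-assistant+json">
        {
        }
    </grammar>

    <noinput>
        <prompt>Sorry, I did not hear anything.</prompt>
    </noinput>

    <nomatch>
        <!-- The response is still reachable through application.lastresult$ -->
        <if cond="application.lastresult$.interpretation != undefined &amp;&amp; application.lastresult$.interpretation.message_text">
            <prompt><value expr="application.lastresult$.interpretation.message_text"/></prompt>
        <else/>
            <prompt>Sorry, I did not understand that.</prompt>
        </if>
    </nomatch>
</field>
```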

The shadow variables in the filled block will contain the usual fields.

Field	Description
utterance	This is what the user said to the virtual assistant.
confidence	The highest confidence value returned for all intents and entities in the response
interpretation	This is the object that the Virtual Assistant data structure is mapped to and contains the entirety of the normalised data structure in the Object key/value pair format
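
A filled block that reads these fields could look like the sketch below. The field name is hypothetical, and the check on intent assumes that a response may contain entities without any intent.

```xml
<field name="assistant_turn">
    <grammar mode="voice" version="1.0" type="application/x-blueworx-virtual-assistant+json">
        {
        }
    </grammar>

    <filled>
        <!-- What the caller said, and the highest confidence in the response -->
        <log>Heard: <value expr="assistant_turn$.utterance"/></log>
        <log>Confidence: <value expr="assistant_turn$.confidence"/></log>

        <!-- interpretation holds the normalised data structure -->
        <if cond="assistant_turn$.interpretation.intent != undefined">
            <log>Intent: <value expr="assistant_turn$.interpretation.intent.intent"/></log>
        </if>

        <!-- Speak the Virtual Assistant's reply to the caller -->
        <prompt><value expr="assistant_turn$.interpretation.message_text"/></prompt>
    </filled>
</field>
```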

The confidencelevel VXML property

The confidencelevel VXML property is used to determine the confidence threshold below which the result will be deemed a no match. In the case of Virtual Assistants, this is applied in 2 stages. Firstly, if the Virtual Assistant uses an STT engine, the confidence result from the STT will be checked. If this is below the confidence threshold, a nomatch will be returned. If it is not, the output of the STT will be sent to the Virtual Assistant. The confidence returned from the Virtual Assistant will then be checked and, again, if it is below the threshold, it will generate a nomatch.
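
For example, the property can be set on a single field with the standard VoiceXML property tag. The threshold of 0.6 and the field name are arbitrary choices for illustration.

```xml
<field name="assistant_turn">
    <!-- Results below 0.6 at either stage (STT or Virtual Assistant) produce a nomatch -->
    <property name="confidencelevel" value="0.6"/>

    <grammar mode="voice" version="1.0" type="application/x-blueworx-virtual-assistant+json">
        {
        }
    </grammar>
</field>
```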

Setting custom STT parameters

It is possible to set parameters for an STT engine by using the stt_parms parameter of the options structure. As an example, if using the IBM Cloud STT engine, all parameters available in the WebSockets API can be set in stt_parms and will be sent to the STT engine. Note that URL parameters and message body parameters can both be used in the stt_parms structure for IBM STT. If the model parameter is set in stt_parms it will override the model in the call feature definition. Below is an example demonstrating how to set the end_of_phrase_silence_time parameter to 0.5 seconds.


    <!-- Construct the options structure if it doesn't already exist -->
    <assign name="virtual_assistant_request_structure.options" expr="new Object()"/>

    <!-- Construct the stt_parms structure -->
    <assign name="virtual_assistant_request_structure.options.stt_parms" expr="new Object()"/>

    <!-- Set the end_of_phrase_silence_time STT parameter to 0.5 -->
    <assign name="virtual_assistant_request_structure.options.stt_parms.end_of_phrase_silence_time" expr="'0.5'"/>
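
The same option can presumably be supplied in an inline grammar, because the inline body is the JSON form of the same data structure. The sketch below mirrors the assignments above, keeping the value as a string as they do.

```xml
<grammar mode="voice" version="1.0" type="application/x-blueworx-virtual-assistant+json">
    {
        "options": {
            "stt_parms": {
                "end_of_phrase_silence_time": "0.5"
            }
        }
    }
</grammar>
```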

The supported Google STT engine parameters are documented at Google Speech To Text (STT) supported parameters.

The supported IBM Cloud STT engine parameters are documented at IBM Cloud Speech To Text (STT) supported parameters.