IBM Cloud Speech To Text (STT) supported parameters

Overview

BVR supports configuring some aspects of the IBM Cloud STT engine. These parameters are specified as key/value pairs in "stt_parms".
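As a minimal sketch, the key/value pairs might be assembled as shown below. Only the "stt_parms" key name and the parameter names come from this document; the enclosing JSON object and the chosen values are assumptions for illustration, and where BVR actually reads this configuration from is not covered here.

```python
import json

# Hypothetical configuration fragment: the enclosing object is an assumption,
# only the "stt_parms" key and the parameter names are taken from this page.
config = {
    "stt_parms": {
        "model": "en-US_BroadbandModel",
        "inactivity_timeout": 30,
        "smart_formatting": True,
    }
}

# Serialize to JSON, for example for a configuration file or request body.
config_json = json.dumps(config, indent=2)
print(config_json)
```

Any parameter left out of "stt_parms" is simply not set by this fragment.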

IBM Cloud STT supported parameters

The following parameters are supported and are sent to the IBM Cloud STT engine. Details on each parameter can be found in the IBM documentation: https://cloud.ibm.com/apidocs/speech-to-text#recognize

Note that all of these parameters are optional.

Variable                        Type     Default               Description
model                           string   en-US_BroadbandModel  Model for processing audio.
customization_weight            float    0.3                   How much weight to give to customization words.
inactivity_timeout              integer  30                    Number of seconds of silence before stopping the STT.
interim_results                 boolean  false                 Returns results as they are generated.
keywords                        list     (none)                Array of keyword strings to spot in the audio.
keywords_threshold              float    (none)                Confidence level lower bound to recognize a keyword. By default, no keyword spotting is performed.
max_alternatives                integer  1                     Maximum number of alternatives to return. By default, a single result is returned.
word_alternatives_threshold     float    (none)                Confidence level lower bound for identifying alternatives. By default, no alternative words are returned.
word_confidence                 boolean  true                  If true, returns the confidence of each word.
timestamps                      boolean  true                  If true, returns timestamps for each word.
profanity_filter                boolean  true                  If true, replaces all inappropriate words with asterisks.
smart_formatting                boolean  false                 If true, the STT engine converts various values into a more readable form.
speaker_labels                  boolean  false                 If true, the response includes labels identifying the individual speakers.
grammar_name                    string   (none)                Name of the grammar to use for recognition.
redaction                       boolean  false                 If true, all numerical data is redacted from the result.
processing_metrics              boolean  false                 If true, processing metrics are returned in the result.
processing_metrics_interval     float    1.0                   Interval, in seconds, at which processing metrics are returned.
audio_metrics                   boolean  false                 If true, detailed information about the signal characteristics of the audio is returned.
end_of_phrase_silence_time      float    0.8                   Duration, in seconds, of the pause interval for splitting a transcript into multiple results.
split_transcript_at_phrase_end  boolean  false                 If true, directs the STT engine to split the transcript into multiple final results based on semantic features of the input.
speech_detector_sensitivity     float    0.5                   Sensitivity of speech activity detection.
background_audio_suppression    float    0.0                   Level at which to suppress background audio.
low_latency                     boolean  false                 If true, attempts to return results more quickly.
character_insertion_bias        float    0.0                   Bias toward shorter or longer strings when generating the results.
skip_zero_len_words             boolean  false                 Undocumented feature.
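
The types and defaults listed above can be checked before a configuration is handed to BVR. The sketch below is a hypothetical client-side helper, not part of BVR: the names STT_DEFAULTS, NO_DEFAULT_TYPES and build_stt_parms are invented for illustration, and it assumes that catching unknown names and wrong types early is useful. Whether BVR performs similar validation itself is not stated here.

```python
# Defaults copied from the table above. Parameters with no listed default
# are kept separately so they are only included when explicitly set.
STT_DEFAULTS = {
    "model": "en-US_BroadbandModel",
    "customization_weight": 0.3,
    "inactivity_timeout": 30,
    "interim_results": False,
    "max_alternatives": 1,
    "word_confidence": True,
    "timestamps": True,
    "profanity_filter": True,
    "smart_formatting": False,
    "speaker_labels": False,
    "redaction": False,
    "processing_metrics": False,
    "processing_metrics_interval": 1.0,
    "audio_metrics": False,
    "end_of_phrase_silence_time": 0.8,
    "split_transcript_at_phrase_end": False,
    "speech_detector_sensitivity": 0.5,
    "background_audio_suppression": 0.0,
    "low_latency": False,
    "character_insertion_bias": 0.0,
    "skip_zero_len_words": False,
}

# Expected Python type for each parameter that has no default in the table.
NO_DEFAULT_TYPES = {
    "keywords": list,
    "keywords_threshold": float,
    "word_alternatives_threshold": float,
    "grammar_name": str,
}


def _coerce(name, value, expected):
    """Return value as the expected type, or raise TypeError."""
    # bool is a subclass of int in Python, so reject it for non-boolean parameters.
    if isinstance(value, bool) and expected is not bool:
        raise TypeError(f"{name}: expected {expected.__name__}, got bool")
    # Accept a plain integer where a float is expected (e.g. 1 for 1.0).
    if expected is float and isinstance(value, int):
        return float(value)
    if not isinstance(value, expected):
        raise TypeError(
            f"{name}: expected {expected.__name__}, got {type(value).__name__}"
        )
    return value


def build_stt_parms(overrides):
    """Merge overrides into the table defaults, rejecting unknown names."""
    parms = dict(STT_DEFAULTS)
    for name, value in overrides.items():
        if name in STT_DEFAULTS:
            expected = type(STT_DEFAULTS[name])
        elif name in NO_DEFAULT_TYPES:
            expected = NO_DEFAULT_TYPES[name]
        else:
            raise KeyError(f"unsupported STT parameter: {name}")
        parms[name] = _coerce(name, value, expected)
    return parms


stt_parms = build_stt_parms({"inactivity_timeout": 60, "keywords": ["hello"]})
print(stt_parms["inactivity_timeout"], stt_parms["keywords"])
```

Since every parameter is optional, passing an empty dictionary simply returns the defaults from the table.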