BVR supports configuring some aspects of the IBM Cloud STT engine. These parameters are specified in the "stt_parms" key/value pairs.
The supported parameters listed below are sent to the IBM Cloud STT engine. Details on each parameter can be found in the IBM documentation: https://cloud.ibm.com/apidocs/speech-to-text#recognize
Note that all of these parameters are optional.
Variable | Type | Default | Description |
---|---|---|---|
model | string | en-US_BroadbandModel | Model for processing audio. |
customization_weight | float | 0.3 | How much weight to give to customization words. |
inactivity_timeout | integer | 30 | Number of seconds of silence before the STT stops. |
interim_results | boolean | false | Returns results as they are generated. |
keywords | list | | Array of keyword strings to spot in the audio. |
keywords_threshold | float | | Lower bound on the confidence level for recognizing a keyword. By default, no keyword spotting is performed. |
max_alternatives | integer | 1 | Maximum number of alternatives to return. By default this is a single result. |
word_alternatives_threshold | float | | Lower bound on the confidence level for identifying alternative words. By default, no alternative words are returned. |
word_confidence | boolean | true | If true, returns the confidence of each word. |
timestamps | boolean | true | If true, returns timestamps for each word. |
profanity_filter | boolean | true | If true, replaces all inappropriate words with asterisks. |
smart_formatting | boolean | false | If true, the STT engine converts various values into a more readable form. |
speaker_labels | boolean | false | If true, the response includes labels identifying the individual speakers. |
grammar_name | string | | Name of the grammar for the recognition. |
redaction | boolean | false | If true, all numerical data is redacted from the result. |
processing_metrics | boolean | false | If true, processing metrics are returned in the result. |
processing_metrics_interval | float | 1.0 | Interval, in seconds, at which processing metrics are returned. |
audio_metrics | boolean | false | If true, detailed information about signal characteristics of the audio is returned. |
end_of_phrase_silence_time | float | 0.8 | Duration of the pause interval for splitting a transcript into multiple results. |
split_transcript_at_phrase_end | boolean | false | If true, directs the STT engine to split the transcript into multiple final results based on semantic features of the input. |
speech_detector_sensitivity | float | 0.5 | Sensitivity of speech activity detection. |
background_audio_suppression | float | 0.0 | Level at which to suppress background audio. |
low_latency | boolean | false | If true, attempts to return results more quickly. |
character_insertion_bias | float | 0.0 | Bias between shorter or longer strings when generating the results. |
skip_zero_len_words | boolean | false | Undocumented feature. |
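
As an illustration of the table above, the following is a minimal Python sketch that assembles a set of "stt_parms" key/value pairs and checks each name and type against the table before use. The helper `build_stt_parms` and the `STT_PARM_TYPES` mapping are hypothetical names introduced here, not part of BVR, and wrapping the result in a JSON object under an `stt_parms` key is an assumption about how the pairs are delivered; consult the BVR configuration documentation for the actual mechanism.

```python
import json

# Parameter names and types copied from the table above.
STT_PARM_TYPES = {
    "model": str,
    "customization_weight": float,
    "inactivity_timeout": int,
    "interim_results": bool,
    "keywords": list,
    "keywords_threshold": float,
    "max_alternatives": int,
    "word_alternatives_threshold": float,
    "word_confidence": bool,
    "timestamps": bool,
    "profanity_filter": bool,
    "smart_formatting": bool,
    "speaker_labels": bool,
    "grammar_name": str,
    "redaction": bool,
    "processing_metrics": bool,
    "processing_metrics_interval": float,
    "audio_metrics": bool,
    "end_of_phrase_silence_time": float,
    "split_transcript_at_phrase_end": bool,
    "speech_detector_sensitivity": float,
    "background_audio_suppression": float,
    "low_latency": bool,
    "character_insertion_bias": float,
    "skip_zero_len_words": bool,
}


def build_stt_parms(**overrides):
    """Hypothetical helper: return a dict of stt_parms after checking names and types."""
    parms = {}
    for name, value in overrides.items():
        if name not in STT_PARM_TYPES:
            raise KeyError(f"unsupported stt_parms key: {name}")
        expected = STT_PARM_TYPES[name]
        # bool is a subclass of int in Python, so reject it for numeric parameters.
        if isinstance(value, bool) and expected is not bool:
            raise TypeError(f"{name} expects {expected.__name__}, got bool")
        # Accept whole numbers where a float is expected (e.g. 1 for 1.0).
        if expected is float and isinstance(value, int):
            value = float(value)
        if not isinstance(value, expected):
            raise TypeError(
                f"{name} expects {expected.__name__}, got {type(value).__name__}"
            )
        parms[name] = value
    return parms


# All parameters are optional, so only the values that differ from the
# defaults need to be listed.
stt_parms = build_stt_parms(
    model="en-US_NarrowbandModel",
    inactivity_timeout=60,
    keywords=["refund", "cancel"],
    keywords_threshold=0.5,
    smart_formatting=True,
)

# Assumed wrapper: the pairs serialized as JSON under an "stt_parms" key.
print(json.dumps({"stt_parms": stt_parms}, indent=2))
```

Checking names up front catches misspelled keys locally; this sketch does not validate value ranges, which are described in the IBM documentation.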