High-quality voice data

Sampling rate

All voice segments stored in the Blueworx Voice Response database use an 8 kHz sampling rate, consistent with standards used for telephony transmission. The Voice Segment window lets you digitally input data from other sources, but converts it to 8 kHz if necessary. There is no advantage to using sampling rates other than 8 kHz when recording new voice segments using the Voice Segment window. Similarly, the command line utilities, bvi_aiff and bvi_wav, convert any sampling rate greater than 8 kHz to the required 8 kHz rate.

Source format

Use the best-quality source for your voice segments and import them into Blueworx Voice Response in 16-bit PCM (linear) format at an 8 kHz sampling rate. To do this, use studio-quality DAT tape through the line-in of the Ultimedia adapter with the Ultimedia format set to 16-bit PCM. Alternatively, you may already have 16-bit PCM voice segments as files that can be imported directly into the Voice Segment Editor. The editor can change sampling rates as required, although a change in sampling rate usually introduces slight distortion. You should therefore use an 8 kHz sampling rate for imported voice data whenever possible.
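Before importing existing files, it can help to confirm that they already match the recommended format. The following is a minimal sketch, not part of Blueworx Voice Response, that uses the Python standard library wave module to report any mismatch. The function name check_wav and its parameters are hypothetical, and the sketch assumes the file is a PCM WAV file that the wave module can read.

```python
import wave


def check_wav(path, wanted_rate=8000, wanted_sample_bytes=2):
    """Return a list of mismatches; an empty list means the file is
    16-bit PCM at 8 kHz (with the default arguments)."""
    # wave.open raises wave.Error for files it cannot parse as PCM WAV.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()      # samples per second
        sample_bytes = wav.getsampwidth()  # bytes per sample (2 = 16-bit)

    problems = []
    if rate != wanted_rate:
        problems.append(f"sampling rate is {rate} Hz, expected {wanted_rate} Hz")
    if sample_bytes != wanted_sample_bytes:
        problems.append(
            f"sample width is {8 * sample_bytes} bits, "
            f"expected {8 * wanted_sample_bytes} bits"
        )
    return problems
```

For example, check_wav("greeting.wav") (a hypothetical file name) returns an empty list when no sampling rate conversion would be needed on import.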

Dynamic range

When you use the voice segment editor or the batch voice input utility to record voice segments via the Ultimedia adapter, with an audio source connected to its line input, you may find that the audio signal is small relative to the available ‘dynamic range’. 16-bit PCM allows signal levels of up to 32K, whereas typical input signals from the Ultimedia adapter may have an amplitude of around 2K. When using 5:1 compression, the best quality is obtained if the input signal occupies as much of the 32K range as possible without signal peaks exceeding the available limits. You can achieve this with an external preamplifier, or by using the MAXIMIZE option of the voice segment editor or batch voice input utility, which digitally scales the input signal to occupy 90% of the full range.

Note that the maximize button of the voice segment editor is only enabled when operating in 16-bit PCM mode.
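The scaling described above can be illustrated with a short sketch. This is not the code used by the MAXIMIZE option; it only shows the arithmetic of scaling signed 16-bit samples so that the largest peak reaches 90% of full scale. The function name maximize, the constant FULL_SCALE, and the choice of 32767 as full scale are assumptions made for illustration.

```python
FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample


def maximize(samples, target_fraction=0.9):
    """Scale 16-bit PCM samples so the largest peak sits at
    target_fraction of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_fraction * FULL_SCALE / peak
    # Round to the nearest integer and clamp to the 16-bit range.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


# A quiet recording with a peak amplitude of about 2K.
quiet = [0, 1000, -2000, 1500]
print(maximize(quiet))  # prints [0, 14745, -29490, 22118]
```

The peak of 2000 is multiplied by a gain of about 14.7, so the scaled signal uses most of the available range while leaving headroom for the peaks.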

Filters

When you record a high-quality input signal for use over the telephone, it is necessary to filter out all frequencies above 4 kHz to allow transmission at the digital 8 kHz rate. (The voice segment editor does this automatically when it stores the segment in the database.) Loss of these high frequencies can make the signal sound relatively dull. You can improve this by using the Boost button of the voice segment editor before saving the recorded segment. This increases the volume of frequencies in the range 1.5 kHz to 4 kHz by 2 dB, and decreases the volume of frequencies in the range 500 Hz to 1.5 kHz by 2 dB. An identical effect can be achieved with the “Boost” option of the batch voice input utility where the boost amount can be set to any value.

Note that the boost button of the voice segment editor is only enabled when operating in 16-bit PCM mode.
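To put the 2 dB figures in perspective, the sketch below converts a gain in decibels to an amplitude ratio and reports the nominal gain change for a given frequency. It is an illustration of the description above, not the filter used by the Boost button: the function names are hypothetical, the sharp band edges are a simplification (a real filter has a gradual transition), and assigning 1.5 kHz to the upper band is an assumption.

```python
def boost_db(freq_hz, amount_db=2.0):
    """Nominal gain change in dB that the Boost description gives freq_hz."""
    if 1500 <= freq_hz <= 4000:
        return amount_db       # upper band is made louder
    if 500 <= freq_hz < 1500:
        return -amount_db      # lower band is made quieter
    return 0.0                 # outside both bands: unchanged


def db_to_amplitude_ratio(db):
    """Convert a gain in dB to the factor applied to sample amplitudes."""
    return 10 ** (db / 20)


for freq in (300, 1000, 3000):
    change = boost_db(freq)
    print(f"{freq} Hz: {change:+.1f} dB, amplitude x{db_to_amplitude_ratio(change):.3f}")
```

A 2 dB boost multiplies amplitudes by roughly 1.26, and a 2 dB cut multiplies them by roughly 0.79.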

Recording directly using a microphone

A direct microphone input can provide excellent quality input. However, the pSeries computer must be within 10-15 feet (maximum) of the microphone in order to minimize electrical noise pick-up. This may be difficult to achieve in a studio environment because fan and disk noise prohibit the pSeries computer from being in the same room as the microphone.

Using a recording studio

For the best results when recording voice segments, keep to the following rules:

If you are working with a studio that has reasonably sophisticated audio processing capabilities, it is wise to apply the audio boost function at source rather than with the batch voice import utility. The best frequency-shaping function to apply is defined in the ITU P-Series Blue Book (Volume 5 1988) in Supplement No. 10 (P332). This is the preferred response for a telephone microphone as determined by user trials. It can be applied to flat-spectrum audio to achieve the same results as if the voice were being spoken through a telephone.

The frequency-shaping function recommended by the ITU boosts the treble and cuts the bass in a signal. This restores some of the brightness lost when a full-bandwidth audio signal is low-pass filtered at 3400 Hz prior to sampling at 8 kHz. It is similar to the BOOST option of the voice segment editor or the batch voice import utility. Make sure that the shaping is not applied twice, once in the studio and again by one of Blueworx Voice Response’s voice utilities.

The ITU-recommended frequency response characteristic is as follows:

Responses for spot frequencies are shown in Table 1.

Table 1. Responses for spot frequencies

Frequency    Response attenuated or amplified by

50 Hz        -20 dB
100 Hz       -12 dB
200 Hz       -4.5 dB
400 Hz       -2 dB
800 Hz       -1 dB
1000 Hz      0 dB
1500 Hz      +2.5 dB
2000 Hz      +6 dB
2500 Hz      +7 dB
3000 Hz      +6 dB
3400 Hz      0 dB
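When configuring studio equipment, it may be useful to estimate the response at frequencies between the spot values. The sketch below copies the values from Table 1 and interpolates between them. The interpolation method (linear in dB on a logarithmic frequency axis) and the function name response_db are assumptions made for illustration; the ITU supplement itself remains the authority on the exact curve.

```python
import math

# Spot frequencies (Hz) and responses (dB) copied from Table 1.
SPOT_RESPONSES = [
    (50, -20.0), (100, -12.0), (200, -4.5), (400, -2.0),
    (800, -1.0), (1000, 0.0), (1500, 2.5), (2000, 6.0),
    (2500, 7.0), (3000, 6.0), (3400, 0.0),
]


def response_db(freq_hz):
    """Estimate the response in dB at freq_hz by interpolating
    between the two nearest spot frequencies in Table 1."""
    lowest, highest = SPOT_RESPONSES[0][0], SPOT_RESPONSES[-1][0]
    if not lowest <= freq_hz <= highest:
        raise ValueError("frequency is outside the range covered by Table 1")
    for (f_lo, db_lo), (f_hi, db_hi) in zip(SPOT_RESPONSES, SPOT_RESPONSES[1:]):
        if f_lo <= freq_hz <= f_hi:
            # Position of freq_hz between f_lo and f_hi on a log axis (0..1).
            t = math.log(freq_hz / f_lo) / math.log(f_hi / f_lo)
            return db_lo + t * (db_hi - db_lo)


for freq in (100, 1000, 1750, 2500):
    print(f"{freq} Hz: {response_db(freq):+.1f} dB")
```

At the spot frequencies the function returns the tabulated values; between them it returns an estimate only.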

To get the best results when recording data for use as background music: