If the compression_type is COMPRESSED_VOICE, each element is COMPRESSED_ELEMENT_LENGTH bytes.
If the compression_type is UNCOMPRESSED_VOICE, each element is UNCOMPRESSED_ELEMENT_LENGTH bytes.
Regardless of compression_type, each element corresponds to 0.2 seconds of voice data.
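
The sizing rules above can be sketched in C as follows. This is a minimal illustration, not the API's own code: the numeric values given to COMPRESSED_ELEMENT_LENGTH and UNCOMPRESSED_ELEMENT_LENGTH are placeholders, the voice_compression_t enum is an assumed stand-in for whatever type compression_type really has, and the helper names (element_length, voice_buffer_bytes, voice_duration_ms) are hypothetical. In real code, the constants and the type would come from the API's header instead of being defined locally.

```c
#include <stddef.h>

/* Placeholder values for illustration only; the real values are
 * defined by the API's header and are not stated in this document. */
#define COMPRESSED_ELEMENT_LENGTH   40u    /* assumed, not from the source */
#define UNCOMPRESSED_ELEMENT_LENGTH 3200u  /* assumed, not from the source */

/* Each element covers 0.2 seconds of voice data, i.e. 200 ms. */
#define ELEMENT_DURATION_MS 200u

/* Assumed stand-in for the real type of compression_type. */
typedef enum {
    COMPRESSED_VOICE,
    UNCOMPRESSED_VOICE
} voice_compression_t;

/* Hypothetical helper: bytes per element for the given compression type.
 * Returns 0 if the type is not one of the two known values. */
static size_t element_length(voice_compression_t type)
{
    switch (type) {
    case COMPRESSED_VOICE:
        return COMPRESSED_ELEMENT_LENGTH;
    case UNCOMPRESSED_VOICE:
        return UNCOMPRESSED_ELEMENT_LENGTH;
    }
    return 0;
}

/* Hypothetical helper: total bytes occupied by element_count elements. */
static size_t voice_buffer_bytes(voice_compression_t type, size_t element_count)
{
    return element_length(type) * element_count;
}

/* Hypothetical helper: playback duration in milliseconds.
 * The duration depends only on the element count, not on the type. */
static unsigned long voice_duration_ms(size_t element_count)
{
    return (unsigned long)element_count * ELEMENT_DURATION_MS;
}
```

For example, five elements always represent one second of audio (voice_duration_ms(5) yields 1000), while the byte count returned by voice_buffer_bytes for those five elements depends on which compression type is in use.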