Perceptual Speech Quality Measure

Perceptual Speech Quality Measure (PSQM) is a computational modeling algorithm, defined in Recommendation ITU-T P.861, that objectively evaluates and quantifies the voice quality of voice-band (300–3400 Hz) speech codecs. It may be used to rank the performance of these codecs across differing speech input levels, talkers, bit rates and transcodings. P.861 has been withdrawn and replaced by Recommendation ITU-T P.862 (PESQ), which contains an improved speech assessment algorithm.

Why it is used

Using the PSQM standard allows automated, simulation-based test methodologies to rate both speech clarity and transmitted voice quality objectively. Various software and hardware products have been developed to facilitate this testing. Automated testing saves considerable cost and time compared with the traditional practice of having large groups of people subjectively evaluate voice signals, and it yields results that are reliable and reproducible. This matters to telephony providers that are mandated to maintain high quality-of-service standards.

Algorithm

PSQM uses a psychoacoustic model, with both perceptual and cognitive components, to compare the voice signal before and after transmission. The result is a PSQM value that measures signal quality degradation and ranges from 0 (no degradation) to 6.5 (highest degradation). This value may in turn be translated into a mean opinion score (MOS), an accepted measure of the perceived quality of received media on a numeric scale from 1 to 5. A value of 1 indicates unacceptably poor voice quality, while a value of 5 indicates high voice quality with no perceptible issues.
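The relationship between the two scales can be illustrated with a minimal Python sketch. It assumes a straight-line mapping between the endpoints given above; this linear form and the name psqm_to_mos_linear are hypothetical, chosen only for illustration, and are not the mapping specified in P.861 or used by any particular test product.

```python
# Scale endpoints taken from the text above.
PSQM_MIN = 0.0   # no degradation
PSQM_MAX = 6.5   # highest degradation
MOS_MIN = 1.0    # worst perceived quality
MOS_MAX = 5.0    # best perceived quality


def psqm_to_mos_linear(psqm: float) -> float:
    """Map a PSQM value to a MOS-like score by linear interpolation.

    ASSUMPTION: the linear shape is illustrative only. Real PSQM-to-MOS
    mappings are calibrated against subjective test data.
    """
    # Clamp out-of-range inputs to the documented PSQM range.
    clamped = min(max(psqm, PSQM_MIN), PSQM_MAX)
    # Fraction of the way from "no degradation" to "highest degradation".
    fraction = (clamped - PSQM_MIN) / (PSQM_MAX - PSQM_MIN)
    # Higher PSQM means more degradation, so MOS falls as PSQM rises.
    return MOS_MAX - fraction * (MOS_MAX - MOS_MIN)


if __name__ == "__main__":
    for value in (0.0, 1.0, 3.25, 6.5):
        print(f"PSQM {value:4.2f} -> MOS {psqm_to_mos_linear(value):.2f}")
```

The sketch only captures the direction of the relationship: a PSQM value of 0 corresponds to the top of the MOS scale and 6.5 to the bottom.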

The PSQM algorithm converts the physical-domain signal(s) into the perceptually meaningful psychoacoustic domain through a series of nonlinear processes such as time-frequency mapping, frequency warping and intensity warping.
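The three named stages can be sketched in plain Python as follows. This is a rough illustration under stated assumptions, not the P.861 procedure: the Hann window, Traunmüller's Hz-to-Bark approximation, the 18 equal-width Bark bands, the 0.23 compression exponent and all function names are choices made here for the example.

```python
import cmath
import math


def power_spectrum(frame):
    """Time-frequency mapping: Hann-windowed DFT power of one frame."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def hz_to_bark(freq_hz):
    """Frequency warping: Traunmueller's Bark approximation (assumption)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_band_powers(spectrum, sample_rate, n_bands=18):
    """Pool DFT bins into bands of equal width on the Bark scale."""
    n_fft = 2 * (len(spectrum) - 1)
    top = hz_to_bark(sample_rate / 2)
    bands = [0.0] * n_bands
    for k, power in enumerate(spectrum):
        z = hz_to_bark(k * sample_rate / n_fft)
        index = min(max(int(z / top * n_bands), 0), n_bands - 1)
        bands[index] += power
    return bands


def intensity_warp(band_powers, exponent=0.23):
    """Intensity warping: compressive power law (exponent is an assumption)."""
    return [p ** exponent for p in band_powers]


def internal_representation(signal, sample_rate=8000, frame_len=256, hop=128):
    """Apply all three stages frame by frame; returns frames x bands."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [
        intensity_warp(
            bark_band_powers(power_spectrum(signal[s:s + frame_len]),
                             sample_rate))
        for s in starts
    ]


if __name__ == "__main__":
    # A 1 kHz test tone sampled at 8 kHz.
    tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(512)]
    rep = internal_representation(tone)
    print(len(rep), "frames x", len(rep[0]), "bands")
```

The direct DFT is used only to keep the example free of third-party dependencies; a practical implementation would use an FFT.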

The quality of the coded speech is judged by the difference between the internal representations of the reference and coded signals. This difference is used to calculate the noise disturbance as a function of time and frequency. In addition to perceptual modeling, the PSQM algorithm uses cognitive modeling, such as loudness scaling and asymmetric masking, to achieve high correlation between subjective and objective measurements.
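A minimal Python sketch of such a disturbance calculation is shown below. It assumes each signal has already been reduced to a grid of per-frame, per-band values, and it interprets the asymmetry as weighting components added by the codec more heavily than components it removes. The function name noise_disturbance, the weight of 2.0 and the simple averaging are hypothetical illustrations, not the formulas of P.861.

```python
def noise_disturbance(reference, degraded, asymmetry=2.0):
    """Average weighted difference between two internal representations.

    reference, degraded: lists of frames, each a list of per-band values.
    asymmetry: ASSUMED weight (> 1) applied where the degraded signal
    exceeds the reference, i.e. where the codec has added something.
    """
    total = 0.0
    cells = 0
    for ref_frame, deg_frame in zip(reference, degraded):
        for ref_val, deg_val in zip(ref_frame, deg_frame):
            diff = deg_val - ref_val          # per time-frequency cell
            weight = asymmetry if diff > 0 else 1.0
            total += weight * abs(diff)
            cells += 1
    # Average over all cells; an empty input yields no disturbance.
    return total / cells if cells else 0.0


if __name__ == "__main__":
    ref = [[1.0, 1.0], [1.0, 1.0]]
    added = [[2.0, 1.0], [1.0, 1.0]]     # codec adds energy in one cell
    removed = [[0.0, 1.0], [1.0, 1.0]]   # codec removes energy in one cell
    print(noise_disturbance(ref, ref))
    print(noise_disturbance(ref, added))
    print(noise_disturbance(ref, removed))
```

With these inputs the added-energy case scores higher than the removed-energy case, even though the absolute difference is the same in both.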

Limitations

PSQM as originally conceived was not developed to account for the network quality-of-service perturbations common in Voice over IP applications, such as packet loss, delay variance (jitter) or out-of-sequence packets. Under these conditions, for example in heavy network load simulations, PSQM usually gives inappropriate results and fails to account for a very real perceived loss of voice quality. Attempts to duplicate network fault conditions by introducing significant packet loss produce PSQM values that correspond to falsely inflated MOS values.

To overcome this limitation, PSQM+ was developed by modifying the original algorithm. PSQM+ generates results that seem to reflect more accurately the adverse performance of speech codecs under realistic network load conditions.
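The kind of packet-loss fault injection mentioned in this section can be sketched in Python as follows. The function name drop_packets, the 160-sample packet length (20 ms at an assumed 8 kHz sampling rate) and the substitution of silence for lost packets are assumptions made for illustration; real networks and decoders behave in more varied ways.

```python
import random


def drop_packets(samples, loss_rate, packet_len=160, seed=0):
    """Return a copy of samples with randomly chosen packets silenced.

    loss_rate: probability in [0, 1] that any given packet is lost.
    packet_len: samples per packet (ASSUMED 20 ms at 8 kHz).
    seed: fixed seed so that a test run is reproducible.
    """
    rng = random.Random(seed)
    out = []
    for start in range(0, len(samples), packet_len):
        packet = samples[start:start + packet_len]
        if rng.random() < loss_rate:
            out.extend([0.0] * len(packet))   # lost packet becomes silence
        else:
            out.extend(packet)                # packet delivered intact
    return out


if __name__ == "__main__":
    speech = [1.0] * 1600                     # stand-in for ten packets
    degraded = drop_packets(speech, loss_rate=0.3)
    lost = sum(1 for s in degraded if s == 0.0) // 160
    print(f"{lost} of 10 packets silenced")
```

Feeding the original and the degraded signal to an objective measure is one way to check whether its score tracks the audible damage.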

Other considerations

Another issue is the lack of standardization in the test signals used to evaluate speech codecs. PSQM provides more reliable and consistent MOS values when used in accordance with the ITU-recommended methods for objective and subjective assessment of quality (ITU-T P.800/P.830/P.861). These Recommendations include using both male and female voice reference signals at an average level of −20 dB. The type, gender, duration and gain of the voice or signal can each have a minor impact on the PSQM value or MOS, as can the threshold levels, the number of calls made and other configuration settings of the environment. When comparing voice quality measurements, the signal, environment and configuration should all be taken into account.

Many speech codecs exist and are used in a wide variety of applications. Careful selection of appropriate speech codec(s) is necessary to match system requirements. A list of common speech codecs and their associated PSQM/PSQM+ derived MOS values obtained under various network load conditions is available.

References

ITU-T Recommendation P.861 (withdrawn): Objective quality measurement of telephone-band (300–3400 Hz) speech codecs. P.861 was recognized as having certain limitations in specific areas of application. It was replaced by P.862, which contains an improved objective speech quality assessment algorithm.
ITU-T Recommendation P.862 (2001-02): Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs
"A Perceptual Speech-Quality Measure Based on a Psychoacoustic Sound Representation". AES Journal Forum, secure.aes.org. Retrieved 2024-04-18.
