Intelligibility (communication): Difference between revisions
Latest revision as of 00:17, 16 March 2024
In speech communication, intelligibility is a measure of how comprehensible speech is in given conditions. Intelligibility is affected by the level (loud but not too loud) and quality of the speech signal, the type and level of background noise, reverberation (some reflections but not too many), and, for speech over communication devices, the properties of the communication system. A common standardized measure of speech intelligibility is the Speech Transmission Index (STI). The concept of speech intelligibility is relevant to several fields, including phonetics, human factors, acoustical engineering, and audiometry.
Important Influences
Speech is considered to be the major method of communication between humans. Humans alter the way they speak and hear according to many factors, such as the age, gender, native language and social relationship of talker and listener. Speech intelligibility may also be affected by pathologies such as speech and hearing disorders.[1][2]
Finally, speech intelligibility is influenced by the environment or limitations on the communication channel. How well a spoken message can be understood in a room is influenced by the
- background noise,
- reverberation, and
- frequency response of the room, as well as the
- sound pressure level and
- distortion of the sound reinforcement system
Noise levels and reverberation
Intelligibility is negatively impacted by background noise and too much reverberation. The relationship between speech and noise levels is generally described in terms of a signal-to-noise ratio. With a background noise level between 35 and 100 dB, the threshold for 100% intelligibility is usually a signal-to-noise ratio of 12 dB.[3] A ratio of 12 dB means that the sound pressure of the speech signal is roughly 4 times that of the background noise (about 16 times the power). The speech signal ranges from about 200 to 8000 Hz, while human hearing ranges from about 20 to 20,000 Hz, so the effects of masking depend on the frequency range of the masking noise. Additionally, different speech sounds make use of different parts of the speech frequency spectrum, so a continuous background noise such as white or pink noise will have a different effect on intelligibility than a variable or modulated background noise such as competing speech, multi-talker or "cocktail party" babble, or industrial machinery.
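The decibel arithmetic in the paragraph above can be checked numerically. The following is a minimal sketch, not a measurement procedure: the function names are illustrative, and it assumes speech and noise levels are already available as sound pressure levels in dB, so that the signal-to-noise ratio is simply their difference. The 12 dB figure is the one quoted in the text.

```python
# Sketch only: function names are hypothetical, chosen for illustration.
REQUIRED_SNR_DB = 12.0  # threshold quoted in the text for 100% intelligibility


def db_to_pressure_ratio(db: float) -> float:
    """Convert a level difference in dB to a sound-pressure (amplitude) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in dB to a power (intensity) ratio."""
    return 10 ** (db / 10)


def snr_db(speech_level_db: float, noise_level_db: float) -> float:
    """Signal-to-noise ratio in dB, given both levels in dB."""
    return speech_level_db - noise_level_db


def meets_threshold(speech_level_db: float, noise_level_db: float,
                    required_db: float = REQUIRED_SNR_DB) -> bool:
    """True if the speech level exceeds the noise level by at least required_db."""
    return snr_db(speech_level_db, noise_level_db) >= required_db


if __name__ == "__main__":
    print(round(db_to_pressure_ratio(12.0), 2))  # pressure ratio for 12 dB
    print(round(db_to_power_ratio(12.0), 2))     # power ratio for 12 dB
    print(meets_threshold(72.0, 60.0))           # speech 12 dB above noise
```

Note that the text states the 12 dB threshold only for background noise between 35 and 100 dB; the sketch does not enforce that range.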
Reverberation also affects the speech signal by blurring speech sounds over time. This has the effect of enhancing vowels with steady states, while masking stops, glides and vowel transitions, and prosodic cues such as pitch and duration.[4]
The fact that background noise compromises intelligibility is exploited in audiometric testing involving speech and in some linguistic perception experiments as a way to compensate for the ceiling effect by making listening tasks more difficult.
Intelligibility standards
Quantity to be measured | Measured property (usage) | Good values |
---|---|---|
STI | Intelligibility (internationally known) | > 0.6 |
CIS | Intelligibility (internationally known) | > 0.78 |
%Alcons | Articulation loss (popular in USA) | < 10% |
C50 | Clarity index (widespread in Germany) | > 3 dB |
RASTI (obsolete) | Intelligibility (internationally known) | > 0.6 |
Word articulation remains high even when only 1–2% of the wave is unaffected by distortion.[5]
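The "good values" column of the table above can be expressed as a simple lookup. This is a minimal sketch under stated assumptions: the thresholds are copied from the table, while the dictionary, the function name, and the treatment of a value exactly on a boundary as not good (strict inequalities) are illustrative choices rather than part of any standard.

```python
# Sketch only: thresholds come from the table; names are hypothetical.
GOOD_VALUE_RULES = {
    "STI": lambda v: v > 0.6,        # Speech Transmission Index
    "CIS": lambda v: v > 0.78,       # Common Intelligibility Scale
    "%Alcons": lambda v: v < 10.0,   # articulation loss of consonants, in percent
    "C50": lambda v: v > 3.0,        # clarity index, in dB
    "RASTI": lambda v: v > 0.6,      # obsolete, listed for completeness
}


def is_good(metric: str, value: float) -> bool:
    """Return True if value falls in the table's 'good' range for metric."""
    try:
        rule = GOOD_VALUE_RULES[metric]
    except KeyError:
        raise ValueError(f"unknown metric: {metric!r}") from None
    return rule(value)


if __name__ == "__main__":
    print(is_good("STI", 0.65))
    print(is_good("%Alcons", 12.0))
```

%Alcons is the only quantity in the table where lower is better, which is why its rule uses the opposite comparison.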
Intelligibility with different types of speech
Lombard speech
Speakers involuntarily modify the speech they produce in noise, a process called the Lombard effect. Such speech is more intelligible than normal speech. It is not only louder, but its fundamental frequency is raised and its vowels are lengthened. People also tend to make more noticeable facial movements.[6][7]
Screaming
Shouted speech is less intelligible than Lombard speech because increased vocal energy produces decreased phonetic information.[8] However, "infinite peak clipping of shouted speech makes it almost as intelligible as normal speech."[9]
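Infinite peak clipping, in the usual signal-processing sense of the term, reduces a waveform to a two-level signal that keeps only the sign of each sample, so amplitude information is discarded while the timing of zero crossings is preserved. The sketch below illustrates that idea on a list of samples. It is not taken from the cited report: the function names are hypothetical, and mapping samples that are exactly zero to +1 is an arbitrary convention assumed here.

```python
import math


def infinite_peak_clip(samples):
    """Replace each sample by its sign: +1.0 for s >= 0, otherwise -1.0.

    Treating exact zeros as positive is an assumption made for this sketch.
    """
    return [1.0 if s >= 0 else -1.0 for s in samples]


def zero_crossings(samples):
    """Count sign changes between consecutive samples (zeros count as positive)."""
    signs = [s >= 0 for s in samples]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


if __name__ == "__main__":
    # A tone whose amplitude grows over time, standing in for a loud signal.
    tone = [(1 + n / 20) * math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
    clipped = infinite_peak_clip(tone)
    # The clipped signal has only two levels but the same zero-crossing count.
    print(sorted(set(clipped)))
    print(zero_crossings(tone) == zero_crossings(clipped))
```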
Clear speech
Clear speech is used when talking to a person with a hearing impairment. It is characterized by a slower speaking rate, more and longer pauses, elevated speech intensity, increased word duration, "targeted" vowel formants, increased consonant intensity compared to adjacent vowels, and a number of phonological changes (including fewer reduced vowels and more released stop bursts).[10][11]
Infant-directed speech
Infant-directed speech, or baby talk, uses a simplified syntax and a smaller, easier-to-understand vocabulary than speech directed to adults.[12] Compared to adult-directed speech, it has a higher fundamental frequency, exaggerated pitch range, and slower rate.[13]
Citation speech
Citation speech occurs when people engage self-consciously in spoken language research. It has a slower tempo and fewer connected speech processes (e.g., shortening of nuclear vowels, devoicing of word-final consonants) than normal speech.[14]
Hyperspace speech
Hyperspace speech, also known as the hyperspace effect, occurs when people are misled about the presence of environmental noise. It involves modifying the formants F1 and F2 of phonetic vowel targets to ease perceived difficulties on the part of the listener in recovering information from the acoustic signal.[14]
Notes
1. Fontan, L., Pellegrini, T., Olcoz, J., & Abad, A. (2015, September). Predicting disordered speech comprehensibility from Goodness of Pronunciation scores. In Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2015) satellite workshop of Interspeech 2015 (pp. pp-1).
2. Fontan, Lionel; Ferrané, Isabelle; Farinas, Jérôme; Pinquier, Julien; Tardieu, Julien; Magnen, Cynthia; Gaillard, Pascal; Aumont, Xavier; Füllgrabe, Christian (2017). "Automatic Speech Recognition Predicts Speech Intelligibility and Comprehension for Listeners with Simulated Age-Related Hearing Loss". Journal of Speech, Language, and Hearing Research. 60 (9): 2394–2405. doi:10.1044/2017_JSLHR-S-16-0269. PMID 28793162. S2CID 13849830.
3. Robinson, G. S., and Casali, J. G. (2003). Speech communication and signal detection in noise. In E. H. Berger, L. H. Royster, J. D. Royster, D. P. Driscoll, and M. Layne (Eds.), The noise manual (5th ed.) (pp. 567–600). Fairfax, VA: American Industrial Hygiene Association.
4. Garcia Lecumberri, M. L.; Cooke, M.; Cutler, A. (2010). "Non-native speech perception in adverse conditions: A review". Speech Communication. 52 (11–12): 864–886. doi:10.1016/j.specom.2010.08.014. hdl:11858/00-001M-0000-0012-BE5A-C. S2CID 8723075.
5. Moore, C.J. (1997). An introduction to the psychology of hearing. 4th ed. Academic Press. London. ISBN 978-0-12-505628-1
6. Junqua, J. C. (1993). "The Lombard reflex and its role on human listeners and automatic speech recognizers". The Journal of the Acoustical Society of America. 93 (1): 510–524. Bibcode:1993ASAJ...93..510J. doi:10.1121/1.405631. PMID 8423266.
7. Summers, W. V.; Pisoni, D. B.; Bernacki, R. H.; Pedlow, R. I.; Stokes, M. A. (1988). "Effects of noise on speech production: Acoustic and perceptual analyses". The Journal of the Acoustical Society of America. 84 (3): 917–928. Bibcode:1988ASAJ...84..917S. doi:10.1121/1.396660. PMC 3507387. PMID 3183209.
8. Pickett, J. M. (1956). "Effects of Vocal Force on the Intelligibility of Speech Sounds". The Journal of the Acoustical Society of America. 28 (5): 902–905. Bibcode:1956ASAJ...28..902P. doi:10.1121/1.1908510.
9. MacLean, Donald J. & A. Michael Noll, "The Intelligibility of Shouted Speech," Proceedings of the Symposium on the Aeromedical Aspects of Radio Communication and Flight Safety, AGARD/NATO Advisory Report 19, pp. 10-1 to 10-13 (December 1969 London)
10. Picheny, M. A.; Durlach, N. I.; Braida, L. D. (1985). "Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech". Journal of Speech and Hearing Research. 28 (1): 96–103. doi:10.1044/jshr.2801.96. PMID 3982003.
11. Picheny, M. A.; Durlach, N. I.; Braida, L. D. (1986). "Speaking clearly for the hard of hearing. II: Acoustic characteristics of clear and conversational speech". Journal of Speech and Hearing Research. 29 (4): 434–446. doi:10.1044/jshr.2904.434. PMID 3795886.
12. Snow CE. Ferguson CA. (1977). Talking to Children: Language Input and Acquisition, Cambridge University Press. ISBN 978-0-521-29513-0
13. Kuhl, P. K.; Andruski, J. E.; Chistovich, I. A.; Chistovich, L. A.; Kozhevnikova, E. V.; Ryskina, V. L.; Stolyarova, E. I.; Sundberg, U.; Lacerda, F. (1997). "Cross-language analysis of phonetic units in language addressed to infants". Science. 277 (5326): 684–686. doi:10.1126/science.277.5326.684. PMID 9235890. S2CID 32048191.
14. Johnson K, Flemming E, Wright R (1993). "The hyperspace effect: Phonetic targets are hyperarticulated". Language. 69 (3): 505–28. doi:10.2307/416697. JSTOR 416697.