SSSHP Contents | Labs

 KLATT 1987, p. 777 

save bits by quantizing the representation, and thus (2) linear-prediction coded speech often gives listeners more favorable impressions of intelligibility and naturalness than are warranted by objective measures.

Based on this critical evaluation, Olive went on to select new versions of his diphone inventory, also hand-correcting pitch errors, and retested diphone intelligibility iteratively, so that the most recent system now exceeds the intelligibility of LP10. Part of the intelligibility increase may be attributed to the use of multipulse linear prediction (Atal and Remde, 1982; Olive and Liberman, 1985), which makes it possible to model bursts of noise and other syllable-onset events in detail.17

Wright et al. (1986) found that deficiencies in segmental synthesis can be detected even when intelligibility is relatively high, simply by asking subjects to rate the subjective goodness of words. Naive listeners hear a word and then see it presented visually, at which point they are asked to rate its goodness. If the rating is low, the computer asks additional questions about the location and type of specific defects.

In an effort to find a maximally sensitive test for comparing the phoneme intelligibility of various systems, Nixon et al. (1985) added controlled amounts of background noise to synthesized or vocoded MRT word lists and measured intelligibility as a function of signal-to-noise (S/N) ratio. They found that an unidentified "high-performance" text-to-speech system was about six percentage points worse than natural speech over a wide range of S/N ratios. Stated another way, under adverse S/N conditions the synthetic speech needed a 5-dB boost in S/N ratio to be as intelligible as natural speech.18 Of perhaps greater interest are the comparative figures from the Nixon study for the 2.4-kbit/s government-standard LPC-10 vocoder and 9600-bit/s CVSD, both of which performed much worse than the synthetic speech produced by this text-to-speech system: both were about 40% less intelligible than natural speech at high and low S/N ratios. These rather surprising results point to limits on the utility of low-bit-rate encoded speech and suggest that, at least for some applications, text-to-speech systems already offer superior communicative performance.
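The equivalence drawn above between a vertical gap (percentage points) and a horizontal shift (dB) can be made concrete. The Python sketch below is illustrative and is not the analysis used by Nixon et al.: it inverts each intelligibility-versus-S/N curve by linear interpolation to find the S/N at which the curve reaches a chosen score, then reports the difference. The function names and the curves in the example are invented; the example curves are deliberately constructed to be parallel with a slope of 1.2 points per dB, so that a 6-point gap corresponds to a 5-dB shift.

```python
from bisect import bisect_left


def snr_for_score(snrs_db, scores, target):
    """Return the S/N (dB) at which a measured intelligibility curve
    reaches `target` percent correct, by linear interpolation.

    `snrs_db` and `scores` are parallel lists; scores must rise strictly
    with S/N so that the inverse mapping is well defined.
    """
    if len(snrs_db) != len(scores) or len(scores) < 2:
        raise ValueError("need at least two matching (S/N, score) points")
    if any(b <= a for a, b in zip(scores, scores[1:])):
        raise ValueError("scores must increase strictly with S/N")
    if not scores[0] <= target <= scores[-1]:
        raise ValueError("target score lies outside the measured range")
    i = bisect_left(scores, target)  # first index with scores[i] >= target
    if scores[i] == target:
        return float(snrs_db[i])
    # target falls strictly between measured points i-1 and i
    frac = (target - scores[i - 1]) / (scores[i] - scores[i - 1])
    return snrs_db[i - 1] + frac * (snrs_db[i] - snrs_db[i - 1])


def equivalent_snr_boost(snrs_db, natural, synthetic, target):
    """Extra S/N (dB) the synthetic voice needs to match natural speech
    at the given score (positive means synthetic is less intelligible)."""
    return (snr_for_score(snrs_db, synthetic, target)
            - snr_for_score(snrs_db, natural, target))


# Invented, illustrative curves (NOT data from Nixon et al., 1985):
# parallel lines, 1.2 points per dB, synthetic 6 points below natural.
SNRS_DB = [-10, -5, 0, 5, 10]
NATURAL = [70, 76, 82, 88, 94]
SYNTHETIC = [64, 70, 76, 82, 88]

if __name__ == "__main__":
    print(equivalent_snr_boost(SNRS_DB, NATURAL, SYNTHETIC, 76))  # 5.0
```

For parallel, locally linear curves the boost is simply the score gap divided by the slope (6 points / 1.2 points per dB = 5 dB). Real psychometric functions flatten near floor and ceiling, so the dB equivalent of a fixed percentage-point gap depends on the score at which the comparison is made.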

B. Intelligibility of words in sentences

In comparison with words spoken in isolation, words in sentences undergo significant coarticulation across word boundaries, phonetic simplifications, reduction of unstressed syllables, and prosodic modifications that, among other things, shorten nonfinal syllables and modify the fundamental frequency contour. To evaluate the ability of text-to-speech systems to realize these transformations, tests of word intelligibility in sentence frames have been devised. The easiest materials, short, simple, predictable sentences known as the CID sentences (Erber, 1979), have been used primarily to evaluate the abilities of hearing-impaired listeners. Another sentence list was devised to measure speech intelligibility in noise (Egan, 1948). This list, known as the Harvard sentences, is often employed today despite its meager syntactic variation and minimal use of words of more than two syllables, simply because no better lists have been proposed and calibrated.

Pisoni et al. (1985) employed a subset of the Harvard sentences and measured the intelligibility of each content word. The results are presented in Table IX; the systems rank in the same order as they did for isolated words. Also shown in the table are data from a Haskins anomalous sentence test (Nye and Gaitenby, 1974), which consists of syntactically acceptable but nonsensical word strings of the form "The (adjective) (noun) (verb) the (noun)," e.g., "The old farm cost the blood." Again the systems rank in the same order, but the differences between them are somewhat greater, suggesting that this is a more sensitive test.

The performance of the Haskins system, as evaluated by Ingemann (1978) and reported by Cooper et al. (1984), is also shown in the table. Because subjects in the Ingemann study performed less well overall on the natural-speech control, the Haskins scores should perhaps be boosted slightly before they are compared with those of the systems listed above it in the table.

Chial (1985) used the SPIN (speech in noise) test, developed by Kalikow et al. (1977) and calibrated by Bilger et al. (1984), to evaluate the relative performance of several of the less expensive text-to-speech systems. Subjects had to identify the last word of sentences presented in a background babble of several competing voices. The systems tested included the Echo II; the Votrax Type-n-Talk, which incorporates the SC-01 synthesis-by-rule chip; and the Votrax Personal Speech System, which uses a new version of the chip, the SC-01A. The results, shown in Table X, indicate that the new chip is more intelligible than the SC-01: words correct rose from 40% to 65% at a 0-dB signal-to-babble ratio. However, performance with natural speech at this signal-to-babble ratio is typically about 91% correct (Chial, 1985; Bilger et al., 1984), so one must conclude that these inexpensive devices are still very limited in intelligibility.
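Presenting sentences in babble at a fixed signal-to-babble ratio amounts to scaling the babble relative to the target before mixing; at 0 dB the two have equal level. The Python sketch below illustrates that arithmetic under stated assumptions: levels are measured as whole-signal RMS, a 440-Hz tone stands in for a spoken sentence, and a sum of independent Gaussian noise tracks stands in for recorded multi-talker babble. All names and parameters are hypothetical, and the SPIN test's own calibration (Kalikow et al., 1977; Bilger et al., 1984) may define and measure levels differently.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal` after scaling it so that
    20*log10(rms(signal) / rms(scaled_noise)) == snr_db.

    Returns (mixture, scaled_noise). Using whole-signal RMS levels is a
    simplifying assumption, not the SPIN test's calibration procedure.
    """
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    gain = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    scaled = [gain * v for v in noise]
    return [s + n for s, n in zip(signal, scaled)], scaled


def fake_babble(n_samples, n_talkers=6, seed=0):
    """Placeholder for multi-talker babble: a sum of independent random
    'talkers'. Real babble is recorded speech; this is only a stand-in."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) for _ in range(n_talkers))
            for _ in range(n_samples)]


if __name__ == "__main__":
    fs = 8000  # hypothetical sample rate, Hz
    target = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    babble = fake_babble(len(target))
    mixture, scaled = mix_at_snr(target, babble, 0.0)  # 0-dB signal-to-babble
    print(20 * math.log10(rms(target) / rms(scaled)))
```

Because intelligibility scores change steeply with signal-to-babble ratio, comparisons such as the 65% versus 91% figures above are only meaningful when every condition is mixed at the same, carefully measured ratio.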

C. Reading comprehension

Since synthetic speech is less intelligible than natural speech, what happens when one tries to understand long paragraphs? Do listeners miss important information? Is a listener so preoccupied with decoding individual words that the message is quickly forgotten? In an attempt to answer these questions, Pisoni and Hunnicutt (1980) included a standard reading comprehension task in their evaluations. Half the subjects read the paragraphs by eye, while the other half listened to a text-to-speech system. In a later experiment, comparison was made with a human voice reading the
 

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution