NMAH | Smithsonian Speech Synthesis History Project (dk

and Mattingly, 1985; Lisker, 1978). For example, Carlson et al. (1972) synthesized /g/ bursts with either a single compact energy concentration near F2, or F2 excitation plus a weak secondary energy concentration near F4 (the next front cavity resonance). They obtained the best intelligibility scores with the more complex burst, which better matched natural bursts. Some cues are of course more powerful than others, but the listener appears to be responsive to a remarkably large number of acoustic details and performs best when the synthesis contains all known acoustic regularities (Dorman et al., 1977).
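To make the contrast between the two burst types concrete, here is a minimal Python sketch. It is not the procedure Carlson et al. used. It assumes a two-pole digital resonator of the form used in Klatt's formant synthesizers, excited by white noise. The function names, the 10-kHz sampling rate, the bandwidths, the burst duration, and the `secondary_gain` parameter are all illustrative assumptions.

```python
import math
import random


def resonator(x, freq_hz, bw_hz, fs_hz):
    """Two-pole digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def velar_burst(f2_hz, f4_hz, secondary_gain, fs_hz=10000, dur_ms=10, seed=0):
    """Hypothetical /g/-like noise burst.

    secondary_gain == 0.0 gives a single compact peak near F2;
    secondary_gain > 0.0 adds a weaker parallel peak near F4.
    """
    rng = random.Random(seed)  # fixed seed so both variants share the same noise
    n = int(fs_hz * dur_ms / 1000)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    burst = resonator(noise, f2_hz, 200.0, fs_hz)  # compact concentration near F2
    if secondary_gain > 0.0:
        weak = resonator(noise, f4_hz, 300.0, fs_hz)  # secondary concentration near F4
        burst = [p + secondary_gain * q for p, q in zip(burst, weak)]
    return burst


simple = velar_burst(1800.0, 3500.0, secondary_gain=0.0)
richer = velar_burst(1800.0, 3500.0, secondary_gain=0.1)
```

Because both variants are driven by the same noise sequence, any difference between `simple` and `richer` comes only from the added F4 branch, which mirrors the single-peak versus two-peak comparison described above.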

The early Klatt rules for segmental synthesis were augmented by a sentence-level phonological component (Klatt, 1976b) that derived segment durations, f0 contour, and allophonic variation by rule (example 21 of the Appendix). The program has evolved over the last 10 years, and has spawned several progeny. The 1976 version was incorporated into the MITalk text-to-speech system that was being developed in the 1970s at M.I.T. under the guidance of Jonathan Allen (Allen et al., 1987). The fundamental frequency algorithm of Klatt (1976b) was replaced by one developed by O'Shaughnessy (1977). MITalk text analysis routines included a morpheme dictionary (Allen, 1976), letter-to-sound rules (Hunnicutt, 1976), and a phrase-level parser. The MITalk system evolved until 1979, when the project was terminated (Allen et al., 1979; Allen et al., 1987) (example 30 of the Appendix).

In 1976, the MITalk letter-to-phoneme rules (Hunnicutt, 1976) and the Klatt phoneme-to-speech program were licensed to Telesensory Systems, Inc. for incorporation into a reading machine for the blind (Goldhor and Lund, 1983). After considerable effort to transform the code into a real-time device, Telesensory Systems sold off its speech synthesis division to a newly formed company, Speech Plus, Inc. Following further development, Speech Plus came out with the Prose-2000 text-to-speech system in 1982 (Groner et al., 1982) (example 32 of the Appendix). Since that time, the segmental synthesis rules have been modified to improve intelligibility over limited-bandwidth long-distance telephone lines (Wright et al., 1986). For example, some noise bursts and frication spectra were enhanced slightly relative to normal levels in order to compensate for the frequency response and noise characteristics of the telephone. Particular attention was paid to postvocalic consonants, where they found that adding very brief releases into a weak schwa-like element before silence ("man" = ) improved the
