NMAH | Smithsonian Speech Synthesis History Project (dk

First prosodic synthesis by rule, by Ignatius Mattingly, 1968. The synthesis-by-rule program of Mattingly (1966; 1968) of the Haskins Laboratories was demonstrated to accompany his Ph.D. thesis. (text)
Sentence-level phonology incorporated in rules by Dennis Klatt, 1976. Klatt (1976b) of the M.I.T. Speech Communication Group created a phonological component to generate segmental durations and a fundamental frequency contour, as well as sentence-level allophonic variation, from a phonemic input augmented with stress and syntactic symbols. (text)
Concatenation of linear-prediction diphones, by Joe Olive, 1977. Olive (1977) of AT&T Bell Laboratories controlled a linear-prediction synthesizer from stored reflection coefficients for a set of diphones. The system was demonstrated at ICASSP-77. The recording is from about 1980, and includes prosodic rules provided by Liberman and Pierrehumbert. (text)
Concatenation of linear-prediction demisyllables, by Catherine Browman, 1980. A synthesis-by-rule program with prosodic rules, called Lingua, was designed by Browman (1980) of AT&T Bell Laboratories, using the demisyllable inventory collected by Fujimura and Lovins (1978). Demonstrated at ICASSP-80. (text)

Part D: Fully automatic text-to-speech conversion

The first full text-to-speech system, done in Japan by Noriko Umeda et al., 1968. The first demonstrated text-to-speech system for English was created by Umeda et al. (1968) of the Electrotechnical Laboratory in Japan, and was based on an articulatory model. It included a syntactic analysis module with sophisticated heuristics. Demonstrated at the 6th International Congress on Acoustics, in Tokyo in 1968. (text)
The first Bell Laboratories text-to-speech system, by Cecil Coker, Noriko Umeda, and Catherine Browman, 1973. Coker et al. (1973) of AT&T Bell Laboratories demonstrated a text-to-speech program based on the Coker (1967) articulatory model. [Ed: see Coker (1968)] The system was demonstrated at the 1972 International Conference of Speech Communication and Processing in Boston. (text)
The Haskins Laboratories text-to-speech system, 1973. The Haskins Laboratories text-to-speech system (Cooper et al., 1973) used the Mattingly (1968) phoneme-to-speech rules coupled with a large dictionary. (text)
The Kurzweil reading machine for the blind, Raymond Kurzweil, 1976. Kurzweil (1976) began selling a reading machine with an optical scanner in the late 1970s. The system was demonstrated on the CBS evening news. (text)
The inexpensive Votrax Type-n-Talk system, by Richard Gagnon, 1978. The Votrax low-cost Type-n-Talk text-to-speech system combines a single-chip synthesis-by-rule program and formant synthesizer (Gagnon, 1978) with a version of the Elovitz et al. (1976) letter-to-sound rules. It was demonstrated at the 1978 ICASSP Conference. (text)
The Echo low-cost diphone concatenation system, about 1982. The Echo low-cost text-to-speech system concatenates linear-prediction diphones using the Texas Instruments TMS-5220 linear prediction synthesizer chip. (text)
The M.I.T. MITalk system, by Jonathan Allen, Sheri Hunnicutt, and Dennis Klatt, 1979. The MITalk-79 laboratory text-to-speech system was developed at the Massachusetts Institute of Technology by Allen et al. (1979, 1987) and many others. The system was demonstrated in its final form at the 1979 meeting of the Acoustical Society of America in Boston. (text)
The multi-language Infovox system, by Rolf Carlson, Björn Granström, and Sheri Hunnicutt, 1982. The Infovox commercial text-to-speech system (Magnusson et al., 1984) is an implementation of the Carlson et al. (1982a) multi-language system that was developed at the Royal Institute of Technology in Stockholm. Versions of the system were demonstrated in 1976 and 1982 at ICASSP conferences. (text)
The Speech Plus Inc. "Prose-2000" commercial system, 1982. The Prose-2000 commercial text-to-speech system was first developed in conjunction with a reading machine for the blind project at Telesensory Systems by James Bliss and his associates (Goldhor and Lund, 1983; Groner et al., 1982). The recording is of Version 3.0 of the software. (text)
The Klattalk system, by Dennis Klatt of M.I.T., which formed the basis for Digital Equipment Corporation's DECtalk commercial system, 1983. The Klattalk (Klatt, 1982a) laboratory text-to-speech system software was licensed to Digital Equipment Corporation as a basis for the commercial DECtalk text-to-speech system announced in 1983. The recording is of Version 3.0 of the DECtalk software. (text)
The AT&T Bell Laboratories text-to-speech system, 1985. A new laboratory text-to-speech system at AT&T Bell Laboratories (Olive and Liberman, 1985) uses the Olive (1977) diphone synthesis strategy in combination with a large morpheme dictionary (Coker, 1985) and letter-to-sound rules (Church, 1985). The laboratory system was demonstrated at a 1985 meeting of the Acoustical Society of America. (text)
Several of the DECtalk voices. Examples of some of the voices provided by the DECtalk text-to-speech system: (1) Beautiful Betty, (2) Huge Harry, (3) Kit the Kid, (4) Whispering Wendy. (text)
DECtalk speaking at about 300 words/minute. Example of using the DECtalk speaking rate command to skim material at a rapid rate. The nominal speaking rate has been set to 350 words/min, [:ra 350], although this 51-word passage took 11 s to speak, indicating an effective rate slightly under 300 words/min. (text)
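The effective rate quoted for the DECtalk skimming example can be checked with a few lines of Python. This is only an arithmetic sketch: the function name `effective_rate_wpm` is hypothetical, and the only inputs are the 51-word count and 11 s duration given in the entry above.

```python
def effective_rate_wpm(word_count: int, duration_s: float) -> float:
    """Return the measured speaking rate in words per minute."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # words per second, scaled to a full minute
    return word_count * 60.0 / duration_s


# Figures from the DECtalk entry: 51 words spoken in 11 seconds.
rate = effective_rate_wpm(51, 11)
print(f"{rate:.0f} words/min")
```

The result is roughly 278 words/min, which matches the entry's description of an effective rate somewhat under 300 words/min despite the nominal setting of 350.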

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center