NMAH | Smithsonian Speech Synthesis History Project (dk

Copying the same sentence using the second generation of Gunnar Fant's OVE cascade formant synthesizer, 1962. Gunnar Fant attempted to match a natural recording using OVE II (Fant and Martony, 1962). Demonstrated at the 1962 Stockholm Speech Communication Conference. Compare with the PAT version of the same utterance, above. (text)
Comparison of synthesis and a natural sentence, using OVE II, by John Holmes, 1961. Holmes (1961) of the Joint Speech Research Unit of the British Post Office used the OVE II synthesizer to generate a close copy of a natural sentence. (text)
Comparison of synthesis and a natural sentence, John Holmes using his parallel formant synthesizer, 1973. Holmes did essentially the same thing in 1973, using a more complex parallel formant synthesizer of his own design (Holmes, 1973). Demonstrated at the 1972 IEEE Conference on Speech Communication and Processing, Boston. (text)
Attempt to scale the DECtalk male voice to make it sound female. The DECtalk "Perfect Paul" male voice has been modified by scaling f0 by a factor of 1.7 (ap = 204, pr = 170), by scaling all formant frequencies by a factor of 0.85 (hs = 85) and removing the fifth formant (f5 = 2500, b5 = 2048), by increasing the open quotient of the glottal waveform using the "richness" variable (ri = 0), and by decreasing the output level slightly to avoid overloads (lo = 81). These manipulations are not sufficient to turn Paul into a convincing female speaker. (text)
Comparison of synthesis and a natural sentence, female voice, Dennis Klatt, 1986b. A synthetic copy of a female speaker producing (1) a sentence and (2) an utterance in which each syllable of "Steve eats candy cane" is replaced by is compared with the original recording (Klatt, 1986b). (text)
The DAVO articulatory synthesizer developed by George Rosen at M.I.T., 1958. The DAVO ("Dynamic Analog of the VOcal tract") circuit designed by Rosen (1958) at M.I.T., augmented by a nasal tract designed by Hecker (1962), was controlled by a tape recording of control signals created by hand by Kenneth Stevens and Arthur House. The demonstration occurred at the fall meeting of the Acoustical Society of America in 1961. (text)
Sentences produced by an articulatory model, James Flanagan and Kenzo Ishizaka, 1976. Flanagan and Ishizaka (1976) of the AT&T Bell Telephone Laboratories used an articulatory synthesizer to generate two sentences, using control data derived from the Coker et al. (1973) text-to-speech system. A two-mass model of the vocal cords was employed, and turbulence noise was injected automatically whenever the Reynolds number became large at the larynx, or at a constricted section of the vocal tract. (text)
Linear-prediction analysis and resynthesis of speech at a low bit rate in the Texas Instruments Speak-'n-Spell toy, Richard Wiggins, 1980. Wiggins (1980) designed a low-cost linear-prediction synthesis chip to take advantage of the ability of linear prediction to represent critical spectral and temporal aspects of speech waveforms efficiently. (text)
Comparison of synthesis and a natural recording, automatic analysis-resynthesis using multipulse linear prediction, Bishnu Atal, 1982. Atal of the AT&T Bell Laboratories demonstrated a new formulation of linear prediction, known as multipulse LPC (Atal and Remde, 1982), at the 1982 Paris ICASSP. (text)

Part B: Segmental synthesis by rule

The first synthesis-by-rule programs concentrated on the development of rules for phonemic synthesis, and did not include rules for the automatic specification of phoneme durations and fundamental frequency. Since prosody was specified by hand to match a natural recording, these demonstrations sound significantly better than they would if all information had been derived by rule.

Creation of a sentence from rules in the head of Pierre Delattre, using the Haskins Pattern Playback, 1959. A stylized spectrogram of the desired sentence was painted on a transparent plastic plate by Pierre Delattre, and then played by the Haskins Pattern Playback. (text)
Output from the first computer-based phonemic synthesis-by-rule program, created by John Kelly and Louis Gerstman, 1961. Kelly and Gerstman (1961, 1962) of the AT&T Bell Laboratories demonstrated the first phonemic synthesis-by-rule program in 1961 at a meeting of the Acoustical Society of America. (text)
Elegant rule program for British English by John Holmes, Ignatius Mattingly, and John Shearme, 1964. Holmes et al. (1964) of the Joint Speech Research Unit in England demonstrated an impressive phonemic synthesis-by-rule program for British English at the fall meeting of the Acoustical Society of America in Ann Arbor, 1963. (text)
Formant synthesis using diphone concatenation, by Rex Dixon and David Maxey, 1968. Dixon and Maxey (1968) of IBM at Research Triangle Park demonstrated a diphone concatenation method for construction of control parameter time functions for a formant synthesizer at the 1967 M.I.T. Conference on Speech Communication and Processing. (text)
Rules to control a low-dimensionality articulatory model, by Cecil Coker, 1968. Coker (1968) of AT&T Bell Laboratories created a method of generating speech from an articulatory model. The system was demonstrated at the 1967 M.I.T. Conference on Speech Communication and Processing. (text)

Part C: Synthesis by rule of segments and sentence prosody

The next synthesis-by-rule programs include a complete set of rules for going from phonemes, stress marks, and some syntactic information to an output speech waveform.
