NMAH | Smithsonian Speech Synthesis History Project (im

The Kelly and Gerstman program was quite simple. For each speech sound, initial and final transition durations, steady-state durations, and steady-state values for each parameter were stored. During the steady state of a sound, the stored values were used; during the final-initial transition period, parameter values changed smoothly from preceding to following steady state. It would be easy to criticize this scheme: the framework within which the rules are stated is extremely crude, and a good deal of ad hoc modification was required to make the synthetic speech even reasonably intelligible. But Kelly and Gerstman had clearly demonstrated that a computer could be used to apply phonetic rules -- as great an advance over application of the rules by drawing patterns or functions by hand as were the latter over direct operation of the synthesizer by a human being. It was now possible to test and correct rules by producing substantial quantities of synthetic speech automatically and consistently.
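The scheme described above can be illustrated with a minimal sketch. It is not a reconstruction of the Kelly and Gerstman program: the names `Sound` and `parameter_track`, the 10 ms frame size, and the sample values are hypothetical, and linear interpolation is an assumption, since the text says only that parameter values "changed smoothly".

```python
from dataclasses import dataclass


@dataclass
class Sound:
    """Stored data for one speech sound (hypothetical layout)."""
    initial_ms: int   # duration of the transition into the sound
    steady_ms: int    # duration of the steady state
    final_ms: int     # duration of the transition out of the sound
    values: dict      # steady-state value per parameter, e.g. {"F1": 300.0}


def parameter_track(sounds, name, frame_ms=10):
    """Return one value of parameter `name` per frame for a sequence of sounds."""
    track = []
    for i, sound in enumerate(sounds):
        # Steady state: the stored value is used unchanged.
        track += [sound.values[name]] * (sound.steady_ms // frame_ms)
        if i + 1 < len(sounds):
            nxt = sounds[i + 1]
            # Transition period: final transition of this sound plus
            # initial transition of the next one.
            n = (sound.final_ms + nxt.initial_ms) // frame_ms
            a, b = sound.values[name], nxt.values[name]
            # Assumption: straight-line movement between the steady states.
            track += [a + (b - a) * (k + 1) / (n + 1) for k in range(n)]
    return track


# Illustrative values only, not taken from the original rules.
sounds = [
    Sound(20, 30, 20, {"F1": 300.0}),
    Sound(20, 30, 20, {"F1": 700.0}),
]
f1 = parameter_track(sounds, "F1")
```

Here `f1` holds three steady frames at the first value, four transition frames, and three steady frames at the second value. In this sketch the initial transition of the first sound and the final transition of the last are simply ignored.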

Resonance synthesizers, as well as the Playback and the Voder, are 'terminal analog' synthesizers: they simulate the acoustic output of the vocal tract but not the activity of the vocal tract itself. However, concurrently with resonance synthesizers, vocal tract analog synthesizers were being developed. With one interesting exception, the 'true' (i.e. mechanical) model of Ladefoged and Anthony (Anthony 1964), these synthesizers are electrical simulations. The supraglottal vocal tract is considered as segmented into a series of short tubes, each with variable cross-sectional area. The acoustic properties of each tube in such a series can be simulated by the electrical properties of a transmission line. The acoustical effect of a change in cross-sectional area is equivalent to a change in the characteristic impedance of the corresponding transmission-line segment. The nasal cavity is usually represented as a branch with a few fixed sections and variable coupling to the main line. Thus, the spectrum of the output of the synthesizer depends on the momentary cross-sectional area function and the amount of nasal coupling. [6]
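The relation between cross-sectional area and characteristic impedance can be made concrete with a short sketch. It uses the standard textbook result that a lossless tube of area A has characteristic acoustic impedance ρc/A, and that a wave meeting a junction between two sections is partly reflected in proportion to the impedance mismatch. The function names, the constants, and the sample area function are assumptions for illustration, and sign conventions for the reflection coefficient vary between texts; nothing here is taken from the circuits or programs cited.

```python
RHO = 1.14e-3  # air density in g/cm^3 (approximate, assumed value)
C = 3.5e4      # speed of sound in cm/s (approximate, assumed value)


def characteristic_impedance(area_cm2):
    """Characteristic acoustic impedance of a lossless tube section."""
    return RHO * C / area_cm2


def reflection_coefficients(areas_cm2):
    """Pressure-wave reflection coefficient at each junction between sections.

    The junction between sections i and i+1 gives
    (Z[i+1] - Z[i]) / (Z[i+1] + Z[i]), which reduces to
    (A[i] - A[i+1]) / (A[i] + A[i+1]).
    """
    z = [characteristic_impedance(a) for a in areas_cm2]
    return [(z[i + 1] - z[i]) / (z[i + 1] + z[i]) for i in range(len(z) - 1)]


# Hypothetical four-section area function, glottis end first, in cm^2.
areas = [2.0, 2.0, 1.0, 4.0]
coeffs = reflection_coefficients(areas)
```

Two equal adjacent sections give a coefficient of zero (no reflection), a narrowing gives a positive value, and a widening gives a negative one. Because ρ and c cancel in the ratio, only the area function matters, which is consistent with the statement that the output spectrum depends on the momentary area function.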

Like a resonance synthesizer, a vocal-tract analog could be simulated on a computer, and Kelly and Lochbaum (1962) used such a simulation for synthesis by rule. The approach was very much the same as the one used by Kelly and Gerstman, except that the parameters were the cross-sectional areas of the segments of the tract instead of formant frequencies. The results were less successful than Kelly's terminal-analog synthesis had been, a fact of some interest.
__________
6. The first electrical vocal tract analogs were static, like those of Dunn (1950), Stevens et al. (1953) and Fant (1960). Rosen (1958) built a dynamic vocal tract (DAVO), which Dennis (1963) later attempted to control by computer. Dennis et al. (1964), Hiki et al. (1968) and Baxter and Strong (1969) have also described hardware vocal-tract analogs. Kelly and Lochbaum (1962) made the first computer simulation; later digital computer simulations have been made, e.g. by Nakata and Mitsuoka (1965), Matsui (1968) and Mermelstein (in press). Honda et al. (1968) have made an analog computer simulation.
