The Kelly and Gerstman program was quite simple. For each speech
sound, initial and final transition durations, steady-state
durations, and steady-state values for each parameter were stored.
During the steady state of a sound, the stored values were used;
during the transition period (the final transition of one sound plus
the initial transition of the next), parameter values changed
smoothly from the preceding to the following steady state. It would be easy
to criticize this scheme: the framework within which the rules were
stated was extremely crude, and a good deal of ad hoc modification
was required to make the synthetic speech even reasonably intelligible.
But Kelly and Gerstman had clearly demonstrated that a computer could
be used to apply phonetic rules. This was as great an advance over
applying the rules by hand, drawing patterns or parameter functions,
as that had been over direct operation of the synthesizer by a human being. It
was now possible to test and correct rules by producing substantial
quantities of synthetic speech automatically and consistently.
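The rule scheme described above can be sketched in a few lines. This is a minimal Python illustration, not Kelly and Gerstman's program: the `Sound` record, the `track` function, the 10 ms frame interval and the use of straight-line interpolation for the "smooth" transition are all assumptions made for the example, and the F1 values are illustrative only. For simplicity, the initial transition of the first sound and the final transition of the last sound are ignored.

```python
from dataclasses import dataclass

FRAME_MS = 10  # assumed frame interval in milliseconds


@dataclass
class Sound:
    # Hypothetical per-sound record; field names are illustrative only.
    name: str
    initial_ms: int  # initial transition duration
    steady_ms: int   # steady-state duration
    final_ms: int    # final transition duration
    values: dict     # steady-state value of each parameter (e.g. formant Hz)


def track(sounds, param):
    """Return one value per frame for a single parameter."""
    out = []
    for i, s in enumerate(sounds):
        # Steady state: hold the stored value.
        out += [s.values[param]] * (s.steady_ms // FRAME_MS)
        if i + 1 < len(sounds):
            nxt = sounds[i + 1]
            # Transition period = this sound's final transition
            # plus the next sound's initial transition.
            n = (s.final_ms + nxt.initial_ms) // FRAME_MS
            a, b = s.values[param], nxt.values[param]
            # Assumed linear ramp from this steady state to the next,
            # excluding both endpoints (they belong to the steady states).
            out += [a + (b - a) * (k + 1) / (n + 1) for k in range(n)]
    return out


# Illustrative two-vowel sequence: 10 frames at 700 Hz,
# 6 ramp frames, then 10 frames at 300 Hz.
f1 = track([Sound("a", 30, 100, 30, {"F1": 700}),
            Sound("i", 30, 100, 30, {"F1": 300})], "F1")
```

Because each parameter is tracked independently from the same stored durations, all parameters change together during a transition, which is the sense in which this framework is crude.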
Resonance synthesizers, as well as the Playback and the Voder, are
'terminal analog' synthesizers: they simulate the acoustic output
of the vocal tract but not the activity of the vocal tract itself.
However, concurrently with resonance synthesizers, vocal-tract analog
synthesizers were being developed. With one interesting exception,
the 'true' (i.e. mechanical) model of Ladefoged and Anthony (Anthony
1964), these synthesizers are electrical simulations. The supraglottal
vocal tract is treated as a series of short tubes, each with a
variable cross-sectional area. The acoustic properties of
each tube in such a series can be simulated by the electrical
properties of a transmission line. The acoustical effect of a change
in cross-sectional area is equivalent to a change in the characteristic
impedance of the corresponding transmission-line segment. The nasal
cavity is usually represented as a branch with a few fixed sections
and variable coupling to the main line. Thus, the spectrum of the
output of the synthesizer depends on the momentary cross-sectional
area function and the amount of nasal coupling. 6
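The correspondence between a change in area and a change in characteristic impedance can be made concrete with the standard lossless-tube relations. The Python sketch below is an illustration under stated assumptions, not any of the cited synthesizers: it assumes lossless sections, no nasal branch, and the pressure-wave sign convention (for volume velocity the sign flips). The helper names `impedances` and `reflection_coefficients` are hypothetical, and the air-density and sound-speed constants are approximate values that cancel out of the reflection coefficients anyway.

```python
RHO = 1.14e-3  # approximate density of air in the vocal tract, g/cm^3
C = 35000.0    # approximate speed of sound, cm/s


def impedances(areas_cm2):
    # Characteristic acoustic impedance of each tube section: Z = rho*c/A,
    # so a narrower section has a higher impedance.
    return [RHO * C / a for a in areas_cm2]


def reflection_coefficients(areas_cm2):
    # Pressure-wave reflection at each junction between adjacent sections:
    # r = (Z_next - Z) / (Z_next + Z), which equals (A - A_next) / (A + A_next).
    z = impedances(areas_cm2)
    return [(z2 - z1) / (z2 + z1) for z1, z2 in zip(z, z[1:])]


# Hypothetical area function (cm^2), glottis end first.
areas = [2.6, 2.6, 1.0, 0.3, 1.5, 4.0]
print(reflection_coefficients(areas))
```

A uniform tube gives zero reflection at every junction; each constriction or expansion introduces a nonzero coefficient, and it is the pattern of these reflections along the tract that shapes the output spectrum.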
Like a resonance synthesizer, a vocal-tract analog could be simulated
on a computer, and Kelly and Lochbaum (1962) used such a simulation
for synthesis by rule. The approach was very much the same as the one
used by Kelly and Gerstman, except that the parameters were the
cross-sectional areas of the segments of the tract instead of formant
frequencies. The results were less successful than Kelly's terminal-analog
synthesis had been, a fact of some interest.
__________
6. The first electrical vocal tract analogs were static, like
those of Dunn (1950), Stevens et al. (1953) and Fant (1960). Rosen
(1958) built a dynamic vocal tract (DAVO), which Dennis (1963)
later attempted to control by computer. Dennis et al.
(1964), Hiki et al. (1968) and Baxter and Strong (1969) have
also described hardware vocal-tract analogs. Kelly and Lochbaum
(1962) made the first computer simulation; later digital computer
simulations have been made, e.g. by Nakata and Mitsuoka (1965),
Matsui (1968) and Mermelstein (in press). Honda et al.
(1968) have made an analog computer simulation.