 KLATT 1987, p. 748 

were capable of synthesizing intelligible speech (example 11 of the Appendix).

Modern, improved simulations of an articulatory vocal tract have focused on incorporating frequency-dependent loss terms, providing for cavity wall motion at low frequencies, and better modeling the time-varying termination impedance at the glottis (Flanagan et al., 1975; Liljencrants, 1985).

The first articulatory synthesizers used a glottal waveform consisting of a sawtooth current source. The voicing source has traditionally been described as a current source because the volume velocity waveform was said to depend very little on the shape or impedance of the vocal tract, at least for vowels (Fant, 1960; Flanagan, 1972). Efforts to improve upon this source model initially focused on obtaining a better approximation to the vibration pattern and resulting volume velocity waveform, while more recently, interactions between source and vocal tract have become of primary concern (Fant et al., 1985).
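The idea of a sawtooth current source can be illustrated with a minimal sketch. The function name, the parameters, and the choice of a rising ramp with an abrupt reset are assumptions made for illustration; the text does not specify the exact shape used by the early synthesizers. The point is only that an ideal flow source is computed without any reference to the vocal tract load.

```python
def sawtooth_source(f0_hz, sample_rate_hz, n_samples, peak_flow=1.0):
    """Return n_samples of a sawtooth volume-velocity waveform.

    Each pitch period ramps linearly from 0 toward peak_flow and then
    resets abruptly. The source is treated as an ideal current (flow)
    source, so no property of the vocal tract enters the computation.
    All names and the ramp direction are illustrative assumptions.
    """
    samples = []
    for n in range(n_samples):
        # Position within the current pitch period, in the range [0, 1).
        phase = (n * f0_hz / sample_rate_hz) % 1.0
        samples.append(peak_flow * phase)
    return samples


# Example: three periods of a 100 Hz source sampled at 10 kHz.
flow = sawtooth_source(100, 10000, 300)
```

A source of this kind would then be applied as the input to a vocal tract model; source-tract interaction, discussed later in this section, is exactly what such a sketch leaves out.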

The first mechanical model of the vibrating vocal folds was a single mass-spring-damper system (Flanagan and Landgraf, 1968). Waveforms generated by this model bore many similarities to physiological data, but the conditions under which the system would vibrate were somewhat restricted. Important aspects of natural vibrations appear to be the out-of-phase motions of the upper and lower surfaces of the folds (Ishizaka and Matsudaira, 1968; Stevens, 1977; Broad, 1979) and the vertical component of the vibration pattern of the folds (Baer, 1981). The first-order aspects of these phenomena have been captured by two-mass models of each fold, in which the upper and lower surfaces of the folds are simulated by separate masses coupled by a spring (Ishizaka and Flanagan, 1972). The sound generation capabilities of such a model (coupled to a digital simulation of a transmission line analog of the vocal tract) were demonstrated by Flanagan et al. (1975) (example 12 of the Appendix).
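The mechanical core of a single-mass model can be sketched as a damped oscillator. This is only the free response of m·x'' + c·x' + k·x = 0; it omits the aerodynamic driving force that sustains oscillation in a vocal fold model, so the motion here simply decays. All parameter values and function names are hypothetical, chosen to give a speech-like frequency, and are not taken from the cited papers.

```python
import math


def simulate_single_mass(mass_kg, stiffness_n_per_m, damping_n_s_per_m,
                         x0_m, dt_s, n_steps):
    """Free response of m*x'' + c*x' + k*x = 0 via semi-implicit Euler."""
    x, v = x0_m, 0.0
    trajectory = [x]
    for _ in range(n_steps):
        accel = -(damping_n_s_per_m * v + stiffness_n_per_m * x) / mass_kg
        v += accel * dt_s  # update velocity first ...
        x += v * dt_s      # ... then position, using the new velocity
        trajectory.append(x)
    return trajectory


def upward_crossing_frequency(trajectory, dt_s):
    """Estimate oscillation frequency from spacing of upward zero crossings."""
    crossings = [i for i in range(1, len(trajectory))
                 if trajectory[i - 1] < 0.0 <= trajectory[i]]
    if len(crossings) < 2:
        return None
    mean_period_s = (crossings[-1] - crossings[0]) * dt_s / (len(crossings) - 1)
    return 1.0 / mean_period_s


# Hypothetical values: stiffness is set so the undamped frequency is 120 Hz.
MASS_KG = 1.0e-4
TARGET_HZ = 120.0
STIFFNESS = MASS_KG * (2.0 * math.pi * TARGET_HZ) ** 2
DAMPING_RATIO = 0.1
DAMPING = 2.0 * DAMPING_RATIO * math.sqrt(STIFFNESS * MASS_KG)

# 50 ms of motion after releasing the mass from a 1 mm displacement.
trajectory = simulate_single_mass(MASS_KG, STIFFNESS, DAMPING,
                                  1.0e-3, 1.0e-5, 5000)
estimated_hz = upward_crossing_frequency(trajectory, 1.0e-5)
```

A two-mass model in the spirit described above would add a second mass and a coupling spring, which allows the upper and lower edges to move out of phase; that extension is not attempted here.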

Another approach to modeling vocal fold vibration has been to create a three-dimensional structure consisting of a large number of coupled masses (Titze, 1974; Allen and Strong, 1985). More complex vibration modes are seen in this type of model, and it may be possible to mimic certain pathologies. However, in all of the physiological models, no entirely satisfactory solution has been proposed for simulating what happens when the vocal folds slam together at the midline and deform in some way to absorb the energy of the impact. Until such phenomena are included, it is difficult to predict when the folds will open or to predict their initial opening velocity (Stevens, 1987).

The resonance structure of the vocal tract results in standing pressure waves that can have an effect on the pressure distribution at the glottis, and hence the vibration pattern and airflow waveform from the voicing source (Fant, 1982; Fant et al., 1985). Similarly, the opening and closing of the glottis provide a time-varying termination impedance that affects the formant frequencies and bandwidths of the vocal tract transfer function (Holmes, 1973; Fant and Ananthapadmanabha, 1982). While these effects are not large, they may be of some importance in simulating natural voice qualities by providing period-to-period variability to the glottal waveform for the first few periods at the onset of voicing, as well as causing pitch-synchronous changes to the first formant frequency and bandwidth over a pitch period. Liljencrants (1985) has programmed a detailed articulatory model to simulate these effects, with the result that the synthesis of a steady vowel sounds quite natural.
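The pitch-synchronous change in first formant frequency and bandwidth can be approximated, very roughly, without an articulatory model. The sketch below swaps in a different technique: a standard two-pole digital resonator of the kind used in formant synthesizers, with its coefficients switched between an "open glottis" and a "closed glottis" setting within each pitch period. It is not the method of Liljencrants (1985). The formant values, the open quotient, and all function names are hypothetical.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def first_formant_filter(source, f0_hz, open_quotient, sample_rate_hz,
                         closed=(500.0, 60.0), opened=(550.0, 200.0)):
    """Filter source through F1, switching (frequency, bandwidth) per phase.

    While the glottis is assumed open (the first open_quotient fraction of
    each period), the wider-bandwidth 'opened' setting is used; otherwise
    the 'closed' setting applies. Both settings are hypothetical.
    """
    y1 = y2 = 0.0
    out = []
    for n, x in enumerate(source):
        phase = (n * f0_hz / sample_rate_hz) % 1.0
        freq_hz, bw_hz = opened if phase < open_quotient else closed
        a, b, c = resonator_coefficients(freq_hz, bw_hz, sample_rate_hz)
        y = a * x + b * y1 + c * y2
        y2, y1 = y1, y
        out.append(y)
    return out


# Example: response to a single impulse over five 100 Hz pitch periods.
impulse = [1.0] + [0.0] * 499
response = first_formant_filter(impulse, 100.0, 0.5, 10000.0)
```

Recomputing the coefficients at every sample keeps the sketch short; a practical implementation would compute the two coefficient sets once and select between them.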
 

Smithsonian Speech Synthesis History Project