NMAH | Smithsonian Speech Synthesis History Project (dk

decreasing at about 12 dB per octave), which contrasts with evidence indicating the presence of zeros in the spectra of normal voicing waveforms (Flanagan, 1958; Miller, 1959; Mathews et al., 1961; Monsen and Engebretson, 1977; Fant, 1979; Sundberg and Gauffin, 1979; Ananthapadmanabha, 1984).

Perceptual data (Rosenberg, 1971) and theoretical considerations (Titze and Talkin, 1979) suggest ways in which the simulation of the glottal waveform might be improved. For example, Rothenberg et al. (1975) constructed a three-parameter model of the voicing waveform that can produce a family of more natural waveshapes varying with respect to fundamental frequency, amplitude, open quotient (ratio of open time to total period), degree of static glottal opening, and breathiness. Some of these degrees of freedom are illustrated in Fig. 10. The model is used in the Infovox SA-101 text-to-speech system (Magnusson et al., 1984).
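To make the role of the open quotient concrete, the following is a minimal sketch of one period of a glottal flow pulse controlled by fundamental frequency, amplitude, and open quotient. It is not the Rothenberg et al. (1975) model: the raised-cosine opening, the quarter-cosine closing, the function name `glottal_pulse`, and the parameters `sample_rate_hz` and `speed_quotient` are all assumptions chosen for illustration only.

```python
import math


def glottal_pulse(f0_hz, amplitude, open_quotient,
                  sample_rate_hz=10000, speed_quotient=2.0):
    """Return one period of a simplified glottal flow pulse as a list.

    Illustrative only (an assumed pulse shape, not a published model):
      - open_quotient: ratio of open time to total period, in (0, 1]
      - speed_quotient: assumed ratio of opening time to closing time
    """
    if not 0.0 < open_quotient <= 1.0:
        raise ValueError("open_quotient must be in (0, 1]")
    if f0_hz <= 0 or speed_quotient <= 0:
        raise ValueError("f0_hz and speed_quotient must be positive")

    n_period = int(round(sample_rate_hz / f0_hz))      # samples per period
    n_open = int(round(open_quotient * n_period))      # samples with glottis open
    # Split the open phase into an opening part and a closing part.
    n_rise = int(round(n_open * speed_quotient / (1.0 + speed_quotient)))
    n_fall = n_open - n_rise

    samples = []
    for n in range(n_period):
        if n < n_rise:
            # Opening phase: raised cosine rising from 0 toward 1.
            value = 0.5 * (1.0 - math.cos(math.pi * n / n_rise))
        elif n < n_open:
            # Closing phase: quarter cosine falling from 1 toward 0.
            value = math.cos(0.5 * math.pi * (n - n_rise) / n_fall)
        else:
            # Closed phase: no flow.
            value = 0.0
        samples.append(amplitude * value)
    return samples


if __name__ == "__main__":
    # Compare two open quotients at the same fundamental frequency.
    for oq in (0.4, 0.6):
        pulse = glottal_pulse(f0_hz=100.0, amplitude=1.0, open_quotient=oq)
        closed = sum(1 for v in pulse if v == 0.0)
        print(f"OQ={oq}: {len(pulse)} samples per period, {closed} with zero flow")
```

In this sketch, raising the open quotient lengthens the portion of each period with nonzero flow and shortens the closed phase. Static glottal opening and breathiness, which the text also lists, are not modeled here.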

More recently, Fant et al. (1985) have proposed a mathematical model having similar capabilities, but with more direct control over the important acoustic variables. Some of the flexibility is illustrated in the spectral domain in Fig. 11. General spectral tilt, locations of spectral zeros, and intensity of the fundamental component are under user control. The Klattalk voicing source waveform defined in the top half of Fig. 12, which is quite similar to the Fant model, can be modified in (1) open period, (2) abruptness of the closing component of the waveform, (3) breathiness, and (4) degree of diplophonic vibration (alternate periods more similar than adjacent periods). However, rules for dynamic control of these variables are quite primitive. The limited naturalness of synthetic speech from this and all other similar
