decreasing at about 12 dB per octave), which contrasts with evidence
indicating the presence of zeros in the spectra of normal voicing
waveforms (Flanagan, 1958; Miller, 1959; Mathews et al., 1961; Monsen
and Engebretson, 1977; Fant, 1979; Sundberg and Gauffin, 1979;
Ananthapadmanabha, 1984).
Perceptual data (Rosenberg, 1971) and theoretical considerations
(Titze and Talkin, 1979) suggest ways in which the simulation of
the glottal waveform might be improved. For example, Rothenberg
et al. (1975) constructed a three-parameter model of the voicing
waveform that can produce a family of more natural waveshapes varying
with respect to fundamental frequency, amplitude, open quotient
(ratio of open time to total period), degree of static glottal
opening, and breathiness. Some of these degrees of freedom are
illustrated in Fig. 10. The model is used in the Infovox SA-101
text-to-speech system (Magnusson et al., 1984).
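
To make the roles of fundamental frequency, amplitude, and open quotient
concrete, the following is a minimal sketch of a parametric glottal pulse
train. It is not the Rothenberg et al. model or the waveform of any system
named here: the function name, its parameters, the 10-kHz sampling rate, and
the cubic polynomial pulse shape are all assumptions chosen for illustration.

```python
def glottal_pulse_train(f0_hz, amplitude, open_quotient,
                        sample_rate_hz=10000, n_periods=3):
    """Return glottal flow samples for n_periods pitch periods.

    Illustrative only: the open phase uses an assumed cubic pulse,
    u(x) = (27/4) * x^2 * (1 - x) for x in [0, 1), which peaks at 1
    when x = 2/3. The closed phase is zero flow.
    """
    if not 0.0 < open_quotient <= 1.0:
        raise ValueError("open_quotient must be in (0, 1]")

    # Number of samples in one pitch period and in its open phase.
    samples_per_period = int(round(sample_rate_hz / f0_hz))
    open_samples = int(round(open_quotient * samples_per_period))

    samples = []
    for n in range(n_periods * samples_per_period):
        k = n % samples_per_period          # position within the period
        if k < open_samples:
            x = k / open_samples            # normalized time in open phase
            samples.append(amplitude * 6.75 * x * x * (1.0 - x))
        else:
            samples.append(0.0)             # glottis closed
    return samples


# Same pitch and amplitude, two different open quotients.
narrow = glottal_pulse_train(100.0, 1.0, 0.5)
wide = glottal_pulse_train(100.0, 1.0, 0.7)
```

In this sketch, raising the open quotient lengthens the nonzero portion of
each period without changing the period itself, which is the sense in which
the parameter is defined above. Static glottal opening and breathiness are
not modeled.
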
More recently, Fant et al. (1985) have proposed a mathematical model
having similar capabilities, but with more direct control over the
important acoustic variables. Some of the flexibility is illustrated
in the spectral domain in Fig. 11. General spectral tilt, locations
of spectral zeros, and intensity of the fundamental component are
under user control. The Klattalk voicing source waveform defined in
the top half of Fig. 12, which is quite similar to the Fant model,
can be modified in (1) open period, (2) abruptness of the closing
component of the waveform, (3) breathiness, and (4) degree of
diplophonic vibration (alternate periods more similar than adjacent
periods). However, rules for dynamic control of these variables are
quite primitive. The limited naturalness of synthetic speech from
this and all other similar