and amplitude of the voicing source so as to mimic the fluctuations
seen in the sentence, Holmes spent a long time carefully adjusting
formant frequencies and amplitudes on a trial-and-error basis (see
Fig. 9). He found that much of the detailed period-to-period
variability in the spectra of natural speech can be mimicked by
proper adjustments to the amplitudes of parallel formants -- even
though we may not as yet have a good enough theory and source model
to account for all of this natural variation. According to Holmes,
the observed irregularities in the spectrum between the formant
peaks are of little perceptual importance; only the strong harmonics
near a formant peak and below F1 must be synthesized with the
correct amplitudes in order to mimic an utterance with a high degree
of perceptual fidelity. Holmes also showed that phase relations among
harmonics of the voicing source are important for earphone listening,
but not when loudspeakers are used and the sound is modified by the
reverberation of ordinary room acoustics. The Holmes synthesizer has
recently been implemented on a real-time signal processing chip
(Quarmby and Holmes, 1984).
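
The role of the individual formant amplitudes can be illustrated with a minimal sketch of a parallel formant synthesizer. This is not Holmes's design: the two-pole resonator form, the 10-kHz sampling rate, the formant frequencies and bandwidths, and the use of a plain impulse train as the source are all assumptions chosen for illustration. The sketch sums three independently scaled resonator branches and measures how the level of a harmonic near F2 responds when only the F2 amplitude is lowered.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Two-pole digital resonator, y[n] = a*x[n] + b*y[n-1] + c*y[n-2],
    normalized to unity gain at 0 Hz (an assumed, commonly used form)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c

def resonate(x, freq_hz, bw_hz, fs_hz):
    """Run the samples in x through one resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, fs_hz, n_samples):
    """Stand-in voicing source: one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def parallel_formant_synth(source, formants, fs_hz):
    """Sum of resonator branches; formants is a list of
    (freq_hz, bw_hz, amplitude) with one amplitude control per branch."""
    out = [0.0] * len(source)
    for freq_hz, bw_hz, amp in formants:
        branch = resonate(source, freq_hz, bw_hz, fs_hz)
        for i, y in enumerate(branch):
            out[i] += amp * y
    return out

def harmonic_level(signal, freq_hz, fs_hz):
    """Magnitude of a single-frequency DFT, normalized by the length."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(s * math.cos(w * i) for i, s in enumerate(signal))
    im = sum(s * math.sin(w * i) for i, s in enumerate(signal))
    return math.hypot(re, im) / len(signal)

FS = 10000.0   # assumed sampling rate, Hz
F0 = 100.0     # assumed fundamental frequency, Hz
source = impulse_train(F0, FS, 3000)

# Hypothetical formant settings: (frequency, bandwidth, amplitude).
base = [(500.0, 60.0, 1.0), (1500.0, 90.0, 1.0), (2500.0, 150.0, 1.0)]
weak_f2 = [(500.0, 60.0, 1.0), (1500.0, 90.0, 0.25), (2500.0, 150.0, 1.0)]

def steady_levels(formants):
    """Harmonic levels at 500 and 1500 Hz over the last ten pitch periods,
    after the start-up transient has died away."""
    y = parallel_formant_synth(source, formants, FS)[-1000:]
    return {f: harmonic_level(y, f, FS) for f in (500.0, 1500.0)}

print(steady_levels(base))
print(steady_levels(weak_f2))
```

Because the branches are summed rather than cascaded, lowering the F2 amplitude mainly affects the harmonics near F2 and leaves those near F1 largely as they were, which is the kind of independent control that trial-and-error matching of a natural spectrum relies on.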

Translation of Holmes's voice-imitating abilities into rules for
automatic synthesis of natural voice qualities has not, as yet, been
successfully achieved. His parallel synthesizer is clearly up to the
job, at least for male voices, so the problem remains one of developing
an appropriate theory of control. Of course, it may be that the right
theory will suggest a quite different model, such as an articulatory
synthesizer.

3. Models of the voicing source

The voicing sound source used in a formant synthesizer has evolved
from the simple sawtooth waveforms and filtered impulse train used
in early designs. An impulse train filtered by a two-pole low-pass
filter, displayed at the top in Fig. 10, has about the right average
spectrum, but the phase of this waveform is wrong: its primary
excitation of the vocal tract filters occurs at a time corresponding
to the instant the folds open, rather than at closure as in natural
speech. Furthermore, the
spectrum envelope is perfectly regular (i.e., monotonically