locus, rather than the constant F2 onset frequency or "hub" of Potter
et al. (1947), was proven by synthesis of CV syllables with both types
of transitions. Delattre et al. (1955) found that if the second formant
actually started at 1800 Hz in each case, rather than at values shown
in the figure, listeners heard /bi, da, gu/ instead of the intended
/di, da, du/. Only when the virtual loci were employed did subjects
hear /d/ in each case. The locus theory required postulation of two
loci for [g]: one at 3000 Hz before front vowels (where [g] is really
more palatal in articulation), and a much lower one before back
vowels. Another important observation was that phonemes sharing
features such as those specifying place of articulation often shared
certain acoustic patterns, making it possible to state synthesis rules
efficiently in terms of familiar phonetic features (Fig. 17). Based on
his experience with the Pattern Playback, Pierre Delattre became quite
good at drawing stylized patterns for arbitrary sentences (example 15
of the Appendix).
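The virtual-locus idea described above can be illustrated with a minimal sketch. It assumes a straight-line F2 transition that points back at the locus but is only audible over its final portion. The 1800 Hz locus comes from the text; the function name, the 100 ms and 50 ms durations, and the vowel F2 targets are hypothetical values chosen for illustration, not measurements from Delattre et al. (1955).

```python
def f2_transition(locus_hz, vowel_f2_hz, total_ms=100, silent_ms=50, step_ms=10):
    """Audible part of a linear F2 transition aimed at a virtual locus.

    The line runs from the locus (t = 0) to the vowel target (t = total_ms),
    but only points from silent_ms onward are returned, so the formant
    never actually starts at the locus frequency.
    All durations are illustrative assumptions.
    """
    points = []
    for t in range(silent_ms, total_ms + 1, step_ms):
        frac = t / total_ms  # fraction of the way from locus to vowel target
        points.append((t, locus_hz + frac * (vowel_f2_hz - locus_hz)))
    return points


# Hypothetical F2 targets (Hz) for three vowels after /d/.
ASSUMED_VOWEL_F2 = {"i": 2400, "a": 1200, "u": 800}

for vowel, target in ASSUMED_VOWEL_F2.items():
    onset_ms, onset_hz = f2_transition(1800, target)[0]
    print(f"/d{vowel}/: audible F2 onset {onset_hz:.0f} Hz at {onset_ms} ms")
```

Under these assumptions the audible F2 onset differs from vowel to vowel even though all three transitions point at the same locus, which is the contrast with a fixed onset frequency that the synthesis experiments tested.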
a. Vowels. The acoustic theory of vowel production (Chiba and Kajiyama,
1941; Fant, 1960; Stevens and House, 1961) showed that vowels can be
represented by an all-pole vocal tract transfer function, and that
the relative amplitudes of the formant peaks can be predicted from a
knowledge of formant frequencies, as long as the vowel is not
nasalized. Peterson and Barney (1952) collected systematic data on
formant frequencies and amplitudes from a wide sampling of men, women,
and children. From these and many other data collection, synthesis,
and perceptual validation efforts, we know that English vowels can
be described in terms of the frequencies of the lowest three formants,
any frequency motions associated with diphthongization (Holbrook and
Fairbanks, 1962), and differences in vowel duration. Formant bandwidths
also differ slightly among vowels (the best data for synthesis purposes
appear to be those of Stevens and House, 1963); attention to details such as
these is likely to lead to a slightly more natural voice quality.
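The claim that formant amplitudes follow from formant frequencies can be sketched numerically. The example below assumes the transfer function is a cascade of second-order resonators, one conjugate pole pair per formant, each normalized to unity gain at 0 Hz. The function name, the neutral-vowel formant frequencies, and the bandwidths are illustrative assumptions, and the sketch omits any correction for formants above the third, so absolute levels should not be taken literally.

```python
import math


def all_pole_magnitude_db(freq_hz, formants_hz, bandwidths_hz):
    """Magnitude (dB) of an all-pole transfer function at freq_hz.

    Each formant contributes a conjugate pole pair at
    s_n = -pi * B_n +/- j * 2 * pi * F_n, scaled for unity gain at 0 Hz.
    """
    s = 2j * math.pi * freq_hz
    gain = 1.0
    for f, b in zip(formants_hz, bandwidths_hz):
        pole = complex(-math.pi * b, 2 * math.pi * f)
        gain *= abs(pole) ** 2 / (abs(s - pole) * abs(s - pole.conjugate()))
    return 20 * math.log10(gain)


# Assumed values for a neutral vowel (Hz); not taken from the cited data.
FORMANTS = (500, 1500, 2500)
BANDWIDTHS = (60, 90, 150)

for f in FORMANTS:
    level = all_pole_magnitude_db(f, FORMANTS, BANDWIDTHS)
    print(f"level at {f} Hz: {level:.1f} dB")
```

Moving one formant in `FORMANTS` and rerunning changes the levels at the other formant frequencies as well, with no amplitude parameter ever being set directly; this is the sense in which relative amplitudes are predictable from frequencies in a model of this kind.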
b. Sonorant consonants. The non-nasal sonorant consonants of English,
/w,y,r,l/, are similar to vowels, but are shorter in duration, somewhat
more extreme in articulation, and are said to involve more rapid
transitions into adjacent sounds than do vowels (O'Connor et al.,
1957; Lisker, 1957; Lehiste, 1962). Sample broadband spectrograms of
these consonants in intervocalic position are shown in the bottom row
of Fig. 18. Each consonant is preceded and followed by the vowel /a/,
which has been truncated at the approximate midpoint of the vowel in
order to fit all English consonants onto one plot. In utterance-initial
position before a vowel, sonorant consonants consist of an initial
brief vowel-like steady state followed by continuous formant
trajectories into the following vowel. The /l/ is both sonorant and
stop-like in characteristics -- having a very rapid small rise in F1
and F2