NMAH | Smithsonian Speech Synthesis History Project (dk

locus, rather than the constant F2 onset frequency or "hub" of Potter et al. (1947), was proven by synthesis of CV syllables with both types of transitions. Delattre et al. (1955) found that if the second formant actually started at 1800 Hz in each case, rather than at the values shown in the figure, listeners heard /bi, da, gu/ instead of the intended /di, da, du/. Only when the virtual loci were employed did subjects hear /d/ in each case. The locus theory required postulation of two loci for [g]: one at 3000 Hz before front vowels (where [g] is really more palatal in articulation), and a much lower locus before back vowels. Another important observation was that phonemes sharing features, such as those specifying place of articulation, often shared certain acoustic patterns, making it possible to state synthesis rules efficiently in terms of familiar phonetic features (Fig. 17). Based on his experience with the Pattern Playback, Pierre Delattre became quite good at drawing stylized patterns for arbitrary sentences (example 15 of the Appendix).
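The virtual-locus idea described above can be illustrated with a minimal sketch. It draws a straight-line F2 trajectory from the 1800 Hz /d/ locus to a vowel target and leaves the initial portion silent, so that the audible formant only points toward the locus. The function name, the linear interpolation, the silent fraction of one half, the frame count, and the vowel F2 targets are all assumptions chosen for illustration; they are not values taken from the text or from Delattre et al. (1955).

```python
# Hypothetical vowel F2 targets in Hz, for illustration only.
HYPOTHETICAL_VOWEL_F2 = {"i": 2300.0, "a": 1100.0, "u": 900.0}


def f2_transition(locus_hz, vowel_f2_hz, n_frames=10, silent_fraction=0.5):
    """Return the audible part of a linear F2 transition.

    The full line runs from the virtual locus (t = 0) to the vowel
    target (t = 1). Frames with t < silent_fraction are omitted, so the
    audible track starts partway along the line and merely points back
    at the locus. The default silent_fraction is an assumption.
    """
    track = []
    for i in range(n_frames + 1):
        t = i / n_frames
        if t < silent_fraction:
            continue  # this part of the line is never sounded
        track.append(locus_hz + t * (vowel_f2_hz - locus_hz))
    return track


if __name__ == "__main__":
    for vowel, f2 in HYPOTHETICAL_VOWEL_F2.items():
        track = f2_transition(1800.0, f2)
        print(f"/d{vowel}/ audible F2 onset: {track[0]:.0f} Hz, "
              f"vowel target: {track[-1]:.0f} Hz")
```

Under these assumptions the audible onset frequency differs from vowel to vowel even though every trajectory extrapolates back to the same 1800 Hz locus. Setting `silent_fraction=0.0` gives the other condition described in the text, in which the formant actually starts at the locus.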

a. Vowels. The acoustic theory of vowel production (Chiba and Kajiyama, 1941; Fant, 1960; Stevens and House, 1961) showed that vowels can be represented by an all-pole vocal tract transfer function, and that the relative amplitudes of the formant peaks can be predicted from a knowledge of formant frequencies, as long as the vowel is not nasalized. Peterson and Barney (1952) collected systematic data on formant frequencies and amplitudes from a wide sampling of men, women, and children. From these and many other data collection, synthesis, and perceptual validation efforts, we know that English vowels can be described in terms of the frequencies of the lowest three formants, any frequency motions associated with diphthongization (Holbrook and Fairbanks, 1962), and differences in vowel duration. Formant bandwidths also differ slightly among vowels (the best data for synthesis purposes appear to be Stevens and House, 1963); attention to details such as these is likely to lead to a slightly more natural voice quality.
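The claim that formant peak amplitudes follow from the formant frequencies can be made concrete with a small sketch of an all-pole transfer function. It uses a common textbook formulation, a cascade of second-order resonators in the s-domain with one conjugate pole pair per formant, which is not necessarily the exact formulation used in the works cited above. The function name, the neutral-vowel formant values (500, 1500, 2500 Hz), and the bandwidths are assumptions made for illustration.

```python
import math


def all_pole_gain_db(freq_hz, formants_hz, bandwidths_hz):
    """Gain in dB of an all-pole vocal tract model at freq_hz.

    Each formant with frequency F and bandwidth B contributes a
    conjugate pole pair at s = -pi*B +/- j*2*pi*F. Each section is
    normalized to unity gain at 0 Hz, so the overall gain at 0 Hz
    is 0 dB.
    """
    s = complex(0.0, 2.0 * math.pi * freq_hz)
    h = complex(1.0, 0.0)
    for f, b in zip(formants_hz, bandwidths_hz):
        pole = complex(-math.pi * b, 2.0 * math.pi * f)
        conj = pole.conjugate()
        h *= (pole * conj) / ((s - pole) * (s - conj))
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    # Assumed neutral-vowel formants and bandwidths (Hz), illustration only.
    formants = [500.0, 1500.0, 2500.0]
    bandwidths = [60.0, 90.0, 150.0]
    for f in formants:
        level = all_pole_gain_db(f, formants, bandwidths)
        print(f"level at {f:.0f} Hz: {level:.1f} dB")
```

In this model no amplitude is specified directly. Changing a single formant frequency shifts the level at every other formant peak, because each resonator's skirt contributes to the gain at all frequencies. The sketch omits source and radiation characteristics and any higher-pole correction.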

b. Sonorant consonants. The non-nasal sonorant consonants of English, /w,y,r,l/, are similar to vowels, but are shorter in duration, somewhat more extreme in articulation, and are said to involve more rapid transitions into adjacent sounds than do vowels (O'Connor et al., 1957; Lisker, 1957; Lehiste, 1962). Sample broadband spectrograms of these consonants in intervocalic position are shown in the bottom row of Fig. 18. Each consonant is preceded and followed by the vowel /a/, which has been truncated at the approximate midpoint of the vowel in order to fit all English consonants onto one plot. In utterance-initial position before a vowel, sonorant consonants consist of an initial brief vowel-like steady state followed by continuous formant trajectories into the following vowel. The /l/ is both sonorant and stop-like in characteristics -- having a very rapid small rise in F1 and F2
