 KLATT 1987, p. 754 

straight-line transitions à la Fig. 19, he achieved a consonantal intelligibility of 95% for CVC nonsense syllables played to trained phoneticians. Klatt had greatest difficulty with stop consonants. He, along with many others (Fant, 1973; Kewley-Port, 1982), found that the locus theory was an over-simplification that applied, at best, to two-formant acoustic patterns. Based on extensive data from a single speaker (examples are shown in Fig. 20), he tried to determine whether a modified locus concept could be created, or whether a list was needed to tabulate the starting frequencies for F1, F2, and F3 before each vowel. A locus theory is manifested in Fig. 20 when all of the data points lie on a straight line, i.e., when one can predict the onset frequency F2_onset from the vowel target frequency F2_vowel by an equation of the form:

F2_onset = F2_locus + k * [F2_vowel - F2_locus],   (1)

where the locus frequency F2_locus and the degree of vowel coarticulation at the instant of release, k, are parameters to be fit to the observed data from each consonant. 4  At first it seemed there was little hope for resurrecting the locus concept because, as noted by Fant (1973), many complex factors cause the locus idea to fail. A transition can have both a rapid and a slow component, due to rapid release of the obstruction followed by gradual tongue body movements; a preceding vowel can influence the observed F2 onset of a CV transition (Öhman, 1966); and F2 can be relatively insensitive to oral constrictions when it is essentially a back cavity resonance, as in the vowel [i]. Klatt hypothesized that the primary influences of the vowel on consonantal articulation were fronting/backing of the tongue body and lip rounding. He therefore divided the set of English vowels into { + FRONT }, { + ROUND }, and the remainder, which were { - FRONT, - ROUND }, and found that within each set the data were sufficiently regular to be approximated by straight lines, as in Fig. 20 (Klatt, 1979b). While some data points lie slightly off the straight lines and might be better synthesized by a table look-up strategy, the recognition score of 95% correct obtained for synthetic plosives in CV nonsense syllables (Klatt, 1970) is encouraging.
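As a minimal sketch of how Eq. (1) might be applied, the Python fragment below predicts an onset frequency from a vowel target and recovers the locus frequency and k from paired measurements by an ordinary least-squares straight-line fit, done separately for each vowel set. The function names, the tuple layout of the measurements, and the numeric values in the usage lines are illustrative assumptions, not data or procedures taken from the paper. The fit relies on rewriting Eq. (1) as F2_onset = k * F2_vowel + (1 - k) * F2_locus, so that k is the slope and F2_locus is the intercept divided by (1 - k).

```python
from collections import defaultdict


def predict_f2_onset(f2_vowel, f2_locus, k):
    """Eq. (1): onset frequency (Hz) as a linear function of the vowel target."""
    return f2_locus + k * (f2_vowel - f2_locus)


def fit_locus(f2_vowels, f2_onsets):
    """Least-squares estimate of (F2_locus, k) from paired measurements in Hz.

    Eq. (1) rearranges to F2_onset = k * F2_vowel + (1 - k) * F2_locus,
    so k is the slope of the fitted line and F2_locus = intercept / (1 - k).
    """
    n = len(f2_vowels)
    if n < 2 or n != len(f2_onsets):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(f2_vowels) / n
    mean_y = sum(f2_onsets) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowels)
    if sxx == 0:
        raise ValueError("vowel targets are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_vowels, f2_onsets))
    k = sxy / sxx
    intercept = mean_y - k * mean_x
    if abs(1.0 - k) < 1e-9:
        raise ValueError("k is 1 (onset equals vowel target); locus is undefined")
    return intercept / (1.0 - k), k


def fit_locus_by_vowel_set(measurements):
    """Fit one straight line per vowel set.

    `measurements` is an iterable of (vowel_set, f2_vowel, f2_onset) tuples,
    where vowel_set is any hashable label (hypothetical layout).
    """
    grouped = defaultdict(lambda: ([], []))
    for vowel_set, f2_vowel, f2_onset in measurements:
        grouped[vowel_set][0].append(f2_vowel)
        grouped[vowel_set][1].append(f2_onset)
    return {label: fit_locus(xs, ys) for label, (xs, ys) in grouped.items()}


# Usage with made-up, noise-free values (a 1800 Hz locus and k = 0.4 are
# placeholders chosen only to exercise the code, not measured values).
vowel_targets = [800.0, 1200.0, 1600.0, 2200.0]
onsets = [predict_f2_onset(v, 1800.0, 0.4) for v in vowel_targets]
locus_hz, k_est = fit_locus(vowel_targets, onsets)
```

With real measurements, the residuals from each fitted line would indicate which consonant-vowel pairs depart from the straight-line approximation and might be better handled by table look-up.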

Examples of burst spectra obtained from one talker, Fig. 21, support the Klatt strategy of dividing the data into vowel subsets by showing remarkably constant spectral shape and amplitude for the burst before all vowels in a given vowel set, but substantial differences across vowel sets [recall also the Cooper et al. (1952) perceptual results]. Burst spectra were synthesized by a strategy of selecting from one of three possible synthetic bursts depending on the following vowel. It was also noted that the properties of the burst spectra conformed to theoretical predictions concerning the quantal nature of place of articulation (Stevens, 1972), so that only formants corresponding to resonances of the cavity in front of the constriction were strongly excited by noise. For example, in [k] and [g] bursts, the noise excited F2 and F4 before back vowels, and F3 and F5 before front vowels.
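The burst-selection strategy described above amounts to a small table look-up keyed on the vowel set of the following vowel. The sketch below is one hypothetical way to write it in Python. The ARPAbet-style vowel labels, the assignment of individual vowels to sets, and the treatment of every non-front vowel as "back" for velar bursts are assumptions made for illustration; the text states only that three vowel sets were used and which formants the noise excites in [k] and [g] bursts.

```python
FRONT, ROUND, OTHER = "front", "round", "other"

# Assumed vowel-set membership (illustrative only; the source does not list it).
VOWEL_SET = {
    "IY": FRONT, "IH": FRONT, "EY": FRONT, "EH": FRONT, "AE": FRONT,
    "UW": ROUND, "UH": ROUND, "OW": ROUND, "AO": ROUND,
    "AA": OTHER, "AH": OTHER, "AX": OTHER,
}


def vowel_set(vowel):
    """Return the vowel set (front, round, or other) for a vowel label."""
    try:
        return VOWEL_SET[vowel]
    except KeyError:
        raise ValueError(f"unknown vowel label: {vowel!r}")


def select_burst(consonant, following_vowel):
    """Pick one of three burst templates for a plosive.

    Returns a (consonant, vowel_set) key; a synthesizer would use it to look
    up a stored burst spectrum (the template store itself is not modeled).
    """
    return (consonant, vowel_set(following_vowel))


def velar_burst_formants(following_vowel):
    """Formant numbers excited by noise in a [k] or [g] burst.

    Per the text: F3 and F5 before front vowels, F2 and F4 before back vowels.
    Assumption: every vowel outside the front set counts as a back vowel.
    """
    return (3, 5) if vowel_set(following_vowel) == FRONT else (2, 4)


# Usage: the same consonant maps to different burst templates by vowel context.
key_before_front = select_burst("k", "IY")
key_before_round = select_burst("k", "UW")
```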

One question that concerned early researchers was whether there might exist a stylized version of synthetic "super speech" that was more intelligible than natural speech because, e.g., the formant peaks were enhanced or burst spectra were "cleaned up" so as to contain only one major energy concentration, or formant transitions were more extreme than normally observed. Such efforts have always failed; synthesis that is a better match to observed natural data has always sounded better and has been measurably more intelligible. Every potential cue (acoustic regularity associated with a phonetic gesture) that has been examined has been shown to have some perceptual cue value (Liberman
 

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution