KLATT 1987, p. 764 |
replaced by an allophone with distinctly different articulatory/acoustic properties. For example, the phoneme /l/ is realized as a velarized variant following a vowel, while there is normally no velar constriction in word-initial productions of /l/ (Lehiste, 1962). Less clear-cut are cases where a small change results from a low-level articulatory interaction (Schwartz, 1967), or where many small changes can be made along an articulatory/acoustic dimension such as voice onset time (VOT). For example, the time between the release of a /t/ and the onset of voicing is typically about 50 ms, but it is systematically about 10 ms longer in word-initial position, e.g., "tone," than in prestressed word-medial position, e.g., "atone," and VOT is shorter if the following vowel is unstressed (Klatt, 1975b). Should one create a separate symbol for each gradation along the VOT continuum, or handle these effects as low-level adjustments to the time functions that control the synthesizer? The distinction between allophone selection rules and parameter adjustment rules is necessarily arbitrary and of relatively little theoretical import to us (note 8). The important thing is to produce the appropriate acoustic changes in the synthetic speech, and to do so efficiently. Some of the rules discussed below appear to be articulatory simplifications that allow the speaker to be "lazy" in realizing some unstressed phonetic sequences.
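To make the contrast between allophone selection and parameter adjustment concrete, here is a minimal Python sketch under stated assumptions: the /l/ rule rewrites the phoneme as one of two distinct symbols, while the /t/ rule keeps a single symbol and grades its VOT. The segment labels, the names `Segment`, `select_l_allophone`, and `t_vot_ms`, the choice to treat the 50 ms figure as the prestressed word-medial value, and the size of the unstressed-vowel shortening are all hypothetical. Only the typical 50 ms value, the roughly 10 ms word-initial lengthening, and the direction of the stress effect come from the text. The /l/ rule is also simplified to "after a vowel means dark."

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ARPAbet-style vowel labels, for this sketch only.
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "ow", "uw"}


@dataclass
class Segment:
    phoneme: str
    word_initial: bool = False  # segment is the first in its word
    stressed: bool = False      # for vowels: the vowel carries stress


def select_l_allophone(prev: Optional[Segment]) -> str:
    """Allophone selection: /l/ is rewritten as one of two distinct symbols."""
    # Simplified rule: /l/ after a vowel is velarized ("dark"); otherwise "light".
    if prev is not None and prev.phoneme in VOWELS:
        return "l_dark"
    return "l_light"


# Assumption: the typical ~50 ms /t/ VOT is taken as the prestressed word-medial value.
MEDIAL_VOT_MS = 50.0
# From the text: word-initial /t/ VOT is about 10 ms longer.
WORD_INITIAL_EXTRA_MS = 10.0
# Hypothetical factor: the text gives only the direction of the unstressed effect.
UNSTRESSED_SCALE = 0.6


def t_vot_ms(t: Segment, next_vowel: Segment) -> float:
    """Parameter adjustment: one /t/ symbol, with a graded VOT duration in ms."""
    vot = MEDIAL_VOT_MS
    if t.word_initial:
        vot += WORD_INITIAL_EXTRA_MS
    if not next_vowel.stressed:
        vot *= UNSTRESSED_SCALE  # shorter VOT before an unstressed vowel
    return vot


if __name__ == "__main__":
    print(select_l_allophone(None))           # "lead" -> l_light
    print(select_l_allophone(Segment("iy")))  # "deal" -> l_dark
    print(t_vot_ms(Segment("t", word_initial=True), Segment("ow", stressed=True)))  # "tone" -> 60.0
    print(t_vot_ms(Segment("t"), Segment("ow", stressed=True)))  # "atone" -> 50.0
```

Neither rule is meant as Klattalk's actual implementation. Either mechanism could in principle encode either effect (a separate symbol per VOT gradation, or a continuous F2 adjustment for /l/), which is why the split between the two kinds of rule is a design choice rather than a theoretical claim.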
While ease of pronunciation may play a role in the development of allophonic variation, a far more important function of these rules is to help mark boundaries, especially word boundaries, in the flow of speech (Lehiste, 1959; Nakatani and Dukes, 1977). Lehiste cites many examples where allophones mark boundaries; the best known is the distinction between "night rate" and "nitrate," where a listener can easily tell which sequence the speaker intended because of the stronger frication/aspiration in the latter case (note 9). Most of the rules discussed below are thus neither strictly "sloppy speech" rules nor optional rules; they are needed to make sentences sound fluent and natural. They help the listener decide the syllable affiliation of consonants and the degree of stress on a syllable, and thus indirectly constrain the locations of potential word boundaries, permitting the listener to parse an utterance into words without pursuing too many alternative interpretations (Church, 1983). Phonotactics, the specification of permitted phonetic sequences at the beginnings, middles, and ends of words, can also provide word boundary hypotheses for the listener (Lamel and Zue, 1984). The details of phonological rule application differ across dialects of English, as well as across speaking styles (formal/casual) and speaking rates within a given dialect. This is a serious problem for speech recognition devices (see, e.g., the rule compendium of Cohen and Mercer, 1974), but a text-to-speech system need only select rules appropriate for one acceptable dialect of English, and perhaps modify rule applicability as a function of speaking rate (Bernstein and Baldwin, 1985). In Klattalk, some phonetic simplifications across word boundaries are blocked if a phrase boundary is present.
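A minimal sketch of how a phrase boundary can block a cross-word simplification, assuming a symbol string in which `#` marks a word boundary and `%` a phrase boundary (both hypothetical notations, not Klattalk's). The simplification itself, word-final /d/ plus word-initial /y/ becoming /jh/ as in "did you" pronounced "didju," is a hypothetical stand-in chosen for illustration; the text does not say which rules Klattalk blocks. The hypothetical helper `mark_boundaries` turns minor breaks into phrase boundaries only when a slow rate is requested.

```python
from typing import List, Set

WORD_BOUNDARY = "#"    # hypothetical word-boundary symbol
PHRASE_BOUNDARY = "%"  # hypothetical phrase-boundary symbol


def mark_boundaries(words: List[List[str]], minor_breaks: Set[int], slow: bool) -> List[str]:
    """Join words into one symbol string.

    minor_breaks holds indices i such that a minor phrase break follows word i.
    At slow rates these become phrase boundaries; otherwise plain word boundaries.
    """
    symbols: List[str] = []
    for i, word in enumerate(words):
        symbols.extend(word)
        if i < len(words) - 1:
            use_phrase = slow and i in minor_breaks
            symbols.append(PHRASE_BOUNDARY if use_phrase else WORD_BOUNDARY)
    return symbols


def palatalize_across_words(symbols: List[str]) -> List[str]:
    """Hypothetical cross-word simplification: d # y -> # jh ("did you" -> "didju").

    Only a plain word boundary licenses the rule, so a phrase boundary blocks it.
    """
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if symbols[i:i + 3] == ["d", WORD_BOUNDARY, "y"]:
            out.extend([WORD_BOUNDARY, "jh"])  # /d/ and /y/ merge into one affricate
            i += 3
        else:
            out.append(symbols[i])
            i += 1
    return out


if __name__ == "__main__":
    did_you = [["d", "ih", "d"], ["y", "uw"]]
    print(palatalize_across_words(mark_boundaries(did_you, {0}, slow=False)))
    # ['d', 'ih', '#', 'jh', 'uw']
    print(palatalize_across_words(mark_boundaries(did_you, {0}, slow=True)))
    # ['d', 'ih', 'd', '%', 'y', 'uw']
```

Because the rule matches only a plain word boundary, the slow-rate string passes through unchanged, yielding the more formal, unsimplified pronunciation.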
This blocking mechanism is used to produce more formal speech at slow speaking rates, simply by placing phrase boundary symbols at more minor phrase breaks when analyzing a text. In the future, it may be interesting to try simulating additional dialects and styles by direct manipulation of the phonological rules in these systems. Some of the allophonic phenomena to be described have been known for a long time; many have appeared in phonetics textbooks at least as far back as the 1930s (Bloomfield, 1933; Hockett, 1955; Heffner, 1969). However, their acoustic characterization had to await instrumental study. One of the first and best of the acoustic-phonetic studies was performed by Lehiste (1959, 1964). She noted the following kinds of word boundary indicators:
In a subsequent study of [l] and [r], Lehiste (1962) noted that the prevocalic "light" allophone of /l/, as in "lead," has a second formant that depends on the following vowel, whereas the postvocalic "dark" or velarized /l/, as in "deal," is similar to the syllabic /l/ in "bottle" and has a lower second formant that is independent of the preceding vowel. The initial allophone of /r/, as in "reed," has lower F1, F2, and F3 than the postvocalic allophone, as in "deer." The syllabic /r/ nucleus, as in "dirt," has formant targets similar to those of the postvocalic allophone. In a study of the allophones of /t, d/ and their distribution, Zue and Laferriere (1979) distinguished: