NMAH | Smithsonian Speech Synthesis History Project (dk

kinds of rules is incorporated explicitly in the Klatt system, but partial isochrony is achieved through rules that shorten unstressed syllables and consonant clusters (Carlson et al., 1979). The Klatt rules capture durational differences between nouns and verbs by phrase-final lengthening and destressing of common verbs. An emphasis symbol is provided to capture word frequency and discourse expectancy effects in a binary fashion. These alternative mechanisms for mimicking observed tendencies in durational data make it nearly impossible to determine which rule system has a basis most similar to psychological processes.
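The percentage-based shortening and lengthening rules mentioned above can be illustrated with a minimal Python sketch. It assumes the multiplicative form used in the Klatt (1979) duration model. Under that form, each applicable rule scales only the compressible part of a segment's inherent duration: DUR = MINDUR + (INHDUR - MINDUR) x PRCNT/100. The `Segment` fields, the three rule flags, and all numeric percentages are hypothetical illustrations, not values from the published rule set.

```python
from dataclasses import dataclass

# Hypothetical rule percentages (100 means "no change"); illustrative only.
UNSTRESSED_PCT = 70.0     # shorten segments in unstressed syllables
CLUSTER_PCT = 70.0        # shorten consonants in clusters
PHRASE_FINAL_PCT = 140.0  # lengthen segments in the phrase-final syllable


@dataclass
class Segment:
    phone: str
    inherent_ms: float   # INHDUR: inherent duration
    minimum_ms: float    # MINDUR: incompressible minimum duration
    stressed: bool
    in_cluster: bool
    phrase_final: bool


def segment_duration(seg: Segment) -> float:
    """Duration in ms: MINDUR + (INHDUR - MINDUR) * PRCNT / 100."""
    prcnt = 100.0
    # Each rule that applies multiplies the running percentage.
    if not seg.stressed:
        prcnt *= UNSTRESSED_PCT / 100.0
    if seg.in_cluster:
        prcnt *= CLUSTER_PCT / 100.0
    if seg.phrase_final:
        prcnt *= PHRASE_FINAL_PCT / 100.0
    return seg.minimum_ms + (seg.inherent_ms - seg.minimum_ms) * prcnt / 100.0


# Example: an unstressed vowel with INHDUR = 230 ms and MINDUR = 80 ms.
vowel = Segment("AE", 230.0, 80.0, stressed=False, in_cluster=False, phrase_final=False)
print(segment_duration(vowel))  # prints 185.0
```

Because the percentages multiply, an unstressed consonant in a cluster is shortened more than either rule alone would produce. Even so, as long as every percentage is positive, its duration never falls below MINDUR.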

3. Fundamental frequency rules

Many phenomenological observations have been collected about pitch motions in English sentences, and hypotheses have been generated concerning their relations to the linguistic constructs known as intonation and stress. The intonation pattern is defined to be the pitch pattern over time that, for example, distinguishes a statement from a question or an imperative, and that produces the continuation rise between clauses in an utterance of more than one clause. The stress pattern on syllables can distinguish words such as "'insert" (noun) from "in'sert" (verb), even though the two words have identical segmental phonemes. Linguists originally believed that there was a fairly direct correspondence between intonation and pitch, while levels of stress were manifested by changes in vocal intensity and syllable duration. Now we know that f0 changes significantly affect stress judgements (Fry, 1958; Nakatani and Schafer, 1978), and that either a rise or a fall in f0 can indicate a stressed syllable. The f0 pattern plays a complex role in encoding information for the listener. It conveys not only syntactic structure and stress pattern, but also cues to the speaker's gender, head size, psychological state, and attitude toward what is being said. This section briefly reviews some of what is known about this encoding.

Pike (1945) believed that English is like a tone language in that four different degrees of stress correspond to different pitch levels. However, it has been shown that a given stress level is manifested as a higher pitch at the beginning of a sentence than near the end (Lieberman, 1967), so absolute f0 cannot be the relevant cue to the level of a tone. Lieberman also demonstrated that (simulated) emotional states changed f0 patterns in ways that made it impossible for linguists to assign stress levels to syllables consistently when listening to read sentences. Thus emotions and attitudes are also conveyed to some extent by f0 patterns (for sample data, see Uldall, 1960; O'Shaughnessy and Allen, 1983). Instrumental analyses also indicated that segmental identity can perturb the f0 value (House and Fairbanks, 1953), and that f0 differs greatly across speakers, depending primarily on larynx size. On average, female speakers use f0 values about 1.7 times those of male speakers (Peterson and Barney, 1952). They may also use a slightly livelier set of dynamic changes (higher peaks and lower troughs) than simple scaling would imply.
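The male-to-female relation can be made concrete with a small Python sketch. It assumes that a female contour is approximated by multiplying a male contour by the 1.7 ratio cited above. The optional `excursion_gain` parameter exaggerates peaks and troughs on a log-frequency scale. It is a hypothetical device for the "livelier" dynamics mentioned in the text, not a published rule, and the contour values are made up.

```python
import math
from typing import List

# Average female/male f0 ratio reported by Peterson and Barney (1952).
FEMALE_TO_MALE_RATIO = 1.7


def scale_contour(f0_hz: List[float], ratio: float = FEMALE_TO_MALE_RATIO,
                  excursion_gain: float = 1.0) -> List[float]:
    """Map a non-empty f0 contour (Hz) into another speaker's range.

    ratio scales every value uniformly. excursion_gain > 1 additionally
    widens peaks and troughs around the contour's geometric mean on a
    log-frequency scale (a hypothetical model of livelier dynamics);
    excursion_gain = 1 gives plain scaling.
    """
    logs = [math.log(f) for f in f0_hz]
    mean_log = sum(logs) / len(logs)
    return [ratio * math.exp(mean_log + excursion_gain * (lf - mean_log))
            for lf in logs]


male = [100.0, 140.0, 120.0, 90.0]  # hypothetical male contour, Hz
print([round(f) for f in scale_contour(male)])
print([round(f) for f in scale_contour(male, excursion_gain=1.3)])
```

Working in log frequency keeps the scaled contour's geometric mean at exactly `ratio` times the original. Widening excursions there corresponds to expanding pitch intervals rather than raw Hz differences.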

Bolinger (1972) notes the frequent use of contrastive stress, or emphasis, in expressive reading. To the extent that locations for emphasis can be determined from the text, emphasis can be manifested acoustically in three ways. The duration of the emphasized word is increased, the pitch rise that ordinarily accompanies its primary-stressed syllable is enlarged, and all other pitch rises in the remainder of the sentence are reduced in size (Cooper and Sorensen, 1981).
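These three acoustic changes can be sketched in Python as a transformation over a list of words. Each word carries a duration and the size of the pitch rise on its primary-stressed syllable. The `Word` structure and all three gain factors are hypothetical and chosen only for illustration. Words before the emphasized one are left unchanged, because the text describes reducing rises only in the remainder of the sentence.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical gains, chosen only for illustration.
EMPH_DURATION_GAIN = 1.2    # lengthen the emphasized word
EMPH_RISE_GAIN = 1.5        # enlarge its primary-stress pitch rise
POST_EMPH_RISE_GAIN = 0.5   # shrink pitch rises later in the sentence


@dataclass(frozen=True)
class Word:
    text: str
    duration_ms: float
    pitch_rise_hz: float  # rise on the primary-stressed syllable (0 if none)


def apply_emphasis(words: List[Word], target: int) -> List[Word]:
    """Return a copy of words with words[target] emphasized."""
    out = []
    for i, w in enumerate(words):
        if i == target:
            out.append(replace(w, duration_ms=w.duration_ms * EMPH_DURATION_GAIN,
                               pitch_rise_hz=w.pitch_rise_hz * EMPH_RISE_GAIN))
        elif i > target:
            # Only rises after the emphasized word are reduced.
            out.append(replace(w, pitch_rise_hz=w.pitch_rise_hz * POST_EMPH_RISE_GAIN))
        else:
            out.append(w)
    return out


sentence = [Word("she", 150.0, 0.0), Word("bought", 250.0, 25.0),
            Word("the", 80.0, 0.0), Word("red", 220.0, 20.0), Word("car", 300.0, 20.0)]
for w in apply_emphasis(sentence, target=3):
    print(w.text, round(w.duration_ms), w.pitch_rise_hz)
```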

O'Shaughnessy (1979) and O'Shaughnessy and Allen (1983) examined f0 contours for syntactically complex sentences and for sentences involving modals. They observed that modal auxiliaries, negatives, quantifiers, and sentential adverbs tend to be emphasized (a local f0 increase) when present in read sentences. The authors interpret these results to mean that the speaker's feeling toward the proposition tends to dominate over the actual content of the proposition (Halliday, 1970).
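A text-to-speech system could exploit this observation by flagging such words as candidates for a local f0 increase. The Python sketch below assumes a simple lexical lookup. The word lists are hypothetical and deliberately incomplete. A practical system would need part-of-speech information rather than bare spelling, since "will" and "may", for example, also occur as nouns.

```python
import re
from typing import List

# Illustrative, deliberately incomplete word lists (hypothetical).
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}
NEGATIVES = {"not", "never", "no", "nobody", "nothing"}
QUANTIFIERS = {"all", "every", "each", "some", "many", "few", "most", "none"}
SENTENTIAL_ADVERBS = {"probably", "certainly", "perhaps", "fortunately", "obviously"}

EMPHASIS_WORDS = MODALS | NEGATIVES | QUANTIFIERS | SENTENTIAL_ADVERBS


def emphasis_candidates(sentence: str) -> List[int]:
    """Return indices of words that might receive a local f0 increase."""
    words = re.findall(r"[a-z']+", sentence.lower())
    # Contracted negatives such as "won't" are treated as negatives too.
    return [i for i, w in enumerate(words)
            if w in EMPHASIS_WORDS or w.endswith("n't")]


print(emphasis_candidates("You must not touch all of them."))  # prints [1, 2, 4]
```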

The strength of an f0 gesture depends on semantic factors that extend over more than one sentence (Coker et al., 1973). A repeated word receives a reduced f0 gesture, and the reduction is due to semantic recurrence rather than to reappearance of exactly the same item (Vanderslice, 1968). In

National Museum of American History | Archives Center
Smithsonian Institution