kinds of rules is incorporated explicitly in the Klatt system, but
partial isochrony is achieved through rules that shorten unstressed
syllables and consonant clusters (Carlson et al., 1979). The Klatt
rules capture durational differences between nouns and verbs by
phrase-final lengthening and destressing of common verbs. An emphasis
symbol is provided to capture word frequency and discourse expectancy
effects in a binary fashion. These alternative mechanisms for
mimicking observed tendencies in durational data make it nearly
impossible to determine which rule system has a basis most similar
to psychological processes.
3. Fundamental frequency rules
Many phenomenological observations have been collected about pitch
motions in English sentences, and hypotheses have been generated
concerning their relations to linguistic constructs known as
intonation and stress. The intonation pattern is defined to be the
pitch pattern over time that, for example, distinguishes statement
from question or imperative, and that marks the continuation rise
between clauses for an utterance of more than one clause. The stress
pattern on
syllables can distinguish words such as " 'insert" from "ins'ert"
even though the two words have identical segmental phonemes.
Linguists originally believed that there was a fairly direct
correspondence between intonation and pitch, while levels of stress
were manifested by changes in vocal intensity and syllable duration.
Now we know that fo changes affect stress judgements significantly
(Fry, 1958; Nakatani and Schafer, 1978), and that a rise in fo or a
fall in fo can indicate a stressed syllable. The fo pattern plays a
complex role in encoding information for the listener because it not
only conveys information about syntactic structure and stress
patterns, but it also helps indicate speaker gender, head size,
psychological state, and attitude toward what is being spoken. This
section reviews briefly some of what is known about this encoding.
Pike (1945) believed that English was like a tone language in that
four different degrees of stress corresponded to different pitch
levels. However, it has been shown that a given stress level is
manifested as a higher pitch at the beginning of a sentence than
near the end (Lieberman, 1967), so absolute fo cannot be the relevant
cue to the level of a tone. Lieberman also demonstrated that
(simulated) emotional states changed fo patterns in ways that made
it impossible for linguists to assign stress levels to syllables in
a consistent way when listening to read sentences. Thus emotions
and attitudes are also conveyed to some extent by fo patterns (for
sample data, see Uldall, 1960; O'Shaughnessy and Allen, 1983).
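The argument against absolute pitch levels can be illustrated with a
small sketch. The Python below is illustrative only: the linear
declining baseline, the function name f0_target, and every numeric
value (130 Hz at sentence start, 100 Hz at sentence end, 15 Hz per
stress level) are assumptions chosen for the example, not values taken
from the studies cited.

```python
def f0_target(stress_level, position,
              start_hz=130.0, end_hz=100.0, step_hz=15.0):
    """Return a hypothetical f0 target in Hz.

    stress_level: integer stress degree (0 = unstressed).
    position: 0.0 at the start of the sentence, 1.0 at the end.
    start_hz, end_hz, step_hz: assumed values, for illustration only.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 and 1.0")
    # Assumed linear baseline that falls over the course of the sentence.
    baseline = start_hz + (end_hz - start_hz) * position
    # Each stress level adds a fixed step above the local baseline.
    return baseline + step_hz * stress_level
```

Under these assumed values, a level-3 stress at the end of the sentence
(145 Hz) lies below a level-2 stress at the start (160 Hz), so a
listener could not recover the stress level from absolute fo alone.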
Instrumental analyses also indicated that segmental identity could
perturb the fo value (House and Fairbanks, 1953), and that there
were large differences across speakers depending primarily on
larynx size. On average, female speakers use fo values about 1.7
times male values (Peterson and Barney, 1952), plus perhaps a
slightly more lively set of dynamic changes (higher peaks and lower
troughs) than simple scaling would imply.
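As a rough sketch of what "simple scaling" plus a livelier dynamic
range might look like, the Python below multiplies a male contour by
the average ratio of 1.7 and optionally stretches excursions about the
contour mean. The function name scale_f0_contour and the
range_expansion parameter are hypothetical; the text gives no figure
for how much livelier female contours are, so any value above 1.0 is
an assumption.

```python
from statistics import mean

def scale_f0_contour(male_f0_hz, ratio=1.7, range_expansion=1.0):
    """Map a male f0 contour (Hz) onto a female range.

    ratio: average female/male f0 ratio (about 1.7 in the text).
    range_expansion: hypothetical factor; 1.0 gives simple scaling,
        values above 1.0 raise peaks and lower troughs about the mean.
    """
    centre = mean(male_f0_hz)
    return [ratio * (centre + range_expansion * (f - centre))
            for f in male_f0_hz]
```

With range_expansion left at 1.0 the result is plain multiplication by
the ratio; a value such as 1.2 keeps the mean of the scaled contour
unchanged while making peaks higher and troughs lower.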
Bolinger (1972) notes the frequent use of contrastive stress or
emphasis in expressive reading. To the extent that locations for
emphasis can be determined for text, the emphasis can be manifested
acoustically by increasing the duration of the emphasized word,
increasing the pitch rise that ordinarily accompanies its
primary-stressed syllable, and decreasing the size of all other
pitch rises in the remainder of the sentence (Cooper and Sorensen,
1981).
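The three acoustic adjustments for emphasis can be sketched as a
single rule. The Python below is a minimal illustration, not the rule
set of any particular synthesizer: the Word record, the function
apply_emphasis, and the factors 1.3, 1.5, and 0.6 are all hypothetical,
and "the remainder of the sentence" is read here as the words that
follow the emphasized one.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Word:
    text: str
    duration_ms: float
    rise_hz: float  # size of the f0 rise on the primary-stressed syllable

def apply_emphasis(words, index, lengthen=1.3, boost=1.5, damp=0.6):
    """Return a new word list with emphasis applied to words[index].

    lengthen, boost, damp: assumed factors, for illustration only.
    """
    result = []
    for i, word in enumerate(words):
        if i == index:
            # Lengthen the emphasized word and enlarge its pitch rise.
            result.append(replace(word,
                                  duration_ms=word.duration_ms * lengthen,
                                  rise_hz=word.rise_hz * boost))
        elif i > index:
            # Shrink pitch rises on the words that follow (assumed reading).
            result.append(replace(word, rise_hz=word.rise_hz * damp))
        else:
            result.append(word)
    return result
```

For example, apply_emphasis(sentence, 1) on a three-word list leaves
the first word untouched, lengthens and raises the second, and reduces
the pitch rise on the third.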
O'Shaughnessy (1979) and O'Shaughnessy and Allen (1983) examined fo
contours for syntactically complex sentences, and for sentences
involving modals. They observed that modal auxiliaries, negatives,
quantifiers, and sentential adverbs tend to be emphasized (local fo
increase) when present in read sentences. The authors interpret these
results in terms of the speaker's feeling toward the proposition
tending to dominate over the actual content of the proposition
(Halliday, 1970).
The strength of an fo gesture depends on semantic factors that extend
over more than one sentence (Coker et al., 1973). A repeated word is
reduced in fo gesture, and the reduction is due to semantic recurrence
rather than to reappearance of exactly the same item (Vanderslice,
1968). In