for stylistic variation, e.g. 'dip', for the downward pitch prominence
noted by Bolinger (1958).
While only a few terminal intonation contours carry grammatical
information and are required for synthesis of ordinary discourse,
many more occur in colloquial speech. Iles (1967), following a scheme
of tones (i.e. terminal contours) proposed by Halliday (1963), has
attempted to synthesize some of these tones, imposing the contours
on segmental synthetic speech generated by PAT in the manner of
Holmes et al. (1964).
The models of the behavior of Fo discussed so far assume a basic
contour for the whole breath group, on which stress and terminal
intonation contours are imposed -- an approach consistent both with
the work of British students of intonation from Armstrong and Ward
(1931) to O'Connor and Arnold (1961), and with the 'archetypal' model
of intonation proposed by Lieberman (1967). Another possible approach
is to characterize a breath group as a series of pitch-levels, as in
Pike's (1945) well-known scheme. Shoup (pers. comm.) has synthesized
sentences in which the Fo contours corresponding to such descriptions
are realized, taking into account the stress, the vowel quality, the
excitation of the preceding consonant, and the frequency at the
beginning of the syllable.
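The first of these approaches, a basic breath-group contour on which
stress and terminal intonation contours are imposed, can be illustrated
by a minimal sketch. It is not a reconstruction of any of the systems
cited above: the function name f0_contour, the linear decline, the
triangular stress rise, and every numerical value (frame counts and
frequencies in Hz) are assumptions chosen only to show the
superposition.

```python
def f0_contour(n_frames, stressed_frames, terminal="fall",
               start_hz=120.0, end_hz=100.0,
               stress_boost_hz=20.0, stress_width=3,
               terminal_span=5, terminal_shift_hz=25.0):
    """Return one F0 value (Hz) per frame for a single breath group.

    All defaults are hypothetical illustration values, not measured data.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames")

    # 1. Basic contour: a linear decline over the whole breath group.
    contour = [start_hz + (end_hz - start_hz) * i / (n_frames - 1)
               for i in range(n_frames)]

    # 2. Stress: a triangular rise centred on each stressed frame,
    #    added on top of the basic contour.
    for centre in stressed_frames:
        first = max(0, centre - stress_width)
        last = min(n_frames, centre + stress_width + 1)
        for i in range(first, last):
            weight = 1.0 - abs(i - centre) / (stress_width + 1)
            contour[i] += stress_boost_hz * weight

    # 3. Terminal contour: a ramp down ("fall") or up ("rise")
    #    over the final frames of the breath group.
    sign = {"fall": -1.0, "rise": 1.0}[terminal]
    span = min(terminal_span, n_frames)
    for k in range(span):
        i = n_frames - span + k
        contour[i] += sign * terminal_shift_hz * (k + 1) / span

    return contour


if __name__ == "__main__":
    # A 20-frame breath group with one stress at frame 5 and a final fall.
    for frame, hz in enumerate(f0_contour(20, [5], "fall")):
        print(frame, round(hz, 1))
```

Because the three components are simply added, each can be changed
independently. A pitch-level description in the manner of Pike would
instead specify target levels at particular syllables and interpolate
between them.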
Synthesis by rule of prosodic features has come to receive serious
attention only quite recently, by comparison with synthesis of
segmental features. We have only just begun to understand what is
easy and what is difficult, what is relevant and what is irrelevant.
Of the three major correlates of the prosodic features, intensity has
proved the least sensitive and the least important. Fo has attracted
the most interest: considerable success has been attained in producing
convincing stress and terminal intonation contours by rule, and the
articulatory mechanism has been simulated. Duration, however, remains
a serious problem. No one has yet produced even an empirically
successful set of duration rules, and it is far from clear what
theoretically adequate rules would be like. Presumably there are
some durational effects which are really automatic consequences of
the articulation: formant transition durations surely fall into this
category. A second group of effects are truly temporal, but subject
to phonological rule: vowel length, for example. Finally, there are
effects which are, to some extent, under the conscious control of
the speaker: speaking rate, for instance. All these different effects
are superimposed in actual speech: sorting them out is a major task
for synthesis by rule.
In the prosodic schemes we have just been describing, even those
which model supraglottal shape or articulatory movement, the prosodic
features are still being simulated purely acoustically. No attempt is
made to model explicitly the articulatory mechanisms which are
responsible for the variation in the acoustic correlates. Unfortunately,
the prosodic articulatory mechanisms are much less well understood
than those which underlie segmental features, which explains in part
the lack of unanimity concerning the appropriate treatment of prosodic
features at the phonological level.