NMAH | Smithsonian Speech Synthesis History Project (im

for stylistic variation, e.g. 'dip', for the downward pitch prominence noted by Bolinger (1958).

While only a few terminal intonation contours carry grammatical information and are required for synthesis of ordinary discourse, many more occur in colloquial speech. Iles (1967), following a scheme of tones (i.e. terminal contours) proposed by Halliday (1963), has attempted to synthesize some of these tones, imposing the contours on segmental synthetic speech generated by PAT in the manner of Holmes et al. (1964).

The models of the behavior of Fo discussed so far assume a basic contour for the whole breath group, on which stress and terminal intonation contours are imposed -- an approach consistent both with the work of British students of intonation from Armstrong and Ward (1931) to O'Connor and Arnold (1961), and with the 'archetypal' model of intonation proposed by Lieberman (1967). Another possible approach is to characterize a breath group as a series of pitch-levels, as in Pike's (1945) well-known scheme. Shoup (pers. comm.) has synthesized sentences in which the Fo contours corresponding to such descriptions are realized, taking into account the stress, the vowel quality, the excitation of the preceding consonant, and the frequency at the beginning of the syllable.
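The first approach described above, a basic contour for the breath group with stress and terminal contours imposed on it, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `f0_contour`, the linear declination, the triangular stress bump, and all numeric values in Hz are hypothetical and are not taken from any of the systems cited here.

```python
def f0_contour(n_frames, stressed, rising_terminal=False,
               start_hz=120.0, end_hz=90.0,
               stress_boost_hz=20.0, stress_width=3,
               terminal_frames=5, terminal_shift_hz=25.0):
    """Return one Fo value (Hz) per frame for a single breath group.

    All default values are illustrative assumptions, not measured data.
    """
    contour = []
    for i in range(n_frames):
        # 1. Basic contour: a linear fall across the whole breath group.
        frac = i / (n_frames - 1) if n_frames > 1 else 0.0
        f0 = start_hz + (end_hz - start_hz) * frac

        # 2. Stress: a triangular bump centred on each stressed frame.
        for centre in stressed:
            distance = abs(i - centre)
            if distance < stress_width:
                f0 += stress_boost_hz * (1 - distance / stress_width)

        # 3. Terminal contour: ramp up (rise) or down (fall)
        #    over the final frames of the breath group.
        into_terminal = i - (n_frames - terminal_frames)
        if into_terminal >= 0:
            ramp = (into_terminal + 1) / terminal_frames
            shift = terminal_shift_hz if rising_terminal else -terminal_shift_hz
            f0 += shift * ramp

        contour.append(f0)
    return contour


if __name__ == "__main__":
    falling = f0_contour(21, stressed=[5])
    rising = f0_contour(21, stressed=[5], rising_terminal=True)
    for frame, (a, b) in enumerate(zip(falling, rising)):
        print(f"{frame:2d}  fall={a:6.1f}  rise={b:6.1f}")
```

The three components are simply added, which is the simplest reading of "imposed"; a pitch-level description in the manner of Pike would instead specify a target level per syllable and interpolate between them.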

Synthesis by rule of prosodic features has come to receive serious attention only quite recently, by comparison with synthesis of segmental features. We have only just begun to understand what is easy, what is difficult; what is relevant and what is irrelevant. Of the three major correlates of the prosodic features, intensity has proved the least sensitive and the least important. Fo has attracted the most interest: considerable success has been attained in producing convincing stress and terminal intonation contours by rule, and the articulatory mechanism has been simulated. Duration, however, remains a serious problem. No one has yet produced even an empirically successful set of duration rules, and it is far from clear what theoretically adequate rules would be like. Presumably there are some durational effects which are really automatic consequences of the articulation: formant transition durations surely fall into this category. A second group of effects are truly temporal, but subject to phonological rule: vowel length, for example. Finally, there are effects which are, to some extent, under the conscious control of the speaker: speaking rate, for instance. All these different effects are superimposed in actual speech: sorting them out is a major task for synthesis by rule.

In the prosodic schemes we have just been describing, even those which model supraglottal shape or articulatory movement, the prosodic features are still being simulated purely acoustically. No attempt is made to model explicitly the articulatory mechanisms which are responsible for the variation in the acoustic correlates. Unfortunately, the prosodic articulatory mechanisms are much less well understood than those which underlie segmental features, which explains in part the lack of unanimity concerning the appropriate treatment of prosodic features at the phonological level.

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center