NMAH | Smithsonian Speech Synthesis History Project (im

several muscles exerting parallel forces are grouped under one parameter. Though the neuromotor commands for lip closure, for example, are similar for the different manner classes of labial sounds (Harris et al. 1965), the relationship between this gesture and the neuromotor commands which produce it is not a simple one. This suggests that the connection between the phonetic feature corresponding to lip closure and the neuromotor commands may not be simple, either; perhaps the realization of some value of a phonetic feature as a unitary psychological gesture may actually involve a complex neuromotor program. This view is reinforced by the recent finding of MacNeilage and DeClerk (1969) that coarticulation appears even in electromyographic data.

4.5 Synthesis of Excitational and Prosodic Features

Our discussion so far has been concerned with the synthesis of segmental phones and with supraglottal articulation and its acoustic consequences. A synthesis-by-rule scheme also has to take into account excitational, prosodic and demarcative features, the associated glottal and subglottal events, and the acoustic correlates of these events.

Both resonance and vocal tract analog synthesizers provide periodic and noisy excitation sources, periodic excitation being used for vowels, sonorants and voiced stops; noisy excitation for [h], aspiration, and frication. In resonance synthesizers, separate circuits (either fixed filters or variable frequency resonators) are ordinarily provided for shaping high-frequency frication; in vocal tract analog synthesizers, noise is inserted at various segments in the tract, depending on the place of articulation of the fricative. With such facilities the different kinds of excitation are readily simulated; the only problem is to write rules for the changes from one excitation source to another. This aspect of synthesis by rule has not been taken very seriously; usually the duration of the excitation appropriate for a phone is identical with the nominal duration of the phone itself. In the case of voiceless stops, however, this approach requires including part of the transition to the following vowel in the stop, as was done by Holmes et al. (1964). Another solution is to specify, as a characteristic of the voiceless consonant, the appropriate amount of devoicing of the following phone, as we have done (Mattingly 1968a). What is really required, however, is a rule specifying voice-onset time negatively or positively relative to the instant of release, as the work of Lisker and Abramson (1967) suggests. For medial and final voiced consonants and consonant clusters, increased duration of the preceding vowel is well known to be an important cue (Kenyon 1950:63; Denes 1955) and some systems have taken account of it, e.g. Mattingly (1968a), Rabiner (1969).
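The voice-onset-time rule called for above might be sketched as follows. This is only an illustration under stated assumptions, not a rule taken from any of the systems cited: the function name `excitation_schedule`, the three source labels, and all timing values are hypothetical. Voicing onset is placed at the release instant plus a signed voice-onset time; a positive value leaves an interval of noisy (aspiration) excitation after release, while a negative value starts periodic excitation during the closure.

```python
from typing import List, Tuple

# (start_ms, end_ms, source); source labels are hypothetical names
Interval = Tuple[float, float, str]


def excitation_schedule(closure_start_ms: float, release_ms: float,
                        vot_ms: float, vowel_end_ms: float) -> List[Interval]:
    """Return excitation intervals for a stop followed by a vowel.

    vot_ms is the voice-onset time relative to release: negative means
    voicing begins before release, positive means it begins after.
    """
    if not closure_start_ms <= release_ms <= vowel_end_ms:
        raise ValueError("expected closure_start <= release <= vowel_end")

    # Keep the voicing onset inside the stop-plus-vowel span.
    voicing_onset = min(max(release_ms + vot_ms, closure_start_ms),
                        vowel_end_ms)

    schedule: List[Interval] = []
    if vot_ms > 0:
        # Silent closure, then aspiration noise until voicing begins.
        schedule.append((closure_start_ms, release_ms, "silence"))
        schedule.append((release_ms, voicing_onset, "noise"))
    else:
        # Voicing begins at or before release; no aspiration interval.
        schedule.append((closure_start_ms, voicing_onset, "silence"))
    schedule.append((voicing_onset, vowel_end_ms, "periodic"))

    # Drop zero-length intervals.
    return [(s, e, src) for (s, e, src) in schedule if e > s]


# Illustrative timings only (milliseconds), not measured values.
print(excitation_schedule(0, 100, 60, 300))   # voicing lags release
print(excitation_schedule(0, 100, -80, 300))  # voicing leads release
```

Because the onset is expressed relative to release rather than to the nominal phone boundary, the same rule covers both voiced and voiceless stops without moving part of the vowel transition into the stop.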

Rather more attention has been given to prosodic and demarcative features such as stress, accent, intonation, juncture and pause, which interact with inherent properties
