NMAH | Smithsonian Speech Synthesis History Project (im

to go back to the next earlier stage of the speech chain, the stage of articulatory gesture.

4.3 Articulator Systems

A number of investigators are attempting to write rules for synthesis in terms of the movements of the individual articulators. The basic approach is to assume a model for the motion of each articulator that is convenient for the statement of the rules. From the states of the articulator models, the vocal tract shape and, in turn, the acoustic signal can be determined for a given excitation.
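The step from vocal tract shape to acoustic signal can be illustrated with the simplest possible shape. The sketch below is not taken from any of the systems discussed here: it assumes a uniform tube closed at the glottis and open at the lips, for which the resonances fall at odd multiples of the quarter-wavelength frequency. The function name, the tube lengths, and the speed-of-sound value are illustrative assumptions; actual articulatory models work with non-uniform shapes.

```python
# Assumed approximate speed of sound in the vocal tract, in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def uniform_tube_formants(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Hypothetical helper: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


# A 17.5 cm tube gives resonances near 500, 1500 and 2500 Hz.
neutral = uniform_tube_formants(17.5)

# Lengthening the tube by 1 cm (a crude stand-in for lip protrusion)
# lowers every resonance.
protruded = uniform_tube_formants(18.5)
```

Even in this idealized case, a change in one articulatory dimension (overall tract length) shifts all the resonances, which is why rules stated on articulators can have distributed acoustic consequences.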

Coker (1967, pers. comm.) uses a modified version of a model suggested by Coker and Fujimura (1966). Two parameters for the lips, one for the velum, and four for the tongue determine the shape of the oral vocal tract. The lip parameters indicate the degree of protrusion and of closure; the parameter for the velum indicates its relative elevation; two of the tongue parameters indicate the degree of apical closure and front-back position for the tongue tip; and the other two, the position of the central mass of the tongue in the midsagittal plane. For each phone, target values for these parameters are stored. The stored values are divided into 'important' and 'unimportant'; thus, degree of rounding is unimportant for most sounds but important for [i] and [w] at one extreme and [y] at the other. Interpolation from target to target is accomplished by a 'low pass filter' rule, which produces a certain amount of coarticulation and vowel reduction. The different parameters move at different speeds -- for example, the apical parameter is quite fast and the protrusion parameter quite slow. The degree of coarticulation is greater for slowly moving parameters than for fast ones. Parameter speed is increased in transitions from unimportant to important values and reduced in transitions from important to unimportant values, thus increasing coarticulation for those parameters that specially characterize a particular phone. Parameter timing can also be modified depending on context; this feature of the system is used to provide anticipatory rounding. Target values for each phone are changed simultaneously, except that in a consonant cluster, parameters for different articulators overlap. For each momentary set of articulatory parameter values, the corresponding vocal tract shape is determined, and from the shape, the formant frequencies, which are used to control a resonance synthesizer.
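The 'low pass filter' interpolation described above can be sketched as follows. This is a minimal illustration, not Coker's actual rule: it assumes a first-order filter, a fixed number of frames per phone, and a single hypothetical `speedup` factor for the important/unimportant adjustment. The names `Target` and `smooth_track` and all numeric values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Target:
    value: float      # stored target for one parameter of one phone
    important: bool   # whether this value is marked 'important'


def smooth_track(targets, frames_per_phone, time_constant, speedup=2.0):
    """Interpolate through phone targets with a first-order low-pass filter.

    time_constant is in frames (larger means a slower parameter).
    speedup is a hypothetical factor: the parameter moves faster toward
    an important value and slower away from one.
    """
    track = []
    current = targets[0].value
    prev_important = targets[0].important
    for tgt in targets:
        tau = time_constant
        if tgt.important and not prev_important:
            tau = time_constant / speedup   # unimportant -> important
        elif prev_important and not tgt.important:
            tau = time_constant * speedup   # important -> unimportant
        alpha = 1.0 / (1.0 + tau)           # smoothing coefficient
        for _ in range(frames_per_phone):
            current += alpha * (tgt.value - current)
            track.append(current)
        prev_important = tgt.important
    return track


# Same targets, two parameter speeds: the slow parameter falls well short
# of the middle target within the phone, the fast one nearly reaches it.
phones = [Target(0.0, False), Target(1.0, False), Target(0.0, False)]
fast = smooth_track(phones, frames_per_phone=5, time_constant=1.0)
slow = smooth_track(phones, frames_per_phone=5, time_constant=9.0)
```

In this sketch the undershoot of the slow parameter plays the role of coarticulation and reduction: the value reached during a phone depends on its neighbours because the filter never settles. Context-dependent timing and the overlap of articulators in consonant clusters are not modelled.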

Haggard (Werner and Haggard 1969) has developed a similar model with 11 parameters. Like Coker, he has parameters for lip protrusion and lip closure, for elevation of the velum, and for tongue-tip position and closure. The body of the tongue has parameters for position, degree of closure, and length of closure; degree of closure is also a parameter for the jaw and for the glottis. From a momentary description in terms of articulatory parameters, 'construction' (i.e. shape) parameters are derived which describe the vocal tract as a sequence of a few tubes of varying length and cross-sectional area. A nomogram of the sort given by Fant (1960:65) is used to calculate
