to go back to the next earlier stage of the speech chain, the stage
of articulatory gesture.
4.3 Articulator Systems
A number of investigators are attempting to write rules for synthesis
in terms of the movements of the individual articulators. The basic
approach is to assume a model for the motion of each articulator
which is convenient for the statement of the rules. From the states
of the articulator models, the vocal tract shape, and in turn the
acoustic signal, can be determined for a given excitation.
Coker (1967, pers. comm.) uses a modified version of a model suggested
by Coker and Fujimura (1966). Two parameters for the lips, one for the
velum and four for the tongue determine the shape of the oral vocal
tract. The lip parameters indicate the degree of protrusion and of
closure; the parameter for the velum indicates its relative elevation;
two of the tongue parameters indicate the degree of apical closure and
front-back position for the tongue tip; and the other two, the position
of the central mass of the tongue in the midsagittal plane. For each
phone, target values for these parameters are stored. The stored values
are divided into 'important' and 'unimportant'; thus, degree of
rounding is unimportant for most sounds but important for [i] and [w]
at one extreme and [y] at the other. Interpolation from target to
target is accomplished by a 'low pass filter' rule, which produces a
certain amount of coarticulation and vowel reduction. The different
parameters move at different speeds: for example, the apical parameter
is quite fast and the protrusion parameter quite slow. The degree of
coarticulation is greater for slowly moving parameters than for fast
ones. Parameter speed is increased in transitions from unimportant to
important values and reduced in transitions from important to
unimportant values, thus increasing coarticulation for those parameters
which specially characterize a particular phone. Parameter timing can
also be modified depending on context; this feature of the system is
used to provide anticipatory rounding. Target values for each phone
are changed simultaneously, except that in a consonant cluster,
parameters for different articulators overlap. For each momentary
set of articulatory parameter values, the corresponding vocal tract
shape is determined, and from the shape the formant frequencies are
computed; these are used to control a resonance synthesizer.
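
The 'low pass filter' interpolation described above can be illustrated
with a minimal sketch. It is not Coker's implementation: it assumes a
first-order filter applied frame by frame, and the names Target and
track_parameter, the rate constants, and the factors 2.0 and 0.5 for
the important/unimportant adjustment are all hypothetical. Context-
dependent timing and overlap within consonant clusters are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    value: float      # stored target value of one parameter for one phone
    important: bool   # whether that value is marked 'important'


def track_parameter(targets: List[Target], frames_per_phone: int,
                    base_rate: float) -> List[float]:
    """Move one parameter from target to target with a first-order
    low-pass rule: each frame covers a fixed fraction of the remaining
    distance to the current target. All constants are assumptions."""
    x = targets[0].value
    track = []
    for i, tgt in enumerate(targets):
        rate = base_rate
        if i > 0:
            prev = targets[i - 1]
            if tgt.important and not prev.important:
                # Assumed factor: approach an important value faster.
                rate = min(1.0, 2.0 * base_rate)
            elif prev.important and not tgt.important:
                # Assumed factor: leave an important value more slowly.
                rate = 0.5 * base_rate
        for _ in range(frames_per_phone):
            x += rate * (tgt.value - x)
            track.append(x)
    return track


# One parameter, three phones with targets 0, 1, 0 (arbitrary units).
plain = [Target(0.0, False), Target(1.0, False), Target(0.0, False)]
marked = [Target(0.0, False), Target(1.0, True), Target(0.0, False)]

fast = track_parameter(plain, frames_per_phone=10, base_rate=0.5)
slow = track_parameter(plain, frames_per_phone=10, base_rate=0.1)
slow_marked = track_parameter(marked, frames_per_phone=10, base_rate=0.1)

print("peak of fast parameter:          ", round(max(fast), 3))
print("peak of slow parameter:          ", round(max(slow), 3))
print("peak of slow parameter, important:", round(max(slow_marked), 3))
```

Under these assumptions the fast parameter essentially reaches its
middle target, while the slow one falls well short of it, which is the
kind of undershoot that yields coarticulation and vowel reduction. When
the middle target is marked important, the slow parameter comes closer
to it and stays nearer to it during the following phone.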
Haggard (Werner and Haggard 1969) has developed a similar model
with 11 parameters. Like Coker, he has parameters for lip protrusion
and lip closure, for elevation of the velum, and for tongue-tip
position and closure. The body of the tongue has parameters for
position, degree of closure, and length of closure, and the jaw and
glottis have parameters for degree of closure. From a momentary
description in terms
of articulatory parameters, 'construction' (i.e. shape) parameters
are derived which describe the vocal tract as a sequence of a few
tubes of varying length and cross-sectional area. A nomogram of the
sort given by Fant (1960:65) is used to calculate