given by these systems to that of an idealized X-ray movie of the
vocal tract, from which not only dynamic changes in shape, but also
the contribution of each of the individual articulators to the changes
in shape are apparent.

4.4 Neuromotor Command Synthesis

But the description is still deficient in some respects. The
parameters which describe articulatory motion may seem the obvious
ones, and may be empirically successful, but they have no necessary
theoretical basis. The various articulators, of course, are not free
to move at random but only to and from a limited number of targets.
This limitation on the number of targets accounts for the limited
number of values which can be assumed even by features associated with
such a complex articulator as the tongue. If we could go a stage
further back in the speech chain, and synthesize speech at the level
of the neuromotor commands which control the muscles of the
articulators, we might be able to account for these significant
limitations. Fortunately, electromyographic techniques can help us
here (e.g. Harris et al. 1965; Fromkin 1966). From measurements of
the voltages picked up during speech by electrodes placed in the
vocal tract, it is possible to make some plausible inferences about
muscle activity -- and hence about the corresponding neuromotor
commands -- in the production of the sounds of speech.

The synthetic counterpart of electromyographic analysis would
describe speech in terms of a series of commands to the muscles of
the vocal tract. An approach to this kind of synthesis has been
made by Hiki, who has developed a description of jaw and lip movement
using muscle parameters (Hiki and Harshman 1969). The forward part
of the vocal tract is treated as an acoustic tube of varying length,
height and width. The value of each dimension depends on the positive
or negative force exerted by the lip and jaw muscles, and any one of
these muscles may affect more than one dimension. Muscles of the lips
which affect the same dimensions in the same way are grouped together,
and the same is the case for muscles of the jaw. The force exerted by
such a group of muscles (actually the effect of several neuromotor
commands) is a parameter of the system. Four lip and two jaw parameters
are used. The forces acting separately on lip and jaw are combined
to produce a description of shape, and with this partial model,
labial sounds can be synthesized with a vocal tract analog. More
recently Hiki has extended his investigations to the tongue
(Hiki 1970).
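
The parameter scheme just described can be illustrated with a minimal
sketch. It assumes, purely for illustration, that each dimension of the
forward tube is a rest value plus a linear sum of the signed forces of
the muscle groups; the text above says only that the lip and jaw forces
are combined, not how. The parameter names (lip_1 ... jaw_2), the
weights and the rest values are hypothetical and are not taken from
Hiki and Harshman (1969).

```python
# Illustrative sketch only: names, weights and rest values are assumptions,
# not figures from Hiki and Harshman (1969).

DIMENSIONS = ("length", "height", "width")

# Assumed rest dimensions of the forward tube, in arbitrary units.
REST_SHAPE = {"length": 1.0, "height": 1.0, "width": 1.0}

# One row per muscle-group parameter: the assumed change in each dimension
# produced by unit force. A group may affect more than one dimension.
LIP_EFFECTS = {
    "lip_1": {"length": 0.3, "height": 0.0, "width": -0.2},
    "lip_2": {"length": -0.1, "height": 0.0, "width": 0.3},
    "lip_3": {"length": 0.0, "height": 0.4, "width": 0.0},
    "lip_4": {"length": 0.0, "height": -0.4, "width": 0.0},
}
JAW_EFFECTS = {
    "jaw_1": {"length": 0.0, "height": 0.5, "width": 0.1},
    "jaw_2": {"length": 0.0, "height": -0.5, "width": 0.0},
}


def contribution(forces, effects):
    """Sum the signed effect of each muscle group on every dimension."""
    total = {d: 0.0 for d in DIMENSIONS}
    for name, force in forces.items():
        for d in DIMENSIONS:
            total[d] += force * effects[name][d]
    return total


def front_tract_shape(lip_forces, jaw_forces):
    """Combine separately computed lip and jaw contributions into a shape."""
    lip = contribution(lip_forces, LIP_EFFECTS)
    jaw = contribution(jaw_forces, JAW_EFFECTS)
    return {d: REST_SHAPE[d] + lip[d] + jaw[d] for d in DIMENSIONS}


if __name__ == "__main__":
    # Groups that are not listed are treated as exerting no force.
    print(front_tract_shape({"lip_1": 1.0}, {"jaw_2": 0.5}))
```

Keeping the lip and jaw tables separate mirrors the description above,
in which the forces on lip and jaw are evaluated separately before
being combined. In a fuller system the resulting dimensions would be
passed to a vocal tract analog; that stage is not sketched here.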

Clearly, synthesis by rule must move in the direction suggested
by Hiki's work. Only with models of this sort, making use of the
earliest observable stage of the speech chain, will it be possible
to gain insight into the nature of individual gestures and their
relative timing. It is significant, however, that myographic synthesis,
as represented by Hiki's scheme, seems to lead to an increase rather
than a decrease in the number of parameters, as compared with
articulatory models, even though