NMAH | Smithsonian Speech Synthesis History Project (im

given by these systems to that of an idealized X-ray movie of the vocal tract, from which not only dynamic changes in shape, but also the contribution of each of the individual articulators to the changes in shape are apparent.

4.4 Neuromotor Command Synthesis

But the description is still deficient in some respects. The parameters which describe articulatory motion may seem the obvious ones, and may be empirically successful, but they have no necessary theoretical basis. The various articulators, of course, are not free to move at random but only to and from a limited number of targets. This limitation on the number of targets accounts for the limited number of values which can be assumed even by features associated with such a complex articulator as the tongue. If we could go a stage further back in the speech chain, and synthesize speech at the level of the neuromotor commands which control the muscles of the articulators, we might be able to account for these significant limitations. Fortunately, electromyographic techniques can help us here (e.g. Harris et al. 1965; Fromkin 1966). From measurements of the voltages picked up during speech by electrodes placed in the vocal tract, it is possible to make some plausible inferences about muscle activity -- and hence about the corresponding neuromotor commands -- in the production of the sounds of speech.

The synthetic counterpart of electromyographic analysis would describe speech in terms of a series of commands to the muscles of the vocal tract. An approach to this kind of synthesis has been made by Hiki, who has developed a description of jaw and lip movement using muscle parameters (Hiki and Harshman 1969). The forward part of the vocal tract is treated as an acoustic tube of varying length, height and width. The value for each dimension depends on the positive or negative force exerted by lip and jaw muscles, and each of these muscles may affect other dimensions as well. Muscles of the lips which affect the same dimensions in the same way are grouped together, and the same is the case for muscles of the jaw. The force exerted by such a group of muscles (actually the effect of several neuromotor commands) is a parameter of the system. Four lip and two jaw parameters are used. The forces acting separately on lip and jaw are combined to produce a description of shape, and with this partial model, labial sounds can be synthesized with a vocal tract analog. More recently Hiki has extended his investigations to the tongue (Hiki 1970).
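The structure of such a muscle-parameter description can be illustrated with a minimal sketch. It is not Hiki's actual formulation: the linear combination of forces, the parameter names (`lip_1` to `lip_4`, `jaw_1`, `jaw_2`), the gains, and the rest dimensions are all assumptions chosen for illustration, since the text specifies only that four lip and two jaw parameters exert positive or negative forces on the length, height and width of the forward tube.

```python
# Illustrative sketch only: every name and number here is hypothetical.
# The text gives no coefficients and does not say the mapping is linear.

# Assumed rest dimensions of the forward tube (arbitrary units).
REST_SHAPE = {"length": 1.0, "height": 1.0, "width": 3.0}

# Assumed gains: change in each dimension per unit of force.
# One muscle-group parameter may act on more than one dimension.
LIP_GAINS = {
    "lip_1": {"length": 0.5, "width": -0.5},
    "lip_2": {"width": 0.5},
    "lip_3": {"height": 0.25},
    "lip_4": {"height": -0.25, "length": -0.25},
}
JAW_GAINS = {
    "jaw_1": {"height": 0.5},
    "jaw_2": {"height": -0.5, "length": -0.25},
}


def group_displacement(gains, forces):
    """Sum the effect of one set of muscle groups on each dimension."""
    total = {dim: 0.0 for dim in REST_SHAPE}
    for name, per_dimension in gains.items():
        force = forces.get(name, 0.0)  # an absent parameter exerts no force
        for dim, gain in per_dimension.items():
            total[dim] += gain * force
    return total


def front_tube_shape(forces):
    """Combine lip and jaw effects into a length/height/width description."""
    unknown = set(forces) - set(LIP_GAINS) - set(JAW_GAINS)
    if unknown:
        raise ValueError(f"unknown muscle parameters: {sorted(unknown)}")
    lip = group_displacement(LIP_GAINS, forces)  # lips computed separately
    jaw = group_displacement(JAW_GAINS, forces)  # jaw computed separately
    # A physical dimension cannot be negative, so clamp at zero.
    return {
        dim: max(0.0, REST_SHAPE[dim] + lip[dim] + jaw[dim])
        for dim in REST_SHAPE
    }


if __name__ == "__main__":
    print(front_tube_shape({"lip_1": 1.0, "jaw_2": 1.0}))
```

In a fuller system the resulting dimensions would be passed to a vocal tract analog. The sketch stops at the description of shape and does not model acoustics or the timing of the commands.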

Clearly, synthesis by rule must move in the direction suggested by Hiki's work. Only with models of this sort, making use of the earliest observable stage of the speech chain, will it be possible to gain insight into the nature of individual gestures and their relative timing. It is significant, however, that myographic synthesis, as represented by Hiki's scheme, seems to lead to an increase rather than a decrease in the number of parameters, as compared with articulatory models, even though
