NMAH | Smithsonian Speech Synthesis History Project (im

pointed out that synthesis systems may be classified both as articulatory or acoustic and as parametric or non-parametric. The shape models we have just been discussing are articulatory, but they are not based on any natural parameters comparable to formant frequencies, still less on any set of features. Instead, an arbitrary number of vocal tract cross-sections is used. This is a level of development corresponding to the point in acoustic phonetics when the most significant possible representation of the acoustic spectrum was in terms of a bank of filters. A further limitation of shape systems is that the transitional rules can be little more than arbitrary smoothing rules; it would be very difficult to characterize the changes in shape of the vocal tract differentially segment by segment. In fact, it is a question whether vocal tract shape as such is a significant stage in the speech chain, except in a strictly physical sense. What is needed is a set of parameters for vocal tract shape which would account for the behavior of the tract in the formation of the various sounds and at the same time facilitate a simple statement of rules.

Stevens and House (1955) have suggested a simple three-parameter model for vowel articulation in which the vocal tract is idealized as a tube of varying radius. Two of the parameters are d, the distance of the main constriction from the glottis, and r₀, the radius of the tube at this constriction. The radius at another point along the tube depends on r₀ and the distance x from the constriction:

[equation not preserved in this copy]
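
To make the two-parameter idea concrete, here is a minimal sketch in Python of how a constriction position and a constriction radius can generate a whole tract profile, in contrast to a shape model that stores an arbitrary number of cross-sections. It assumes, purely for illustration, that the radius widens parabolically with distance from the constriction up to a fixed ceiling. The function names, the coefficient `k`, the ceiling `r_max`, the tract length, and the sampling step are all hypothetical and are not values taken from Stevens and House.

```python
import math


def radius_profile(d, r0, length=17.0, step=0.5, k=0.1, r_max=1.5):
    """Return (position, radius) pairs along an idealized tube, in cm.

    d   -- distance of the main constriction from the glottis
    r0  -- radius of the tube at the constriction
    k, r_max, length, step -- hypothetical illustration values,
    NOT constants from the published model.
    """
    n = int(round(length / step))
    profile = []
    for i in range(n + 1):
        pos = i * step          # distance from the glottis
        x = pos - d             # signed distance from the constriction
        # Assumed parabolic widening away from the constriction,
        # clipped at an assumed maximum radius.
        r = min(r0 + k * x * x, r_max)
        profile.append((pos, r))
    return profile


def areas(profile):
    """Convert a radius profile to cross-sectional areas (cm^2)."""
    return [math.pi * r * r for _, r in profile]


if __name__ == "__main__":
    # A constriction 10 cm from the glottis with radius 0.4 cm.
    for pos, r in radius_profile(d=10.0, r0=0.4):
        print(f"{pos:5.1f} cm  r = {r:.2f} cm")
```

Changing only `d` and `r0` moves and tightens the constriction while the rest of the profile follows automatically, which is the sense in which such a description is parametric.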

The front portion of the tract (14.5 cm. from the glottis and beyond), however, is characterized by a third parameter A/l, the ratio of the area of mouth opening to the length of this portion of the tract. This ratio, inversely proportional to acoustic impedance, varies depending on the protrusion of the lips. These parameters correspond, of course, to the familiar phonetic dimensions of front-back, open-close and rounded-unrounded, and serve to characterize vowels very well. With a static vocal tract analog, Stevens and House were able to use this model to synthesize vowels with the formant frequency ranges observed by Peterson and Barney (1952). These parameters are not, of course, satisfactory for most consonants, if only because the formula for computing the radius would break down under the circumstances of fricative narrowing and stop closure. Ichikawa et al. (1967) propose another, more general scheme for which the parameters are the maximal constriction point P and the maximum area points V1 and V2 of the front and back cavities formed by this constriction. Ichikawa and Nakata (1968) report that they have used this very over-simplified model in a synthesis-by-rule system. It does not seem likely, however, that any parametric description of vocal tract shape will prove satisfactory unless it directly reflects the behavior of the various articulators in some detail. As Ladefoged (1964:208) has observed, 'describing articulations in terms of the highest point of the tongue or the point of maximum constriction of the vocal tract is rather like describing different ways of walking in terms of movements of the big toe or ankle'. But this is as much as to say that it is necessary

National Museum of American History \| Archives Center