pointed out that synthesis systems may be classified both as
articulatory or acoustic and as parametric or non-parametric.
The shape models we have just been discussing are articulatory,
but they are not based on any natural parameters comparable to
formant frequencies, still less on any set of features. Instead,
an arbitrary number of vocal tract cross-sections is used. This
is a level of development corresponding to the point in acoustic
phonetics when the most significant possible representation of
the acoustic spectrum was in terms of a bank of filters. A further
limitation of shape systems is that the transitional rules can be
little more than arbitrary smoothing rules; it would be very
difficult to characterize the changes in shape of the vocal tract
differentially segment by segment. In fact, it is a question
whether vocal tract shape as such is a significant stage in the
speech chain, except in a strictly physical sense. What is needed
is a set of parameters for vocal tract shape which would account
for the behavior of the tract in the formation of the various sounds
and at the same time facilitate a simple statement of rules.
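To make the limitation of shape systems concrete, the following is a minimal sketch of the only kind of transition rule a pure shape model readily supports. It assumes a hypothetical representation in which the vocal tract is stored as an arbitrary list of cross-sectional areas, and the change from one segment target to the next is computed by nothing more than section-by-section linear interpolation. The function name, the eight-section resolution and the area values are illustrative assumptions, not taken from any published system.

```python
from typing import List

# Hypothetical shape-model representation: the vocal tract as an arbitrary
# number of cross-sectional areas (cm^2), glottis first, lips last.
AreaFunction = List[float]


def smooth_transition(start: AreaFunction, end: AreaFunction,
                      steps: int) -> List[AreaFunction]:
    """Interpolate linearly, section by section, between two target shapes.

    Each section moves independently of the others; nothing in this rule
    knows which articulator (tongue, lips, jaw) is responsible for a change.
    """
    if len(start) != len(end):
        raise ValueError("both shapes must use the same number of sections")
    if steps < 2:
        raise ValueError("need at least two frames")
    frames: List[AreaFunction] = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append([a + t * (b - a) for a, b in zip(start, end)])
    return frames


# Made-up eight-section targets, for illustration only.
vowel_like = [2.0, 3.0, 4.0, 5.0, 5.0, 4.0, 3.0, 2.0]
closure_like = [2.0, 3.0, 4.0, 5.0, 5.0, 4.0, 3.0, 0.0]  # lip section closed

for frame in smooth_transition(vowel_like, closure_like, 3):
    print(frame)
```

Because every section is treated alike, the rule has no way of stating that, say, a bilabial closure is a movement of the lips alone; the smoothing is arbitrary in exactly the sense described above.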
Stevens and House (1955) have suggested a simple three-parameter
model for vowel articulation in which the vocal tract is idealized
as a tube of varying radius. Two of the parameters are $d$, the
distance of the main constriction from the glottis, and $r_0$, the
radius of the tube at this constriction. The radius $r$ at any other
point along the tube depends on $r_0$ and on the distance $x$ from the
constriction (all lengths in cm):

$$ r - r_0 = 0.025\,(1.2 - r_0)\,x^2 $$

The front portion of the tract (14.5 cm. from the glottis and beyond),
however, is characterized by a third parameter, $A/l$, the ratio of the
area of the mouth opening to the length of this portion of the tract. This
ratio, inversely proportional to the acoustic impedance, varies depending
on the protrusion of the lips. These parameters correspond, of course,
to the familiar phonetic dimensions of front-back, open-close and
rounded-unrounded, and serve to characterize vowels very well. With
a static vocal tract analog, Stevens and House were able to use this
model to synthesize vowels with the formant frequency ranges observed
by Peterson and Barney (1952). These parameters are not, of course,
satisfactory for most consonants, if only because the formula for
computing the tube radius would break down under the circumstances of fricative
narrowing and stop closure. Ichikawa et al. (1967) propose
another, more general scheme for which the parameters are the
maximal constriction point $P$ and the maximum-area points $V_1$ and $V_2$
of the front and back cavities formed by this constriction. Ichikawa
and Nakata (1968) report that they have used this very over-simplified
model in a synthesis-by-rule system. It does not seem likely, however,
that any parametric description of vocal tract shape will prove
satisfactory unless it directly reflects the behavior of the various
articulators in some detail. As Ladefoged (1964:208) has observed,
'describing articulations in terms of the highest point of the tongue
or the point of maximum constriction of the vocal tract is rather
like describing different ways of walking in terms of movements of
the big toe or ankle'. But this is as much as to say that it
is necessary