NMAH | Smithsonian Speech Synthesis History Project (im

vocal tract shape. Separate target shapes are provided for velars before front, central and back vowels.

Mermelstein's (in press) system follows a similar plan. Two lists serve as input to this system. The first is a table of the area function values for an inventory of shapes; the second includes a series of target shapes, specified with reference to the first list and corresponding to phones or temporal segments of phones; target values for other parameters; and transition durations. Mermelstein uses linear transitions near sharp constrictions and exponential transitions during periods when the shape of the tract is changing more slowly. He finds that this procedure, which effectively avoids steady states, contributes considerably to the naturalness of the speech.
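The two-list input and the two kinds of transition can be sketched roughly as follows. This is only an illustration under stated assumptions, not Mermelstein's implementation: the shape inventory, the area values, the durations, the choice of time constant, and all function names are hypothetical.

```python
import math

# Hypothetical first list: shape name -> areas (cm^2) at five tract sections.
# All numbers are invented for illustration.
SHAPES = {
    "a": [1.0, 0.8, 2.5, 6.0, 5.0],
    "d": [3.0, 3.5, 1.5, 0.0, 2.0],  # closure at the fourth section
    "i": [4.0, 5.0, 2.0, 0.4, 1.5],
}

# Hypothetical second list: (target shape, transition kind, duration in ms).
# The first entry only names the starting shape.
SEQUENCE = [
    ("a", None, 0.0),
    ("d", "linear", 40.0),
    ("i", "exponential", 80.0),
]


def linear_transition(start, target, t, duration):
    """Area function at time t, moving at a constant rate from start to target."""
    frac = min(max(t / duration, 0.0), 1.0)
    return [s + (g - s) * frac for s, g in zip(start, target)]


def exponential_transition(start, target, t, time_constant):
    """Area function at time t, approaching the target asymptotically."""
    remaining = math.exp(-t / time_constant)
    return [g + (s - g) * remaining for s, g in zip(start, target)]


def render(sequence, step_ms=10.0):
    """Return one area function per time step for the whole sequence."""
    current = list(SHAPES[sequence[0][0]])
    frames = []
    for name, kind, duration in sequence[1:]:
        start, target = current, SHAPES[name]
        steps = int(round(duration / step_ms))
        for k in range(1, steps + 1):
            t = k * step_ms
            if kind == "linear":
                current = linear_transition(start, target, t, duration)
            else:
                # Assumed time constant: one third of the transition duration.
                current = exponential_transition(start, target, t, duration / 3.0)
            frames.append(current)
    return frames


frames = render(SEQUENCE)
```

In this sketch the exponential transition never quite reaches its target, so the next transition starts from a shape that is still moving; that is one way to read the remark that the procedure avoids steady states.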

Nakata and Mitsuoka (1965) use a more elaborate transition procedure based on a conception of Öhman (1967). In the case of vowel-to-vowel transitions over a period $t'$, the momentary area function for a vowel at time $t$ is

$$A_v(x, t) = A_2(x) + [A_1(x) - A_2(x)]\, w_v(t),$$

where $A_1(x)$ is the starting value, $A_2(x)$ is the target value, and $w_v(t)$ is an asymptotic weighting function equal to 1 at the start and 0 at the target. In the case of a consonant between two vowels, the effect of superposition of the consonant is taken as equivalent to the effect, over a period, of the consonant on the neutral tract,

$$\Delta A_c(x, t) = w_c(t)\, [A_c(x) - A_n(x)],$$

where $A_n(x)$ is the neutral tract, $A_c(x)$ the consonant configuration, and $w_c(t)$ another weighting factor. The result of superposition is

$$A(x, t) = A_v(x, t) + \Delta A_c(x, t).$$

(Ichikawa and Nakata in a later paper (1968) treat superposition as multiplicative rather than additive.) Nakata and Mitsuoka claim that this rule automatically gives a good approximation of the different shapes of [k] in [ki] and [ko] at the time of maximal constriction: a fact which acoustic systems and earlier articulatory systems handle ad hoc.
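Read as an interpolation between a starting and a target vowel shape, plus an additive consonant term measured relative to the neutral tract, the rule might be sketched as below. The forms of both weighting functions, the clamping of negative areas to zero, the use of one shared period for vowel and consonant, and every area value are assumptions made for illustration; none of them is taken from Nakata and Mitsuoka.

```python
import math

# Hypothetical area functions (cm^2) at four tract sections, glottis to lips.
NEUTRAL = [3.0, 3.0, 3.0, 3.0]
VOWEL_A = [1.0, 2.0, 5.0, 6.0]
VOWEL_I = [5.0, 3.0, 0.6, 2.0]
VOWEL_O = [1.5, 4.0, 3.0, 0.8]
CONS_K = [3.0, 3.0, 0.1, 3.0]  # differs from neutral only at the constriction


def vowel_weight(t, period, sharpness=5.0):
    """Assumed weighting: 1 at t = 0, decaying asymptotically towards 0."""
    return math.exp(-sharpness * t / period)


def consonant_weight(t, period):
    """Assumed weighting: 0 at both ends of the period, 1 at its midpoint."""
    return math.sin(math.pi * t / period) ** 2


def area_function(t, period, start, target, neutral, consonant):
    """Vowel-to-vowel interpolation with the consonant superposed additively."""
    w_v = vowel_weight(t, period)
    w_c = consonant_weight(t, period)
    return [
        # Assumption: negative areas are clamped to zero (complete closure).
        max(0.0, a2 + (a1 - a2) * w_v + w_c * (c - n))
        for a1, a2, n, c in zip(start, target, neutral, consonant)
    ]


# Shape at maximal constriction (midpoint) for [k] in [aki] and in [ako].
k_before_i = area_function(50.0, 100.0, VOWEL_A, VOWEL_I, NEUTRAL, CONS_K)
k_before_o = area_function(50.0, 100.0, VOWEL_A, VOWEL_O, NEUTRAL, CONS_K)
```

With these made-up values the same consonant configuration yields two different tract shapes at the midpoint, because the vowel term it is added to already differs; this is the context dependence that the rule is said to capture without special cases.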

The obvious advantage of using shape rules rather than acoustic rules is that the translation from the discrete to the continuous domain becomes more straightforward. The rule for transitions for stops can be stated simply and in the same terms as the rules for glides. But the notion of a target shape is rather unsatisfactory, because only a certain part of the shape is pertinent to any particular phone, and the rest must be arbitrarily specified. In another sense, moreover, a shape model is less interesting than an acoustic system. Ladefoged (1964) has
