NMAH | Smithsonian Speech Synthesis History Project (im

of the stop-fricative attribute; F2 and F3 loci and the weighting factor, on the place-of-articulation attribute. Given the boundary value, steady-state values and transition times, transitions are calculated as by Holmes et al.
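As a rough illustration of how a transition could be derived from the three quantities just named (boundary value, steady-state value and transition time), the sketch below assumes simple linear interpolation. The text does not spell out the Holmes et al. procedure, so the interpolation scheme, the function name `formant_transition`, the 10 msec. step and the example frequencies are all assumptions made for illustration only.

```python
def formant_transition(boundary_hz, steady_hz, transition_ms, step_ms=10.0):
    """Return (time_ms, frequency_hz) samples running from the boundary
    value to the steady-state value.

    Assumption: the formant moves linearly over the transition time.
    This is an illustrative choice, not a documented procedure.
    """
    if transition_ms <= 0:
        # No transition time: the formant sits at its steady-state value.
        return [(0.0, steady_hz)]
    # Number of interpolation intervals, at least one.
    n = max(1, round(transition_ms / step_ms))
    return [
        (i * transition_ms / n,
         boundary_hz + (steady_hz - boundary_hz) * i / n)
        for i in range(n + 1)
    ]


# Hypothetical F2 transition: 1800 Hz at the boundary, 1200 Hz steady state.
track = formant_transition(1800.0, 1200.0, 50.0)
```

The first sample lies at the boundary value and the last at the steady-state value; any other interpolation shape could be substituted without changing the inputs the rules must supply.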

Rao and Thosar resort to stored values for vowel spectra and vowel durations; Kim (1966), however, proposes that even these matters can be systematically treated. For example, his translation from distinctive feature values to formant frequencies is made by defining the features in terms of 'degrees' of difference from the frequencies of a reference vowel. From the value assigned to one degree, and the reference frequencies, the frequencies of other vowels are calculated by means of such rules as 'if High, -2d'. The formant frequency values determined in this way agree well with the data in the literature. However, since the degree values are not predicted on any principled basis, but are arrived at inductively by an averaging procedure applied to this same data, the agreement is hardly surprising and does not represent any interesting advance over stored values.
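The degree-based calculation can be sketched as follows. Only the rule 'if High, -2d' comes from the text. The starting frequency, the size of one degree, the choice of F1 as the formant the rule acts on, and the companion 'Low' rule are hypothetical values chosen for illustration, not Kim's.

```python
# Assumed starting frequency for F1 and assumed size of one degree "d".
# Both numbers are illustrative; the text does not give Kim's values.
REFERENCE_F1_HZ = 500.0
DEGREE_HZ = 100.0

# Each feature shifts the formant by a whole number of degrees.
# "High": -2 mirrors the rule 'if High, -2d' quoted in the text;
# "Low": +2 is a made-up companion rule.
F1_DEGREE_RULES = {"High": -2, "Low": +2}


def f1_from_features(features, reference_hz=REFERENCE_F1_HZ, degree_hz=DEGREE_HZ):
    """Compute F1 by adding each feature's degree shift to the reference."""
    shift_in_degrees = sum(F1_DEGREE_RULES.get(f, 0) for f in features)
    return reference_hz + shift_in_degrees * degree_hz


high_vowel_f1 = f1_from_features(["High"])
unmarked_f1 = f1_from_features([])
```

If the degree size is itself obtained by averaging measured formant data, as the text describes, a scheme of this kind can only reproduce those measurements approximately, which is why the agreement with the literature carries little weight.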

Several of these systems have been empirically successful in that they have proved capable of consistently producing intelligible speech. They also have enough theoretical plausibility to be used in investigations of other components. One could, for example, use them to test phonological rules proposed for a language (Mattingly 1971). But they are still inadequate because their working assumption is that phonetic capacity can be adequately described at the acoustic level. If this were so, a simple and consistent correspondence would hold between phonetic features and acoustic events. But in fact the correspondence is only partial. On the one hand, certain regularities are observable, which can be exploited in a synthesis-by-rule system, as Liberman et al. (1959) pointed out: F1 and F2 transitions and the type of acoustic activity during stop closure provide a basis for a purely acoustic classification of labial, dental and velar voiced stops, voiceless stops and nasals. On the other hand, the cues for a particular feature, regarded simply from an acoustic standpoint, are a rather arbitrary collection of events. There seems no special reason why a fall in F1, a 60-150 msec. gap, a burst, and a rise of F1 should all be cues for a stop consonant, and no obvious connection between the locus frequency and the burst frequency of a stop at the same place of articulation. These cues only make sense in articulatory terms. Still, the apparent arbitrariness of the cues should not in itself discourage the formulation of acoustic rules for features. A more serious difficulty is that in many cases features cannot be independently defined at the acoustic level. Thus the voiced-voiceless distinction is cued in one way for stops and in another for fricatives. The frequencies at which noise is found in a fricative do not correspond to the frequencies of either the locus or the burst of a stop at a similar point of articulation.
The frequencies of the first and second formants are sufficient to distinguish the non-retroflex vowels, but the range of F1 variation seems to be influenced by the F2 value: the vowels are not distributed regularly in F1/F2 space. Because of these difficulties most of the acoustic synthesis-by-rule systems provide only for a regular relationship between

National Museum of American History \| Archives Center