NMAH | Smithsonian Speech Synthesis History Project (im

Our ideal system is not concerned with performance as such. Even though our model is dynamic and the output is audible, the process of synthesis is a derivation according to rules, not a life-like imitation of a speaker's actual speech behavior. The output is acceptable to the hearer because it follows the rules, not just because, on the one hand, it is intelligible, despite errors and deviations, or on the other, because it is highly natural-sounding -- though one might expect that the output of an ideal system would be natural-sounding, if not physically naturalistic. Here our emphasis differs somewhat from that of Ladefoged (1967) and Kim (1966), who share our conviction that it is important to do synthesis by rule, but for whom linguistic and phonetic theory 'must lead to the specification of actual utterances by individual speakers of each language; this is physical phonetics' (Ladefoged 1967: 58). From our point of view it is not physical realism but psychological acceptability which is the proper evidence for correctness at the phonological and phonetic levels, just as it is on the syntactic level.

In the preceding discussion, we have deliberately generalized the concept of 'synthesis by rule' to embrace phonology and phonetics. It would be possible to generalize still further, to include syntax and semantics in a synthesis-by-rule system. But while computer simulations of syntactic and semantic rules are certainly desirable, the motivation for coupling them to a phonological and phonetic synthesis-by-rule system is less compelling, primarily because a set of syntactic rules can in practice be evaluated more or less independently of the associated phonology and phonetics.

4. CURRENT WORK IN SYNTHESIS BY RULE

We turn now to an assessment of the progress which has been made toward the ideal which has just been sketched. The first thing to be said is that most of the activity and most of the progress so far fall under the heading of phonetic capacity. Since the other components all depend, directly or indirectly, on phonetic capacity, this is just as it should be. Moreover, since we want to assess the role of the different stages of the speech chain in phonetic capacity, it is good that, in the present state of our knowledge, the research has been pluralistic: different types of systems have been developed in which the contribution of different stages has been emphasized. This has been difficult to do because appropriate data on which to base investigations at stages before the acoustic stage are hard to collect. At present, most of the work has been at the acoustic stage; the relationship between vocal-tract shape and acoustic output is quite well understood and several synthesis-by-rule systems operating on vocal-tract shape have been developed; systems which represent the movements of the actual articulators are beginning to show results; and some work has been done at the neuromotor command stage. 8
__________
8. There is, of course, another way to synthesize speech by rule, and that is to compile an utterance from an inventory of shorter segments, themselves either natural or synthetic. Such approaches may have practical value, but from a theoretical standpoint they merely serve to remind us that there is no simple correspondence between phones and segments of the acoustic signal. See the discussion in Liberman et al. 1959. Systems in which speech is compiled from natural segments have been described in Harris 1953, Peterson et al. 1958, and Cooper et al. 1969. Systems using synthetic segments are described in Estes et al. 1964, Dixon and Maxey 1968, and Cooper et al. 1969.
