NMAH | Smithsonian Speech Synthesis History Project (im

By the early 1960s, then, there was no doubt that speech could be synthesized by rule by either terminal-analog or vocal-tract-analog methods. Reliable synthesizers and convenient methods of controlling them had been developed. Even more important, the value of explicitly formulated rules had become obvious.

3. JUSTIFICATION FOR SYNTHESIS BY RULE

Since speech has been successfully synthesized by rule, it might seem that the basic objective of von Kempelen and his successors has been attained; it therefore becomes important to state clearly the reasons for continuing the research. One obvious reason is that we still have much to learn about the physical aspects of speech production: how the various articulators move, how their movements are timed, and how the controlling musculature operates to produce the sounds of speech. Synthesis by rule is a way of testing our understanding of the physical apparatus, and this is the primary motivation for much of the activity in the field today. This argument may not seem very persuasive to the linguist, who is concerned with speech as a psychological fact rather than a physical one. We think, nevertheless, that synthesis by rule offers other possibilities of substantial theoretical importance for the linguist, possibilities which have barely begun to be explored. Justifying this point of view, however, requires a brief reference to some basic questions of linguistics.

We believe, following Chomsky and Halle (Chomsky 1965, 1968; Chomsky and Halle 1968) and other generative grammarians, that the grammar of a language can be represented by a set of rules. A speaker-hearer who is competent in a language has learned these rules, and uses them to determine the grammatical structure of his utterances or those of another speaker, since the rules 'generate' an utterance if and only if grammatical structure can be assigned to it. Competence does not fully determine performance: the speaker's actual utterances may frequently be ungrammatical, and the listener may guess a speaker's intent without consistent reference to the rules.

A subset of the rules for any language are phonological: they convert a string of morphemes, already arranged in some order by syntactic rules, into a phonetic representation. In the familiar generative model (Chomsky and Halle 1968), the morphemes are lexically represented as distinctive-feature matrices, each column of which is a phonological segment. All phonologically redundant feature specification is omitted, and each specified feature has one of two values. The phonological rules complete the matrices, alter the feature specification in certain contexts, delete and insert segments, and assign a range of numerical values to the features. The output of the phonological rules, then, is a matrix of phonetic segments for which each of the features is numerically specified.
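The four operations just described (completing the matrices, altering features in context, inserting or deleting segments, and assigning numerical values) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three feature names, the individual rules, the rule ordering, and the two-point numerical scale are all invented for the example and are not taken from Chomsky and Halle (1968).

```python
# Illustrative sketch only. The features, rules, rule order, and numeric
# scale are assumptions made for this example, not rules from the literature.

# A lexical matrix is a list of segments (columns). Each segment maps a
# feature name to '+' or '-'; redundant features are simply left out.
LEXICAL = [
    {"consonantal": "+", "voice": "+"},
    {"consonantal": "-"},                 # vowel: voicing left unspecified
    {"consonantal": "+", "voice": "+"},
]


def complete(matrix):
    """Redundancy rules (hypothetical): fill in the omitted specifications."""
    out = []
    for seg in matrix:
        seg = dict(seg)                    # copy, so the lexical entry is untouched
        seg.setdefault("nasal", "-")       # unspecified nasality defaults to '-'
        if seg["consonantal"] == "-":
            seg.setdefault("voice", "+")   # vowels are redundantly voiced
        out.append(seg)
    return out


def epenthesize(matrix):
    """Insertion rule (hypothetical): put a vowel between adjacent consonants."""
    out = []
    for i, seg in enumerate(matrix):
        out.append(dict(seg))
        nxt = matrix[i + 1] if i + 1 < len(matrix) else None
        if nxt and seg["consonantal"] == "+" and nxt["consonantal"] == "+":
            out.append({"consonantal": "-", "voice": "+", "nasal": "-"})
    return out


def devoice_finally(matrix):
    """Context-sensitive rule (hypothetical): a final consonant becomes [-voice]."""
    out = [dict(seg) for seg in matrix]
    if out and out[-1]["consonantal"] == "+":
        out[-1]["voice"] = "-"
    return out


# Placeholder scale: a fuller model would assign graded, context-dependent values.
DEGREES = {"+": 1, "-": 0}


def to_numeric(matrix):
    """Replace every binary specification with a numerical one."""
    return [{name: DEGREES[value] for name, value in seg.items()} for seg in matrix]


def derive(matrix):
    """Apply the rules in a fixed (assumed) order: lexical matrix to phonetic matrix."""
    for rule in (complete, epenthesize, devoice_finally, to_numeric):
        matrix = rule(matrix)
    return matrix


if __name__ == "__main__":
    for column in derive(LEXICAL):
        print(column)
```

Each rule returns a new matrix rather than modifying its input, so the lexical representation stays intact and the intermediate stages of the derivation can be inspected one rule at a time.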

Besides these acquired rules, we suppose the speaker of the language to have
