By the early 60s, then, there was no doubt that speech could be
synthesized by rule by either terminal-analog or vocal-tract analog
methods. Reliable synthesizers and convenient methods of controlling
them had been developed. Even more important, the value of
explicitly formulated rules had become obvious.

3. JUSTIFICATION FOR SYNTHESIS BY RULE

Since speech has been successfully synthesized by rule, it might
seem that the basic objective of von Kempelen and his successors
has been attained; it therefore becomes important to state clearly
the reasons for continuing the research. One obvious reason is that
we still have much to learn about the physical aspects of speech
production: how the various articulators move, how their movements
are timed, and how the controlling musculature operates to produce the
sounds of speech. Synthesis by rule is a way of testing our
understanding of the physical apparatus, and this is the primary
motivation for much of the activity in the field today. But this
argument may not seem very persuasive to the linguist, who is
concerned with speech as a psychological fact rather than a physical
one. Yet we think that synthesis by rule offers other possibilities
of substantial theoretical importance for the linguist, possibilities
which have barely begun to be explored. To justify this point of view,
however, requires brief reference to some basic questions of linguistics.
We believe, following Chomsky and Halle (Chomsky 1965, 1968;
Chomsky and Halle 1968) and other generative grammarians, that the
grammar of a language can be represented by a set of rules. A
speaker-hearer who is competent in a language has learned these
rules, and uses them to determine the grammatical structure of his
utterances or those of another speaker, since the rules 'generate'
an utterance if and only if grammatical structure can be assigned
to it. Competence does not fully determine performance: the speaker's
actual utterances may frequently be ungrammatical, and the listener
may guess a speaker's intent without consistent reference to the rules.
A subset of the rules for any language is phonological: these rules
convert a string of morphemes, already arranged in some order by
syntactic rules, into a phonetic representation. In the familiar
generative model (Chomsky and Halle 1968), the morphemes are
lexically represented as distinctive-feature matrices, each column
of which is a phonological segment. All phonologically redundant
feature specification is omitted, and each specified feature has one
of two values. The phonological rules complete the matrices, alter
the feature specification in certain contexts, delete and insert
segments, and assign a range of numerical values to the features.
The output of the phonological rules, then, is a matrix of phonetic
segments for which each of the features is numerically specified.
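
The passage from lexical matrix to phonetic matrix described above can be
made concrete with a small sketch. It is an illustration only, not the
formalism of Chomsky and Halle (1968): the feature names, the two-segment
morpheme, the three rules, and the numerical scales are all assumptions
chosen for the example. Each segment is one column of the matrix, with
unspecified (redundant) features left empty in the lexical entry.

```python
from typing import Dict, List, Optional

# One segment is one column of the matrix: feature name -> +1, -1, or None.
# None marks a redundant feature left unspecified in the lexicon.
Segment = Dict[str, Optional[int]]

# Hypothetical lexical entry: a nasal consonant followed by a vowel.
LEXICAL: List[Segment] = [
    {"consonantal": +1, "nasal": +1, "voiced": None},
    {"consonantal": -1, "nasal": None, "voiced": None},
]


def complete(matrix: List[Segment]) -> List[Segment]:
    """Fill in redundant features (assumed rules, for illustration)."""
    out = []
    for seg in matrix:
        seg = dict(seg)  # copy, so the lexical entry is not modified
        if seg["nasal"] is None:
            seg["nasal"] = -1
        if seg["voiced"] is None:
            # Assumption: nasals and vowels are voiced, all else voiceless.
            sonorant = seg["nasal"] == +1 or seg["consonantal"] == -1
            seg["voiced"] = +1 if sonorant else -1
        out.append(seg)
    return out


def nasalize(matrix: List[Segment]) -> List[Segment]:
    """Context-sensitive rule: a vowel becomes [+nasal] after a nasal consonant."""
    out = [dict(seg) for seg in matrix]
    for prev, seg in zip(out, out[1:]):
        nasal_consonant = prev["consonantal"] == +1 and prev["nasal"] == +1
        if nasal_consonant and seg["consonantal"] == -1:
            seg["nasal"] = +1
    return out


def to_phonetic(matrix: List[Segment]) -> List[Dict[str, int]]:
    """Replace binary values with numbers (assumed scales: 0-3 for nasal, 0-1 otherwise)."""
    out = []
    for seg in matrix:
        values = {name: (1 if value == +1 else 0) for name, value in seg.items()}
        if seg["nasal"] == +1:
            # Assumption: a nasal consonant is more strongly nasal than a
            # vowel nasalized by context.
            values["nasal"] = 3 if seg["consonantal"] == +1 else 2
        out.append(values)
    return out


phonetic = to_phonetic(nasalize(complete(LEXICAL)))
for column in phonetic:
    print(column)
```

The three functions correspond to three of the operations named above:
completing the matrices, altering a feature specification in context, and
assigning numerical values. Deletion and insertion of segments are omitted
to keep the sketch short.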
Besides these acquired rules, we suppose the speaker of the
language to have