might be criticized because the set of phonetic features, on which
their much more principled account of phonological capacity depends,
as yet lacks a fully satisfactory and explicit basis in phonetic
capacity (Abramson and Lisker 1970). Clearly, whenever a particular
component is under investigation, it is desirable to state explicitly
how that component is assumed to depend on the others.
Third, the ultimate check of a hypothesis concerning any or all of
the components is of course the intuition of the native speaker
(Chomsky 1965: 21). But the only reliable way to consult his
intuition is to present him with speech that we have made sure
conforms to our current phonetic or phonological hypothesis, and then
find out whether he considers it well-formed. To do this, we need
carefully controlled speech stimuli (Lisker et al. 1962;
Mattingly 1971).
Synthesis by rule is a technique which seems to meet these
requirements. With the computer we can simulate our phonological
and phonetic formulations rigorously; errors of form and logic come
to light all too quickly. We are compelled to be explicit about the
assumptions we make about other components; if they are simplistic
or inadequate we will not be allowed to forget the fact. And we can
check the native speaker's intuition directly by producing controlled
synthetic speech.
Let us briefly consider what an ideal speech synthesis by rule system
would be like. It would, in the first place, simulate all the
components we have just discussed. Phonetic capacity would be
represented by a synthesizer, together with the computer programs
that control it, capable of generating just those sounds which the
speaker-hearer can distinguish in production and perception;
phonological competence, by the rules of some language, stated in
a form which would be an acceptable input to the system; phonological
capacity, by a part of the computer program itself, which would impose
severe limitations on the form or substance of the rules; and phonetic
skill, by an additional set of rules specific to some particular
speaker. The combined effect of all components should be such as to
restrict the possible utterances to just those which are well-formed
speech in a particular language (assuming appropriate syntactic and
semantic constraints) from one particular speaker to another.
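The division of labor among the four components can be made concrete in a toy sketch. The Python below is purely illustrative and assumes a great deal that the text leaves open: the feature inventory (`UNIVERSAL_FEATURES`), the control parameters and their ranges (`PARAM_RANGES`), and every class and function name are hypothetical. Phonological capacity is modeled as a check that rules mention only features from a universal inventory; phonological competence as a list of language-specific rules; phonetic skill as a speaker-specific adjustment function; and phonetic capacity as a synthesizer front end that rejects parameter values it cannot realize.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical universal feature inventory (stands in for phonological capacity).
UNIVERSAL_FEATURES = frozenset({"voice", "nasal", "high", "back", "consonantal"})

# Hypothetical synthesizer parameters and realizable ranges (phonetic capacity).
PARAM_RANGES = {"f0": (50.0, 400.0), "f1": (200.0, 1000.0), "dur_ms": (20.0, 400.0)}

Segment = dict[str, bool]   # a feature bundle, e.g. {"high": True}
Params = dict[str, float]   # control values for one synthesized segment


@dataclass
class Rule:
    """A language-specific rule (phonological competence): if every
    conditioning feature matches the segment, apply `effect`."""
    condition: Segment
    effect: Callable[[Params], Params]


def check_capacity(rules: list[Rule]) -> None:
    """Phonological capacity: reject rules that mention any feature
    outside the assumed universal inventory."""
    for rule in rules:
        unknown = set(rule.condition) - UNIVERSAL_FEATURES
        if unknown:
            raise ValueError(f"non-universal features in rule: {sorted(unknown)}")


def realize(params: Params) -> Params:
    """Phonetic capacity: accept only values the synthesizer can produce."""
    for name, value in params.items():
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the realizable range")
    return params


def synthesize_by_rule(segments: list[Segment], rules: list[Rule],
                       speaker_skill: Callable[[Params], Params]) -> list[Params]:
    check_capacity(rules)
    output = []
    for seg in segments:
        params: Params = {"f0": 120.0, "f1": 500.0, "dur_ms": 100.0}  # neutral defaults
        for rule in rules:
            if all(seg.get(f) == v for f, v in rule.condition.items()):
                params = rule.effect(params)
        params = speaker_skill(params)  # phonetic skill: speaker-specific adjustment
        output.append(realize(params))
    return output
```

For instance, a single hypothetical rule lowering F1 for `[+high]` segments, combined with a speaker whose F0 is 1.5 times the default, yields `{"f0": 180.0, "f1": 300.0, "dur_ms": 100.0}` for a `[+high]` segment. A rule mentioning a feature outside the inventory, or a speaker adjustment that pushes F0 above 400, is rejected with a `ValueError`, which is the sense in which the components jointly restrict the possible utterances.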
For each component, moreover, we would want to include all those
aspects, and only those, which are relevant to the capacity and
competence underlying the production and perception of speech.
Suppose, for instance (contrary to our present expectations), that,
from a psychological standpoint, speech production proved to be only
a matter of transmitting certain cues definable in acoustic terms
and invariantly related to phonetic features, and that speech
perception consisted simply in detecting these cues. Our 'neural
vocal tract simulation' could then be just a terminal analog
synthesizer. There would be no reason to include neuromotor
commands, gestures or shape change in a parsimonious synthesis by
rule system, because these matters would be irrelevant to phonetic
capacity. They might continue to be of great interest from the
standpoint of the physiologist and acoustician interested in speech,
but would have no claim on the linguist's attention.