degrees of freedom of the component articulators that make up the
speech production system, nor the control constraints and strategies.
A large number of researchers are active in this area and make
use of a wide range of devices to measure mechanical motions of
individual articulators, and even EMG signals in individual muscles,
but few if any of these scientists are pursuing a goal directly
related to the assembly of synthesis rules for English syllables and
sentences. Theorizing on the potential advantages of articulatory
models in linguistics is another active area, but until such time as
feature implementation rules become the central focus of effort
(Goldstein and Browman, 1986), resulting in models detailed enough
to reveal the immensity of the control problem, such theorizing is
of marginal value to us. In summary, it seems that the study of basic
processes of speech production and speech perception is crucial to
progress, but is only in its infancy. Support of these activities will
ultimately provide us with new insights and technical abilities. In
the other direction, text-to-speech conversion is an excellent focus
for sharpening the questions asked in basic research.

Let me quickly enumerate those areas where improvement in
text-to-speech system performance is possible and reasonably
straightforward. In all of the current systems, text analysis errors
of many sorts are still possible. Deficiencies which have been
identified in this article are summarized in Table XIII. The formatting
routines may not be primed to deal with unusual letter or number
strings. The word pronunciation routines have a certain probability
of error in dealing with unfamiliar words, and this error rate tends
to go up when dealing with foreign words and proper names. The syntax
analysis routines may not be able to properly derive phrase structure
for some sentences, or they may be unable to choose between two
alternative pronunciations of an ambiguous orthographic word. These
errors of text analysis are moderately frequent, occurring in as many
as a third of the sentences of running text. Incremental improvements
to formatting routines, augmentations to ever larger morpheme
dictionaries (Coker, 1985), and additional parsing heuristics should
lead to improved performance in this area. On the other hand,
high-performance syntactic analysis may turn out to require semantic
knowledge, which would imply very large data structures and programs
that may not be available for some time.

The problems remaining in the synthesis algorithms of text-to-speech
systems are also listed in Table XIII. If one makes a spectrogram of
a sentence produced by a text-to-speech system, and compares it with
a sentence read by the person whose speech formed the basis for
system development, it is easy to see ways in which the two acoustic
patterns