NMAH | Smithsonian Speech Synthesis History Project (dk

degrees of freedom of the component articulators that make up the speech production system, nor the control constraints and strategies. There are a large number of researchers active in this area who make use of a wide range of devices to measure mechanical motions of individual articulators, and even EMG signals in individual muscles, but few if any of these scientists are pursuing a goal directly related to the assembly of synthesis rules for English syllables and sentences. Theorizing on the potential advantages of articulatory models in linguistics is another active area, but until such time as feature implementation rules become the central focus of effort (Goldstein and Browman, 1986), resulting in models detailed enough to reveal the immensity of the control problem, such theorizing is of marginal value to us. In summary, it seems that the study of basic processes of speech production and speech perception is crucial to progress, but is only in its infancy. Support of these activities will ultimately provide us with new insights and technical abilities. In the other direction, text-to-speech conversion is an excellent focus for sharpening the questions asked in basic research.

Let me quickly enumerate those areas where improvement in text-to-speech system performance is possible and reasonably straightforward. In all of the current systems, text analysis errors of many sorts are still possible. Deficiencies that have been identified in this article are summarized in Table XIII. The formatting routines may not be primed to deal with unusual letter or number strings. The word pronunciation routines have a certain probability of error in dealing with unfamiliar words, and this error rate tends to go up when dealing with foreign words and proper names. The syntax analysis routines may not be able to properly derive phrase structure for some sentences, or they may be unable to choose between two alternative pronunciations of an ambiguous orthographic word. These errors of text analysis are moderately frequent, occurring in as many as a third of the sentences of running text. Incremental improvements to formatting routines, augmentations to ever larger morpheme dictionaries (Coker, 1985), and additional parsing heuristics should lead to improved performance in this area. On the other hand, high-performance syntactic analysis may turn out to require semantic knowledge, which would imply very large data structures and programs that may not be available for some time.
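The three sources of text analysis error named above (formatting, word pronunciation, and syntactic disambiguation) can be illustrated with a minimal sketch. This is not taken from any of the systems reviewed here: the lexicon entries, the homograph list, the status labels, and the function names `format_tokens` and `analyze` are all hypothetical, chosen only to show where each kind of error could enter a text analysis front end.

```python
import re

# Hypothetical toy lexicon (ARPAbet-like strings); a real system would use a
# large morpheme dictionary rather than three entries.
LEXICON = {
    "the": "DH AH",
    "cost": "K AO S T",
    "is": "IH Z",
}

# Hypothetical list of orthographic words with two pronunciations; choosing
# between them requires syntactic (or semantic) context.
HOMOGRAPHS = {"read", "lead", "wind"}


def format_tokens(text):
    """Formatting stage: split text into lowercase word and digit-string tokens."""
    return re.findall(r"[a-z]+|\d+", text.lower())


def analyze(text):
    """Label each token with the stage at which an analysis error could arise."""
    report = []
    for tok in format_tokens(text):
        if tok.isdigit():
            status = "number"      # needs a formatting rule (year? quantity?)
        elif tok in HOMOGRAPHS:
            status = "homograph"   # needs syntactic analysis to disambiguate
        elif tok in LEXICON:
            status = "known"       # pronunciation found in the dictionary
        else:
            status = "unknown"     # would fall back to letter-to-sound rules
        report.append((tok, status))
    return report


if __name__ == "__main__":
    for token, status in analyze("I read the cost in 1987"):
        print(token, status)
```

In this toy, every token labeled anything other than "known" marks a place where a real system might make one of the errors described above; enlarging the dictionary shrinks the "unknown" class, while the "homograph" class is the one that may ultimately require semantic knowledge.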

The problems remaining in the synthesis algorithms of text-to-speech systems are also listed in Table XIII. If one makes a spectrogram of a sentence produced by a text-to-speech system, and compares it with a sentence read by the person whose speech formed the basis for system development, it is easy to see ways in which the two acoustic patterns
