NMAH | Smithsonian Speech Synthesis History Project (im

of the speech synthesis research of the past thirty years has been prompted by interest in vocoding (i.e. voice coding). The channel capacity (equivalently, the bandwidth in the radio spectrum) required for transmission of speech is many times greater than it ought to be, considering the amount of information, in Shannon's sense, which is carried by the speech signal. Since the channel capacity available for radio and cable communications is limited, many schemes have been devised to 'compress' speech by analyzing the speech wave and transmitting only the information needed to synthesize an intelligible version at the receiving end. For example, in Dudley's (1939) original Vocoder, built at Bell Telephone Laboratories, the spectrum of telephone speech (250-3000 Hz) is analyzed by a bank of 10 filters. The smoothed, rectified output of each filter represents the energy in a certain part of the spectrum as a function of time. Another circuit tracks Fo, the fundamental frequency (for voiceless excitation, the output of this circuit is zero). The vocoder transmits the outputs of the Fo tracker and of the filters. Since these functions vary relatively slowly, the channel capacity needed for all 11 functions is far less than the unprocessed speech signal would require. To synthesize the speech, the frequency of a buzz source is varied according to the Fo function (a hiss source is used when this function has zero value). The buzz or hiss excites each of a set of filters matching those used in the analysis, and the amplitude of the output from each synthesizing filter is determined by the function for the corresponding analyzing filter. Summing the outputs of the synthesis filters yields an intelligible version of the original speech.
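The analysis and synthesis steps of a filter-bank vocoder described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reconstruction of Dudley's circuitry. The 8000 Hz sampling rate, the equal-width division of 250-3000 Hz into 10 bands, the second-order band-pass design, the 25 Hz smoothing cutoff, and all function names are assumptions introduced here. The Fo contour is supplied by the caller rather than tracked, since pitch tracking is a separate problem.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed, not taken from the source


def band_centers(low_hz=250.0, high_hz=3000.0, n_bands=10):
    """Return (center, width) pairs for equal-width bands (an assumption)."""
    width = (high_hz - low_hz) / n_bands
    return [(low_hz + (i + 0.5) * width, width) for i in range(n_bands)]


def bandpass(signal, center_hz, bandwidth_hz):
    """Second-order band-pass filter (biquad with unity gain at the center)."""
    w0 = 2 * math.pi * center_hz / SAMPLE_RATE
    alpha = math.sin(w0) * bandwidth_hz / (2 * center_hz)  # sin(w0) / (2Q)
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = (alpha * x - alpha * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def envelope(signal, cutoff_hz=25.0):
    """Rectify, then smooth with a one-pole low-pass filter."""
    coeff = math.exp(-2 * math.pi * cutoff_hz / SAMPLE_RATE)
    out, state = [], 0.0
    for x in signal:
        state = coeff * state + (1 - coeff) * abs(x)
        out.append(state)
    return out


def analyze(speech):
    """One slowly varying energy function per analyzing filter."""
    return [envelope(bandpass(speech, c, w)) for c, w in band_centers()]


def excitation(f0_track, seed=0):
    """Buzz (pulse train at Fo) where Fo > 0, hiss (noise) where Fo is zero."""
    rng = random.Random(seed)
    out, phase = [], 0.0
    for f0 in f0_track:
        if f0 > 0:
            phase += f0 / SAMPLE_RATE
            out.append(1.0 if phase >= 1.0 else 0.0)
            phase %= 1.0
        else:
            out.append(rng.uniform(-1.0, 1.0))
    return out


def synthesize(envelopes, f0_track):
    """Excite matching filters, scale each by its envelope, and sum."""
    source = excitation(f0_track)
    output = [0.0] * len(source)
    for (c, w), env in zip(band_centers(), envelopes):
        band = bandpass(source, c, w)
        for i, (b, e) in enumerate(zip(band, env)):
            output[i] += b * e
    return output
```

Calling `analyze(speech)` on a list of samples yields the 10 band-energy functions; passing these with a per-sample Fo list of the same length to `synthesize` yields the resynthesized signal. In a real vocoder the 11 functions would be sampled at a much lower rate before transmission, which is where the saving in channel capacity arises; that step is omitted here for brevity.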

A second type of vocoder is the formant vocoder (Munson and Montgomery 1950). In a formant vocoder, the analyzer tracks the excitation state, Fo, and the frequencies and amplitudes of the lowest three formants of the original speech, and transmits these functions; in the synthesizer, resonant circuits representing the three formants are appropriately excited, and the transmitted functions also determine the frequency and the amplitude for each resonator. The saving in channel capacity is greater than for a filter-bank vocoder, but correct analysis is much more difficult. Both filter-bank and formant synthesizers have proved to be of value for phonetic and phonological research as well as for communications.
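The synthesizer half of a formant vocoder can be sketched in the same spirit. The following Python is an illustration under stated assumptions, not the Munson and Montgomery design: it uses a common two-pole digital resonator for each formant, connects the three resonators in parallel, and fixes their bandwidths, none of which is specified in the text above. The sampling rate, the bandwidth values, and the function names are likewise hypothetical.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed, not taken from the source


def resonator_coeffs(freq_hz, bandwidth_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (unity gain at DC)."""
    t = 1.0 / SAMPLE_RATE
    c = -math.exp(-2 * math.pi * bandwidth_hz * t)
    b = 2 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1 - b - c
    return a, b, c


def synthesize_formants(frames, bandwidths=(60.0, 90.0, 120.0), seed=0):
    """Synthesize one output sample per frame.

    Each frame is (f0, formants), where formants is a list of three
    (frequency_hz, amplitude) pairs. An f0 of zero selects hiss excitation.
    The fixed bandwidths are an assumption of this sketch.
    """
    rng = random.Random(seed)
    phase = 0.0
    state = [[0.0, 0.0] for _ in bandwidths]  # y[n-1], y[n-2] per resonator
    out = []
    for f0, formants in frames:
        if f0 > 0:
            # Buzz: one unit pulse per fundamental period.
            phase += f0 / SAMPLE_RATE
            source = 1.0 if phase >= 1.0 else 0.0
            phase %= 1.0
        else:
            # Hiss: uniform noise for voiceless excitation.
            source = rng.uniform(-1.0, 1.0)
        sample = 0.0
        for k, ((freq, amp), bw) in enumerate(zip(formants, bandwidths)):
            a, b, c = resonator_coeffs(freq, bw)
            y = a * source + b * state[k][0] + c * state[k][1]
            state[k][1], state[k][0] = state[k][0], y
            sample += amp * y  # transmitted amplitude scales each resonator
        out.append(sample)
    return out
```

For instance, 800 identical frames of `(120.0, [(700.0, 1.0), (1200.0, 0.5), (2500.0, 0.25)])` produce a tenth of a second of a steady vowel-like sound; these formant values are illustrative only. Because each frame carries just seven numbers (Fo plus three frequency-amplitude pairs), the sketch makes concrete why the formant vocoder needs less channel capacity than the filter-bank type.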

Besides vocoding, there are certain other possible applications for synthetic speech. If it is necessary for a machine to communicate with its user -- a computer operator or a student undergoing computer-assisted instruction -- and heavy demands are already being made on his visual attention, spoken messages may be the solution. But if fast random access to a large inventory of messages is required, storage of natural speech becomes cumbersome, for speech makes the same exorbitant demands on storage capacity as it does on channel capacity (Atkinson and Wilson 1968). Synthetic speech, if it could be stored in some kind of minimal representation, would be an attractive alternative. Still another application is a reading machine for the blind. In such a device, printed text must be converted to spoken output with the aid of a dictionary in which written and spoken elements

Smithsonian Speech Synthesis History Project
National Museum of American History \| Archives Center