
 KLATT 1987, p. 779 

of current text-to-speech systems may not be adequate for applications that involve many unfamiliar names (the telephone itself is somewhat marginal for this purpose, synthesis implies a further intelligibility reduction, and the relatively poorer performance of these systems in converting names to phonemes adds further potential for confusion).

A current limitation for systems using the telephone is that the computer cannot listen as well as it can talk. Speech recognition technology is lagging behind synthesis capabilities. Presently, any user responses must be entered by telephone keypad commands, and not all telephones are pushbutton phones. Speaker-independent connected digit recognition systems are now being demonstrated in the laboratory with better than 98% string recognition accuracy (Bush and Kopec, 1986); perhaps this technology will become commercially available soon.

Eventually, text-to-speech systems will compete with products now used to produce canned messages from waveform-coding chips. Currently, such waveform encoding systems are thought to produce far more natural speech (due to the natural timing, intonation, and voice quality obtained from a human utterance), even though their measured intelligibility is significantly lower than that of the better text-to-speech systems (Nixon et al., 1985). If the naturalness of text-to-speech systems can be improved even slightly, the advantage in ease of message assembly can easily outweigh the cost advantage accruing to the waveform coder for many applications.

V. SPECIAL APPLICATIONS

Text-to-speech systems are beginning to be applied in many ways, including aids for the handicapped, medical aids, and teaching aids. This brief section is included in the hope of stimulating additional humanitarian applications for this new technology. It is an unfortunate fact of life in the United States that funds for transfer of this technology to the handicapped are scarce, and funds to actually purchase devices for individuals are virtually nonexistent. We depend on the success of text-to-speech systems in the commercial marketplace to lead to less costly portable low-power devices that, through the good will of industry, may be made available at special discounts for the handicapped.[20]

A. Talking aids for the vocally handicapped

The first kind of aid to be considered is a talking aid for the vocally handicapped. There are over 1.5 million nonspeaking persons in the USA, excluding the deaf, according to a survey made by the American Speech and Hearing Association (ASHA, 1981). Any person in this group who can point at some kind of a communication board or use a typewriter keyboard is a potential user of a communication aid that involves conversion of text to speech. A continually updated listing of communication aids for the nonvocal is maintained by the Trace Center of the University of Wisconsin (Vanderheiden, 1978, 1985); see also the quarterly publication Communication Outlook (Portnoy, 1979-present) and Bernstein (in press).

A potential advantage of DECtalk in this application is the possibility of fitting the voice characteristics to the user, particularly the advantage of giving women a femalelike voice and children a childlike voice. Prior to the availability of DECtalk, a 16-year-old girl in Arizona who was injured in an automobile accident refused to use a talking aid because it made her sound masculine. On the other hand, some young children with cerebral palsy seem to enjoy having a robotlike monotone voice speak for them when among their peers in a classroom setting.

Warrick et al. (1977) identify a number of capabilities that would facilitate wider use of talking aids: (1) natural distinguishable voices for each child in a classroom, (2) ability to express emphasis and attitude, (3) lighter weight and more portable configurations, and (4) predictive type-ahead or other methods for speeding text specification. As pointed out by Bernstein (1986), natural voices are distinguished from one another by many types of cues that signal not only gender, but also approximate size, age, and regional accent. Current synthesis algorithms modify vocal tract size and laryngeal waveform to distinguish among a small set of speakers, but do not include capabilities to modify dialect, timing, intonation, allophonic selection, or phonetic realization. Users of talking aids can be frustrated by an inability to convey emotions such as urgency or friendliness by voice. Everything comes out in a semantically neutral way, although some systems provide an ability to emphasize selected words.
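As an illustration of capability (4), predictive type-ahead, the following is a minimal sketch in Python of one possible scheme: for each typed word fragment, offer the most frequent word that begins with that fragment, and let the user accept it with a single special key. The word counts, the function names `predict` and `type_word`, and the choice of the tab character as the accept key are all assumptions made for illustration; they are not taken from any of the systems cited here.

```python
from typing import Dict, Optional

# Hypothetical toy frequency table; the counts are illustrative, not measured.
WORD_COUNTS: Dict[str, int] = {
    "the": 500, "they": 120, "then": 90, "there": 80,
    "help": 60, "hello": 40, "want": 70, "water": 30,
}


def predict(fragment: str, counts: Dict[str, int] = WORD_COUNTS) -> Optional[str]:
    """Return the most frequent word starting with `fragment`, or None."""
    fragment = fragment.lower()
    if not fragment:
        return None
    candidates = [word for word in counts if word.startswith(fragment)]
    if not candidates:
        return None
    # Highest count wins; ties are broken alphabetically so the result is stable.
    return min(candidates, key=lambda word: (-counts[word], word))


def type_word(keys: str, accept_key: str = "\t") -> str:
    """Simulate typing one word.

    Ordinary characters extend the fragment. The accept key (assumed here to
    be tab) replaces the fragment with the current prediction, if there is one.
    """
    fragment = ""
    for key in keys:
        if key == accept_key:
            guess = predict(fragment)
            return guess if guess is not None else fragment
        fragment += key
    return fragment


if __name__ == "__main__":
    for typed in ["th\t", "he\t", "hell\t", "wa\t", "zz\t"]:
        print(repr(typed), "->", type_word(typed))
```

With this toy table, typing two letters and the accept key is enough to produce common words, which is the kind of keystroke saving such a scheme aims for. A practical aid would need a much larger vocabulary and a way to step through alternative predictions.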

The vocally handicapped present a wide range of motor difficulties that require ingenious solutions to permit text creation. One method for speeding up text input is to use a predictive input system that always displays the most frequent English word for any typed word fragment; the user can then hit a special key to accept the prediction (Hunnicutt, 1985). Another alternative, similar in some ways to shorthand, is the Bliss symbol system (Carlson et al., 1982b;
 

Smithsonian Speech Synthesis History Project