Review of text-to-speech conversion for English
Dennis H. Klatt
Room 36-523, Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
ABSTRACT
The automatic conversion of English text to synthetic speech is
presently being performed, remarkably well, by a number of
laboratory systems and commercial devices. Progress in this area
has been made possible by advances in linguistic theory,
acoustic-phonetic characterization of English sound patterns,
perceptual psychology, mathematical modeling of speech production,
structured programming, and computer hardware design. This review
traces the early work on the development of speech synthesizers,
discovery of minimal acoustic cues for phonetic contrasts,
evolution of phonemic rule programs, incorporation of prosodic
rules, and formulation of techniques for text analysis. Examples
of rules are used liberally to illustrate the state of the art.
Many of the examples are taken from Klattalk, a text-to-speech system
developed by the author. A number of scientific problems are
identified that prevent current systems from achieving the goal of
completely human-sounding speech. While the emphasis is on rule
programs that drive a formant synthesizer, alternatives such as
articulatory synthesis and waveform concatenation are also reviewed.
An extensive bibliography has been assembled to show both the
breadth of synthesis activity and the wealth of phenomena covered
by rules in the best of these programs. A recording of selected
examples of the historical development of synthetic speech,
enclosed as a 33 1/3-rpm record, is described in the Appendix.
PACS numbers: 43.10.Ln, 43.72.Ja
Reproduced online from:
Journal of the Acoustical Society of America 82 (3), September
1987, pp. 737-793, 57 pp., 34 fig, 13 tab. Copyright © 1987, Acoustical
Society of America.
With permission from:
Journal of the Acoustical Society of America
(http://asa.aip.org/jasa.html)