- First prosodic synthesis by rule, by Ignatius Mattingly,
The synthesis-by-rule program of Mattingly (1966; 1968) of the
Haskins Laboratories was demonstrated to accompany his Ph.D.
- Sentence-level phonology incorporated in rules by Dennis Klatt,
Klatt (1976b) of the M.I.T. Speech Communication Group created a
phonological component to generate segmental durations and a
fundamental frequency contour, as well as sentence-level allophonic
variation, from a phonemic input augmented with stress and syntactic
- Concatenation of linear-prediction diphones, by Joe
Olive (1977) of AT&T Bell Laboratories controlled a linear-prediction
synthesizer from stored reflection coefficients for a set of diphones.
The system was demonstrated at ICASSP-77. The recording is from about
1980, and includes prosodic rules provided by Liberman and
- Concatenation of linear-prediction demisyllables, by Cathrine
A synthesis-by-rule program with prosodic rules, called Lingua, was
designed by Browman (1980) of AT&T Bell Laboratories, using the
demisyllable inventory collected by Fujimura and Lovins (1978).
Demonstrated at ICASSP-80. (text)
Part D: Fully automatic text-to-speech conversion
- The first full text-to-speech system, done in Japan by Noriko
Umeda et al., 1968.
The first demonstrated text-to-speech system for English was
created by Umeda et al. (1968) of the Electrotechnical Laboratory
in Japan, and was based on an articulatory model. It included a
syntactic analysis module with sophisticated heuristics.
Demonstrated at the 6th International Congress on Acoustics, in
Tokyo in 1968. (text)
- The first Bell Laboratories text-to-speech system, by Cecil
Coker, Noriko Umeda, and Cathrine Browman, 1973.
Coker et al. (1973) of AT&T Bell Laboratories demonstrated a
text-to-speech program based on the Coker (1967) articulatory model.
[Ed: see Coker (1968)]
The system was demonstrated at the 1972 International Conference on
Speech Communication and Processing in Boston.
- The Haskins Laboratories text-to-speech system, 1973.
The Haskins Laboratories text-to-speech system (Cooper et al., 1973)
used the Mattingly (1968) phoneme-to-speech rules coupled with a large
- The Kurzweil reading machine for the blind, Raymond Kurzweil,
Kurzweil (1976) began selling a reading machine with an optical
scanner in the late 1970s. The system was demonstrated on the CBS
evening news. (text)
- The inexpensive Votrax Type-n-Talk system, by Richard Gagnon,
The Votrax low-cost Type-n-Talk text-to-speech system combines a
single-chip synthesis-by-rule program and formant synthesizer
(Gagnon, 1978) with a version of the Elovitz et al. (1976)
letter-to-sound rules. It was demonstrated at the 1978 ICASSP
- The Echo low-cost diphone concatenation system, about 1982.
The Echo low-cost text-to-speech system concatenates
linear-prediction diphones using the Texas Instruments TMS-5220
linear prediction synthesizer chip.
- The M.I.T. MITalk system, by Jonathan Allen, Sheri Hunnicutt,
and Dennis Klatt, 1979.
The MITalk-79 laboratory text-to-speech system was developed at the
Massachusetts Institute of Technology by Allen et al. (1979,
1987) and many others. The system was demonstrated in its final
form at the 1979 meeting of the Acoustical Society of America in
- The multi-language Infovox system, by Rolf Carlson, Björn
Granström, and Sheri Hunnicutt, 1982.
The Infovox commercial text-to-speech system (Magnusson et al.,
1984) is an implementation of the Carlson et al. (1982a)
multi-language system developed at the Royal Institute of
Technology in Stockholm. Versions of the system were demonstrated
at ICASSP conferences in 1976 and 1982.
- The Speech Plus Inc. "Prose-2000" commercial system, 1982.
The Prose-2000 commercial text-to-speech system was first developed
in conjunction with a reading machine for the blind project at
Telesensory Systems by James Bliss and his associates (Goldhor
and Lund, 1983; Groner et al., 1982). The recording is of Version
3.0 of the software. (text)
- The Klattalk system, by Dennis Klatt of M.I.T., which formed
the basis for Digital Equipment Corporation's DECtalk commercial
The Klattalk (Klatt, 1982a) laboratory text-to-speech system software
was licensed to Digital Equipment Corporation as a basis for the
commercial DECtalk text-to-speech system announced in 1983. The
recording is of Version 3.0 of the DECtalk software.
- The AT&T Bell Laboratories text-to-speech system, 1985.
A new laboratory text-to-speech system from AT&T Bell Laboratories
(Olive and Liberman, 1985) uses the Olive (1977) diphone synthesis
strategy in combination with a large morpheme dictionary (Coker,
1985) and letter-to-sound rules (Church, 1985). The laboratory
system was demonstrated at a 1985 meeting of the Acoustical Society
of America. (text)
- Several of the DECtalk voices.
Examples of some of the voices provided by the DECtalk
text-to-speech system: (1) Beautiful Betty, (2) Huge Harry, (3) Kit
the Kid, (4) Whispering Wendy. (text)
- DECtalk speaking at about 300 words/minute.
Example of using the DECtalk speaking rate command to skim material
at a rapid rate. The nominal speaking rate has been set to 350
words/min with the command [:ra 350], although this 51-word passage
took 11 s to speak, indicating an effective rate of about 278
words/min, slightly under 300.
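
The effective rate quoted for this recording follows from simple
arithmetic on the figures given above (51 words, 11 s, nominal setting
of 350 words/min). A minimal sketch of that calculation in Python; the
function name effective_rate_wpm is hypothetical and is not part of
DECtalk or any other system described here.

```python
def effective_rate_wpm(word_count: int, duration_s: float) -> float:
    """Return the average speaking rate in words per minute."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return word_count / duration_s * 60.0


# Figures taken from the description of the recording.
NOMINAL_WPM = 350   # value given to the [:ra 350] command
WORDS = 51          # length of the passage in words
DURATION_S = 11.0   # measured speaking time in seconds

rate = effective_rate_wpm(WORDS, DURATION_S)
print(f"effective rate: {rate:.0f} words/min")             # 278 words/min
print(f"fraction of nominal: {rate / NOMINAL_WPM:.2f}")    # 0.79
```

The result, about 278 words/min, is roughly 79% of the nominal
setting, consistent with the "slightly under 300" figure above.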