and Mattingly, 1985; Lisker, 1978). For example, Carlson et al. (1972)
synthesized /g/ bursts with either a single compact energy
concentration near F2, or F2 excitation plus a weak secondary energy
concentration near F4 (the next front cavity resonance). They obtained
the best intelligibility scores using the more complicated burst that
better matched natural bursts. Some cues are of course more powerful
than others, but the listener appears to be responsive to an
incredible number of acoustic details and performs best when the
synthesis contains all known acoustic regularities (Dorman et al.,
1977).
The early Klatt rules for segmental synthesis were augmented by a
sentence-level phonological component (Klatt, 1976b) that derived
segment durations, f0 contour, and allophonic variation by rule
(example 21 of the Appendix). The program has evolved over the last
10 years, and has spawned several progeny. The 1976 version was
incorporated into the MITalk text-to-speech system that was being
developed in the 1970s at M.I.T. under the guidance of Jonathan Allen
(Allen et al., 1987). The fundamental frequency algorithm of Klatt
(1976b) was replaced by one developed by O'Shaughnessy (1977). MITalk
text analysis routines included a morpheme dictionary (Allen, 1976),
letter-to-sound rules (Hunnicutt, 1976), and a phrase-level parser.
The MITalk system evolved until 1979, when the project was terminated
(Allen et al., 1979; Allen et al., 1987) (example 30 of the Appendix).
In 1976, the MITalk letter-to-phoneme rules (Hunnicutt, 1976) and
the Klatt phoneme-to-speech program were licensed to Telesensory
Systems, Inc. for incorporation into a reading machine for the
blind (Goldhor and Lund, 1983). After considerable effort to
transform the code into a real-time device, Telesensory Systems sold
off their speech synthesis division to a newly formed company,
Speech Plus, Inc. Following further development, Speech Plus came
out with the Prose-2000 text-to-speech system in 1982 (Groner et al.,
1982) (example 32 of the Appendix). Since that time, the segmental
synthesis rules have been modified to improve intelligibility over
limited-bandwidth long-distance telephone lines (Wright et al., 1986).
For example, some noise bursts and frication spectra were enhanced
slightly with respect to normal levels in order to compensate for the
frequency response and noise characteristics of the phone. Particular
attention was paid to postvocalic consonants, where they found that
adding very brief releases into a weak schwa-like element before
silence ("man" =
) improved the