BELL TELEPHONE LABORATORIES (BTL) - continued
2. TEXT-TO-SPEECH SYSTEMS
CONTENTS:
Articulatory Model 1
Articulatory Model 2
Synthetic Dyad Concatenation
Demisyllable Concatenation
-------------------------------------------------------------
PROJECT: ARTICULATORY SYNTHESIS - MODEL 1
Seven parameters for lips, velum and tongue to determine vocal
tract shape. From the shape, formant frequencies are determined
and used to control a formant synthesizer. Two-dimensional model.
Synthesis from specification of sequence of target shapes and
excitation configurations.
1966 Coker, C.H., and O. Fujimura, "Model for specification of
vocal-tract area function", J. Acoust. Soc. Amer., 40, 1271
(A). (I)
1967 Coker, C.H., "Synthesis by rule from articulatory
parameters", Proc. 1967 Conf. Speech Commun. Process., Paper
A9, 52-53 (1967) (B)
Tape ?
Synthesis from specification of desired phonetic sequence. Library
of about 50 target shapes and excitation configurations.
1968 Coker, C.H., "Speech synthesis with a parametric
articulatory model", Speech Symposium, Kyoto, Paper A-4
(1968) (B,K)
SSSHP 32.19 Tape: Demo to accompany "Review of Text-to-speech
conversion for English," D.H. Klatt, JASA 82.3, 9/87.
(3 sen: "This is a computer vocal tract speaking.
You are ... The number you ... has been changed.")
Cassette, Klatt MIT A/D and D/A
1970 Flanagan, J.L., C.H. Coker, L.R. Rabiner, R.W. Schafer, and
N. Umeda, "Synthetic voices for computers", IEEE Spectrum,
7, 22-45 (1970). Manual phonetic input. (B) Plastic
demonstration diskette included (SSSHP 59).
SSSHP 99.8 Tape: "Synthetic Voices for Computers, BTL, 1970"
(syn: "Good morning. I am a computer. I can read stories
and speak them aloud. I do not understand what the words
mean when I read them, but I can guess which words are
important and which words are not, by rules I have been
given. Some day I may be able to provide many kinds of
information by telephone.")
Cassette copy of disk recording SSSHP 59, stylus noise
**** need copy of master ****
SSSHP 82.1 Tape: "The Human Voice and the Computer, IEEE
Soundings, Aug 1, 1971". Same as SSSHP 99.8, except for
the first two words.
(syn, 4 sen: "I am a computer. I can read ... by
telephone.")
Commercial cassette, fair quality
SSSHP 91.8 Tape: "MIT - DEMO TAPE 1, 10/90". Coker and Umeda,
1970.
(syn, 34 sec, Christmas song: "Christmas, Christmas,
toys and noise, that's Christmas. Tiny Tim's and ...
that's Christmas.")
7" reel, 7.5 ips, good quality, Klatt 9/1/71 tape copy
**** use for master ****
1976 Coker, C.H., "A model of articulatory dynamics and control,"
Proc. IEEE 64, 452-459 (1976). (K)
Tape ? ("The grey tie is. The great eye is. We were away in
September.")
Text-to-speech synthesis by rule based on articulatory
synthesizer. First Bell Laboratories text-to-speech system.
Attention to detail in the specification of segmental durations
and allophonic variation. (K) Appendix B has a detailed
description of the experimental laboratory equipment.
1970 Flanagan, J.L., C.H. Coker, L.R. Rabiner, R.W. Schafer, and
N. Umeda, "Synthetic voices for computers", IEEE Spectrum,
7, 22-45 (1970). Automatic text-to-speech. (B) Plastic
demonstration diskette included (SSSHP 59).
SSSHP 99 Tape: "SYNTHETIC VOICES FOR COMPUTERS, BTL, 1970"
(99.9, 6 sen: "The North Wind and the Sun were ... the
Sun was the stronger of the two.")
(99.10, 3 sen:"This is the computer talking...
synthesized speech. Good night, Folks.")
Cassette copy of disk recording SSSHP 59, stylus noise
**** need copy of master ****
SSSHP 82.9 Tape: "The Human Voice and the Computer, IEEE
Soundings, Aug 1, 1971". Same as SSSHP 99.10, except for
the last three words.
(syn, 2 sen: "This is the computer talking. It's been
a pleasure communicating ... synthesized speech.")
Commercial cassette, fair quality
1971 SSSHP 91.9 Tape: "MIT - Demo Tape 1, 10/90". Coker and Umeda,
1971.
(syn, 4 sen: "... am a computer. I can read stories ...
information by telephone.")
7" reel, 7.5 ips, good quality
1972 Demonstration at 1972 Int. Conf. of Speech Comm. and Proc. in
Boston.
SSSHP 32.25 Tape: Demo to accompany "Review of Text-to-speech
conversion for English," D.H. Klatt, JASA 82.3, 9/87.
(4 sen: "I can read stories ... by telephone.")
Cassette, Klatt MIT A/D and D/A of SSSHP 91.9
19?? SSSHP 91.15 Tape: "MIT - DEMO TAPE 1, 10/90"
(syn, 50 sec: "Hello, this is a demonstration of speech
synthesis from an articulatory model. To generate
synthetic speech, I start with ordinary spelling of
English words. I transform spelling to sound with a
dictionary, together with rules for compound words and
English prefixes and suffixes. I create the sounds of
speech ... Thank you.")
7" reel, 7.5 ips, good quality, undated Klatt tape
1973 Coker, C.H., N. Umeda, and C.P. Browman, "Automatic
synthesis from ordinary English text", IEEE Trans. Audio and
Electroacoust. AU-21, 293-297 (1973). (K) Similar version in (B)
1976 Umeda, N., "Linguistic rules for text-to-speech synthesis,"
Proc. IEEE 64, 443-451 (1976). No tape recording. (K)
1996 Coker, C.H., M.H. Krane, B.Y. Reis and R.A. Kubli, "Search
for unexplored effects in speech production," Fourth
International Conference on Spoken Language Processing
(ICSLP96), Paper FrP1S1.1, 1996. (Preprint in SSSHP USA BTL
file.) Sequential effects in excitation, aspiration during
voicing and effects of air flow on acoustics of the vocal
tract.
-------------------------------------------------------------
PROJECT: ARTICULATORY SYNTHESIS - MODEL 2
Simulated articulatory synthesizer.
1971 Mermelstein, P., "Calculation of the vocal-tract transfer
function for speech synthesis applications", Proc. Seventh
Intern. Congr. Acoust., Paper 23 C 13, 173-176 (1971).
Computation of transfer function from shape of vocal tract.
No tape recording. (B)
1973 Mermelstein, P., "Articulatory model for the study of speech
production," J. Acoust. Soc. Amer. 53, 1070-1082 (1973).
(K)
Tape ? ("Why did Ken set the soggy net on top of his
deck?", 8 /huh'CV/ utterances for identification test)
1974 Mermelstein, P., "Computer simulation of articulatory
activity in speech production", Proc. Int. Joint Conf. on
Artificial Intell., Washington, D.C., Gordon & Breach, New
York 1974. (I)
Tape ?
-------------------------------------------------------------
PROJECT: TEXT TO SPEECH USING SYNTHETIC DYADS
Transitions from human speech stored as linear-prediction area
parameters of a synthetic dyad unit. "Steady state" portions are
obtained by connecting with straight lines the end points of
adjacent transitions. In some cases, only the transition end
points are stored with the actual transition being calculated by
interpolation. Duration, pitch, and amplitude modified during
synthesis. Large morpheme dictionary, letter-to-sound rules.
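As a rough illustration of the straight-line scheme described above, the
following Python sketch joins stored transitions and fills each "steady
state" gap by linear interpolation between the end point of one transition
and the start point of the next. The frame layout, the dyad table keyed by
phoneme pairs, and the names lerp_frames, build_track and steady_frames are
assumptions made for this example; they are not taken from the Bell
Laboratories programs.

```python
from typing import Dict, List, Tuple

Frame = List[float]  # one frame of area parameters (hypothetical layout)


def lerp_frames(a: Frame, b: Frame, n: int) -> List[Frame]:
    """Return n frames lying strictly between a and b on a straight line."""
    return [
        [x + (y - x) * k / (n + 1) for x, y in zip(a, b)]
        for k in range(1, n + 1)
    ]


def build_track(
    phonemes: List[str],
    dyads: Dict[Tuple[str, str], List[Frame]],
    steady_frames: int,
) -> List[Frame]:
    """Concatenate stored transitions for each adjacent phoneme pair.

    Between two transitions, the "steady state" is filled with
    steady_frames frames interpolated linearly from the last frame of
    the previous transition to the first frame of the next one.
    """
    track: List[Frame] = []
    prev_end = None
    for left, right in zip(phonemes, phonemes[1:]):
        transition = dyads[(left, right)]  # assumed to hold >= 1 frame
        if prev_end is not None:
            track.extend(lerp_frames(prev_end, transition[0], steady_frames))
        track.extend(transition)
        prev_end = transition[-1]
    return track


# Toy table with one area parameter per frame (made-up values).
toy_dyads = {
    ("a", "b"): [[0.0], [1.0]],
    ("b", "c"): [[3.0], [4.0]],
}
toy_track = build_track(["a", "b", "c"], toy_dyads, steady_frames=1)
```

A transition stored only as its two end points could be expanded with the
same lerp_frames helper before concatenation. Duration, pitch and amplitude
modification are outside the scope of this sketch.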
1974 Olive, J.P., and L.H. Nakatani, "Rule synthesis of speech by
word concatenation: a first step", J. Acoust. Soc. Amer. 55,
660-666 (1974). (K)
Tape ?
1976 Olive, J.P., and N. Spickenagel, "Speech resynthesis from
phoneme-related parameters," J. Acoust. Soc. Amer. 59,
993-996 (1976). (K) does this belong here?
1977 Olive, J.P., "Rule synthesis of speech from dyadic units",
Proc. ICASSP-77, 568-570 (1977). Demonstrated at Epcot
Center of Walt Disney World. (K)
Tape ?
1979 Olive, J.P., and M.Y. Liberman, "A set of concatenative
units for speech synthesis," in SPEECH COMMUNICATION PAPERS
PRESENTED AT THE 97TH MEETING OF THE ACOUSTICAL SOCIETY OF
AMERICA, J.J. Wolf and D.H. Klatt Eds., Amer. Inst. of
Physics, New York, 515-518 (1979). Uses consonant clusters
as units when necessary. (K)
Tape?
Synthesis from phonetic input. The synthesizer requires as its
input a string of phonemes and the associated duration, pitch and
amplitude parameters. The synthesis scheme uses a large table of
stored transitions, dyads, between phonemes.
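The input format described above can be pictured with a small Python
sketch: each phoneme carries its own duration, pitch and amplitude, and
every adjacent pair of phonemes names one entry in the dyad table. The
record layout, field names, units and the dyad_keys helper are hypothetical
and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhonemeSpec:
    symbol: str         # phoneme label (hypothetical notation)
    duration_ms: float  # segment duration, assumed to be in milliseconds
    pitch_hz: float     # target pitch, assumed to be in hertz
    amplitude: float    # relative amplitude, arbitrary units


def dyad_keys(specs: List[PhonemeSpec]) -> List[Tuple[str, str]]:
    """Return the (left, right) phoneme pairs to look up in the dyad table."""
    symbols = [spec.symbol for spec in specs]
    return list(zip(symbols, symbols[1:]))


# Made-up input string of three phonemes.
example_input = [
    PhonemeSpec("h", 60.0, 120.0, 0.5),
    PhonemeSpec("e", 110.0, 125.0, 1.0),
    PhonemeSpec("l", 80.0, 118.0, 0.8),
]
example_keys = dyad_keys(example_input)
```

A string of N phonemes therefore needs N-1 table lookups.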
1980 Olive, J.P., "A scheme for concatenating units for speech
synthesis," Proc. ICASSP-80, Denver CO, April, 568-571, 1980.
SSSHP 91.16 Tape: "MIT - DEMO TAPE 1, 10/90"
(syn, 1:08 min: "This paper describes a small real time
speech synthesizer. The synthesizer requires as its
input, a string of phonemes and the associated duration,
pitch and amplitude parameters. The synthesis scheme
uses a large table of stored transitions, dyads, between
phonemes. These transitions are stored ... LPC derived
area parameters ... LSI-11/23 microcomputer ... digital
signal processor.")
7" reel, 7.5 ips, good quality, copy of undated Klatt
tape
**** When was this tape created? ****
SSSHP 93.9 Tape: "MIT - DEMO TAPE 3, 10/90". Olive, 1985.(?)
(syn, 1:57 min: "This paper describes a small real-time
speech synthesizer. The synthesizer requires as its
inputs a string of phonemes and the associated duration,
pitch, and amplitude parameters. The synthesis scheme
uses a large table of stored transitions, dyads, ...
LSI-11/23 microprocessor ... Our original programs were
written in C, and ran under the UNIX time-sharing system
... LPC parameters.")
7" reel, 7.5 ips, good quality, copy of Klatt tape
**** When was this tape created? ****
SSSHP 32.22 Tape: Demo to accompany "Review of Text-to-speech
conversion for English," D.H. Klatt, JASA 82.3, 9/87.
(syn: "This paper describes a small real-time speech
synthesizer. The synthesizer requires as its input a
string of phonemes, and the associated duration, pitch,
and amplitude parameters. The synthesis scheme uses a
large table of stored transitions, dyads, between
phonemes. These transitions are stored in terms of
LPC-derived area parameters.")
Cassette, Klatt MIT A/D and D/A of SSSHP 91.16 or 93.9
1983 Pols, L.C.W. and J.P. Olive, "Intelligibility of consonants
in CVC utterances produced by dyadic rule synthesis," Speech
Communication 2, 3-13 (1983). CVC nonsense syllables presented
to high school students; Olive (1977) LP segment concatenation
scheme.
(K)
Tape? (CVC test material)
Complete laboratory text-to-speech system using the Olive (1977)
diphone synthesis strategy in combination with a large morpheme
dictionary from C.H. Coker and letter-to-sound rules from K.
Church. A third generation version. Developmental real-time
text-to-speech board with 900 Kbytes storage. (K)
1985 Olive, J.P., and M.Y. Liberman, "Text-to-speech - an
overview," J. Acoust. Soc. Amer. 78 Suppl. 1, S6 (1985).
Coker, C.H., "A dictionary-intensive letter-to-sound
program," J. Acoust. Soc. Amer. 78 Suppl. 1, S7 (1985). A
43,000 morpheme lexicon. (K)
Church, K.W., "Stress assignment in letter-to-sound rules
for speech synthesis," Proc. 23rd Meeting Assoc. Comp.
Ling., 246-253 (1985). (K)
SSSHP 32.34 Tape: Demo to accompany "Review of text-to-speech
conversion for English," D.H. Klatt, JASA 82.3, 9/87.
(syn: "This paper will give a brief overview of recent
text-to-speech work at Bell Laboratories. Starting
about a year ago, we have completed a new set of
computer programs that translate English text into
sound. This system constructs speech sounds by
concatenating elements from an inventory of about 900
units, stored in terms of multipulse LPC coding.")
Cassette, Klatt MIT A/D and D/A
**** need copy of master ****
SSSHP 93.7 Tape: "MIT - DEMO TAPE 3, 10/90". Olive and
Liberman, no date.
(syn, 5 sen: "I am a talking computer at Bell Labs.
Many computers can talk. However, most of them can only
say words stored in their memory. I have an unlimited
vocabulary because I can read text, translate it into
sound rules, and connect these sounds into sentences.
Researchers are improving my speech by correcting
mistakes that I still make and trying to make me sound
better.")
7" reel, 7.5 ips, good quality, copy of Klatt tape
**** When was this tape created? ****
1988 Improvement in pronunciation of foreign names and other
languages (example of Mandarin Chinese). Example of
man/machine interaction using a speech recognition program.
SSSHP 48 Tape: "Text-to-Speech Synthesis Using Dyads", M.Y.
Liberman & J.P. Olive, 12-88
(syn, 2:11 min: "Hello, I am a system for real time
translation of unrestricted text into speech, developed
in the Information Principles Laboratory at AT&T Bell
Labs. My input is ordinary text, and ... can pronounce
proper names of many nationalities correctly, for
instance ... easy for us to produce new voices,
accents, or languages ... a system for Mandarin Chinese
that sounds like this ... ... This is the Bell
Laboratories Flight Information System. May I help
you? ... (dialog with user) ... available.")
Cassette, good quality, copy of BTL tape
**** use for master ****
-------------------------------------------------------------
PROJECT: SYNTHESIS USING DEMISYLLABLE CONCATENATION
Assembly and modification of Linear Predictive Coded segments
from human speech.
1977 Fujimura, O., M.J. Macchi, and J.B. Lovins, "Demisyllables
and affixes for speech synthesis," Proc. of the 9th ICA,
Madrid, Spain, July 4-9, Vol. 1, 513, 1977.
Tape?
1978 Fujimura, O., and J. Lovins, "Syllables as concatenative
phonetic elements," in SYLLABLES AND SEGMENTS, ed. by A.
Bell and J.B. Hooper, North-Holland, New York, xx-xx (1978).
(K)
Tape?
1979 Lovins, J.B., M.J. Macchi, and O. Fujimura, "A demisyllable
inventory for speech synthesis," Speech Communication
Papers, 97th meeting of the Acoust. Soc. Amer., MIT,
Cambridge MA, June 12-16, 519-522, 1979. 834 demisyllables,
5 phonetic affixes, and 100 reduced-vowel units (CV or VC
syllables) in LPC parameters. First attempts to make phrases
and sentences.
Tape? (150 English monosyllable words, "The United States
and other locations are divided into telephone areas,
each of which is identified by a three-digit area code
number")
1980 Browman, C.P., "Rules for demisyllable synthesis using
Lingua, a language interpreter," Proc. ICASSP-80, 561-564
(1980). (K)
SSSHP 93.8 Tape: "MIT - DEMO TAPE 3, 10/90"
(syn, 10 sen, 1:57 min: "Hello, I am a language
interpreter named Lingua. I was created at Bell
Laboratories by Cathy Browman, with a lot of help from a
number of people ... type the sentences you want me to
say into the computer, indicating how you want me to
pronounce the sentences, and which words you want to
emphasize ... linguistic research can be useful for all
of us.")
SSSHP 32.23 Tape: Demo to accompany "Review of text-to-speech
conversion for English," D.H. Klatt, JASA 82.3, 9/87.
(5 sen: "Hello, I am a language ... should be")
Cassette, Klatt MIT A/D and D/A of SSSHP 93.8