NMAH | Smithsonian Speech Synthesis History Project

BELL TELEPHONE LABORATORIES (BTL) - continued


2. TEXT-TO-SPEECH SYSTEMS

CONTENTS:

Articulatory Model 1

Articulatory Model 2

Synthetic Dyad Concatenation

Demisyllable Concatenation


-------------------------------------------------------------
PROJECT: ARTICULATORY SYNTHESIS - MODEL 1


Two-dimensional model in which seven parameters for the lips,
velum, and tongue determine the vocal tract shape. Formant
frequencies are computed from the shape and used to control a
formant synthesizer.
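
Illustration (not taken from the cited papers): the simplest case
of deriving formant frequencies from a tract shape is a uniform
tube, closed at the glottis and open at the lips, whose resonances
fall at odd multiples of c/4L. The sketch below assumes a 17.5 cm
tract and a sound speed of 350 m/s; the function name is
hypothetical. The model catalogued here works from a non-uniform
shape, which this sketch does not attempt.

```python
def uniform_tube_formants(length_m, count=3, sound_speed=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    length_m and sound_speed are illustrative assumptions, not values
    taken from the catalogued papers.
    """
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L)
    return [(2 * n - 1) * sound_speed / (4.0 * length_m)
            for n in range(1, count + 1)]


formants = uniform_tube_formants(0.175)
print([round(f) for f in formants])  # [500, 1500, 2500]
```

Shortening or lengthening the tube shifts every formant together;
changing the shape, as an articulatory model does, moves them
independently.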

Synthesis from specification of sequence of target shapes and
excitation configurations.


1966 Coker, C.H., and O. Fujimura, "Model for specification of
     vocal-tract area function", J. Acoust. Soc. Amer., 40, 1271
     (A).  (I)


1967 Coker, C.H., "Synthesis by rule from articulatory
     parameters", Proc. 1967 Conf. Speech Commun. Process., Paper
     A9, 52-53 (1967)  (B)

     Tape ?


Synthesis from specification of desired phonetic sequence. Library
of about 50 target shapes and excitation configurations.


1968 Coker, C.H., "Speech synthesis with a parametric
     articulatory model", Speech Symposium, Kyoto, Paper A-4
     (1968)  (B,K)

     SSSHP 32.19 Tape: Demo to accompany "Review of text-to-speech
          conversion for English," D.H. Klatt, JASA 82.3, 9/87.
          (3 sen: "This is a computer vocal tract speaking.
          You are ... The number you ... has been changed.")
          Cassette, Klatt MIT A/D and D/A


1970 Flanagan, J.L., C.H. Coker, L.R. Rabiner, R.W. Schafer, and
     N. Umeda, "Synthetic voices for computers", IEEE Spectrum,
     7, 22-45 (1970).  Manual phonetic input.  (B) Plastic
     demonstration diskette included (SSSHP 59).

     SSSHP 99.8 Tape: "Synthetic Voices for Computers, BTL, 1970"
     (syn: "Good morning.  I am a computer.  I can read stories
     and speak them aloud.  I do not understand what the words
     mean when I read them, but I can guess which words are
     important and which words are not, by rules I have been
     given.  Some day I may be able to provide many kinds of
     information by telephone.")
     Cassette copy of disk recording SSSHP 59, stylus noise
              ****  need copy of master  ****


     SSSHP 82.1 Tape: "The Human Voice and the Computer, IEEE
          Soundings, Aug 1, 1971". Same as SSSHP 99.8, but
          without the first two words.
          (syn, 4 sen: "I am a computer. I can read ... by
          telephone.")
          Commercial cassette, fair quality

     SSSHP 91.8 Tape: "MIT - DEMO TAPE 1, 10/90". Coker and Umeda,
          1970.
          (syn, 34 sec, Christmas song: "Christmas, Christmas,
          toys and noise, that's Christmas.  Tiny Tim's and ...
          that's Christmas.")
          7" reel, 7.5 ips, good quality, Klatt 9/1/71 tape copy
                    ****  use for master  ****


1976 Coker, C.H., "A model of articulatory dynamics and control,"
     Proc. IEEE 64, 452-459 (1976).  (K)

     Tape ? ("The grey tie is. The great eye is. We were away in
             September.")


Text-to-speech synthesis by rule based on articulatory
synthesizer.  First Bell Laboratories text-to-speech system.
Attention to detail in the specification of segmental durations
and allophonic variation. (K) Appendix B has a detailed
description of the experimental laboratory equipment.


1970 Flanagan, J.L., C.H. Coker, L.R. Rabiner, R.W. Schafer, and
     N. Umeda, "Synthetic voices for computers", IEEE Spectrum,
     7, 22-45 (1970).  Automatic text-to-speech.  (B) Plastic
     demonstration diskette included (SSSHP 59).

     SSSHP 99 Tape: "SYNTHETIC VOICES FOR COMPUTERS, BTL, 1970"
          (99.9, 6 sen: "The North Wind and the Sun were ... the
          Sun was the stronger of the two.")
          (99.10, 3 sen: "This is the computer talking...
          synthesized speech. Good night, Folks.")
          Cassette copy of disk recording SSSHP 59, stylus noise
                 ****  need copy of master  ****

     SSSHP 82.9 Tape: "The Human Voice and the Computer, IEEE
          Soundings, Aug 1, 1971". Same as SSSHP 99.10, but
          without the last three words.
          (syn, 2 sen: "This is the computer talking. It's been
          a pleasure communicating ... synthesized speech.")
          Commercial cassette, fair quality

1971 SSSHP 91.9 Tape: "MIT - DEMO TAPE 1, 10/90". Coker and Umeda,
          1971.
          (syn, 4 sen: "... am a computer. I can read stories ...
          information by telephone.")
          7" reel, 7.5 ips, good quality

1972 Demonstration at 1972 Int. Conf. of Speech Comm. and Proc. in
     Boston.

     SSSHP 32.25 Tape: Demo to accompany "Review of text-to-speech
          conversion for English," D.H. Klatt, JASA 82.3, 9/87.
          (4 sen: "I can read stories ... by telephone.")
          Cassette, Klatt MIT A/D and D/A of SSSHP 91.9

19?? SSSHP 91.15 Tape: "MIT - DEMO TAPE 1, 10/90"
          (syn, 50 sec:  "Hello, this is a demonstration of speech
          synthesis from an articulatory model.  To generate
          synthetic speech, I start with ordinary spelling of
          English words.  I transform spelling to sound with a
          dictionary, together with rules for compound words and
          English prefixes and suffixes.  I create the sounds of
          speech ...  Thank you.")
          7" reel, 7.5 ips, good quality, undated Klatt tape


1973 Coker, C.H., N. Umeda, and C.P. Browman, "Automatic
     synthesis from ordinary English text", IEEE Trans. Audio and
     Electroacoust. AU-21, 293-297 (1973). (K)  Similar version
     in (B)


1976 Umeda, N., "Linguistic rules for text-to-speech synthesis,"
     Proc. IEEE 64, 443-451 (1976). No tape recording.  (K)


1996 Coker, C.H., M.H. Krane, B.Y. Reis and R.A. Kubli, "Search
     for unexplored effects in speech production," Fourth
     International Conference on Spoken Language Processing
     (ICSLP96), Paper FrP1S1.1, 1996. (Preprint in SSSHP USA BTL
     file.) Sequential effects in excitation, aspiration during
     voicing, and effects of air flow on acoustics of the vocal
     tract.


-------------------------------------------------------------
PROJECT: ARTICULATORY SYNTHESIS - MODEL 2


Simulated articulatory synthesizer.


1971 Mermelstein, P., "Calculation of the vocal-tract transfer
     function for speech synthesis applications", Proc. Seventh
     Intern. Congr. Acoust., Paper 23 C 13, 173-176 (1971).
     Computation of transfer function from shape of vocal tract.
     No tape recording.  (B)


1973 Mermelstein, P., "Articulatory model for the study of speech
     production," J. Acoust. Soc. Amer. 53, 1070-1082 (1973).
     (K)

     Tape ?   ("Why did Ken set the soggy net on top of his
           deck?", 8 /huh'CV/ utterances for identification test)


1974 Mermelstein, P., "Computer simulation of articulatory
     activity in speech production", Proc. Int. Joint Conf. on
     Artificial Intell., Washington, D.C., Gordon & Breach, New
     York 1974.  (I)

     Tape ?


-------------------------------------------------------------
PROJECT: TEXT TO SPEECH USING SYNTHETIC DYADS


Transitions from human speech stored as linear-prediction area
parameters of a synthetic dyad unit. "Steady state" portions are
obtained by connecting with straight lines the end points of
adjacent transitions. In some cases, only the transition end
points are stored with the actual transition being calculated by
interpolation.  Duration, pitch, and amplitude modified during
synthesis.  Large morpheme dictionary, letter-to-sound rules.
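
Illustration (not taken from the cited papers): the straight-line
"steady state" described above can be sketched as linear
interpolation between the last frame of one stored transition and
the first frame of the next. The function name, the frame layout
(a list of area parameters per frame) and the sample values are
assumptions made for this sketch only.

```python
def steady_state(end_of_prev, start_of_next, n_frames):
    """Fill the gap between two transitions with a straight line.

    end_of_prev   -- last parameter frame of the preceding transition
    start_of_next -- first parameter frame of the following transition
    n_frames      -- number of interior frames (controls duration)
    """
    frames = []
    for i in range(1, n_frames + 1):
        t = i / (n_frames + 1)  # fraction of the way across the gap
        frames.append([a + t * (b - a)
                       for a, b in zip(end_of_prev, start_of_next)])
    return frames


# Hypothetical two-parameter frames, for illustration only.
filler = steady_state([1.0, 4.0], [2.0, 2.0], n_frames=3)
print(filler)  # [[1.25, 3.5], [1.5, 3.0], [1.75, 2.5]]
```

Changing n_frames stretches or shortens the steady state without
touching the stored transitions, which is one way duration could
be modified during synthesis.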


1974 Olive, J.P., and L.H. Nakatani, "Rule synthesis of speech by
     word concatenation: a first step", J. Acoust. Soc. Amer. 55,
     660-666 (1974).  (K)

     Tape ?


1976 Olive, J.P., and N. Spickenagel, "Speech resynthesis from
     phoneme-related parameters," J. Acoust. Soc. Amer. 59,
     993-996 (1976).  (K)  Does this belong here?


1977 Olive, J.P., "Rule synthesis of speech from dyadic units",
     Proc. ICASSP-77, 568-570 (1977). Demonstrated at Epcot
     Center of Walt Disney World.  (K)

     Tape ?


1979 Olive, J.P., and M.Y. Liberman, "A set of concatenative
     units for speech synthesis," in SPEECH COMMUNICATION PAPERS
     PRESENTED AT THE 97TH MEETING OF THE ACOUSTICAL SOCIETY OF
     AMERICA, J.J. Wolf and D.H. Klatt Eds., Amer. Inst. of
     Physics, New York, 515-518 (1979). Uses consonant clusters
     as units when necessary.  (K)

     Tape?


Synthesis from phonetic input.  The synthesizer requires as its
input a string of phonemes and the associated duration, pitch and
amplitude parameters.  The synthesis scheme uses a large table of
stored transitions, dyads, between phonemes.
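
Illustration (not taken from the cited papers): a table of
transitions between phonemes implies that an input string is
looked up as overlapping pairs of adjacent phonemes. The sketch
below shows that pairing step only. The "#" silence marker, the
phoneme symbols and the function names are assumptions, and the
table contents are placeholders rather than real parameter data.

```python
def dyad_sequence(phonemes, boundary="#"):
    """Return the adjacent phoneme pairs (dyads) an utterance needs.

    The boundary symbol stands for silence before and after the
    utterance; it is an assumption of this sketch.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return list(zip(padded, padded[1:]))


def lookup(phonemes, table):
    """Fetch the stored entry for each dyad, failing on any gap."""
    pairs = dyad_sequence(phonemes)
    missing = [p for p in pairs if p not in table]
    if missing:
        raise KeyError(f"no stored dyad for {missing}")
    return [table[p] for p in pairs]


pairs = dyad_sequence(["k", "ae", "t"])
print(pairs)  # [('#', 'k'), ('k', 'ae'), ('ae', 't'), ('t', '#')]
```

An utterance of n phonemes needs n + 1 table entries under this
padding convention.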


1980 Olive, J.P., "A scheme for concatenating units for speech
     synthesis," Proc. ICASSP-80, Denver CO, April, 568-571, 1980.

     SSSHP 91.16 Tape: "MIT - DEMO TAPE 1, 10/90"
          (syn, 1:08 min: "This paper describes a small real time
          speech synthesizer.  The synthesizer requires as its
          input, a string of phonemes and the associated duration,
          pitch and amplitude parameters.  The synthesis scheme
          uses a large table of stored transitions, dyads, between
          phonemes.  These transitions are stored ...  LPC derived
          area parameters ...  LSI 1123 microcomputer ...  digital
          signal processor.")
          7" reel, 7.5 ips, good quality, copy of undated Klatt
          tape
              ****  When was this tape created?  ****

     SSSHP 93.9 Tape: "MIT - DEMO TAPE 3, 10/90". Olive, 1985 (?).
          (syn, 1:57 min: "This paper describes a small real-time
          speech synthesizer.  The synthesizer requires as its
          inputs a string of phonemes and the associated duration,
          pitch, and amplitude parameters.  The synthesis scheme
          uses a large table of stored transitions, dyads, ...
          LSI 1123 microprocessor ...  Our original programs were
          written in C, and ran under the UNIX time-sharing system
          ...  LPC parameters.")
          7" reel, 7.5 ips, good quality, copy of Klatt tape
              ****  When was this tape created?  ****

     SSSHP 32.22 Tape: Demo to accompany "Review of text-to-speech
          conversion for English," D.H. Klatt, JASA 82.3, 9/87.
          (syn: "This paper describes a small real-time speech
          synthesizer.  The synthesizer requires as its input a
          string of phonemes, and the associated duration, pitch,
          and amplitude parameters.  The synthesis scheme uses a
          large table of stored transitions, dyads, between
          phonemes.  These transitions are stored in terms of
          LPC-derived area parameters.")
          Cassette, Klatt MIT A/D and D/A of SSSHP 91.16 or 93.9


1983 Pols, L.C.W. and J.P. Olive, "Intelligibility of consonants
     in CVC utterances produced by dyadic rule synthesis," Speech
     Communication 2, 3-13 (1983). CVC nonsense syllables
     presented to high school students; synthesis by the Olive
     1977 LP segment concatenation scheme.
     (K)

     Tape?  (CVC test material)


Complete laboratory text-to-speech system using the Olive (1977)
diphone synthesis strategy in combination with a large morpheme
dictionary from C.H. Coker and letter-to-sound rules from K.
Church.  A third generation version.  Developmental real-time
text-to-speech board with 900 Kbytes storage.  (K)


1985 Olive, J.P., and M.Y. Liberman, "Text-to-speech - an
     overview," J. Acoust. Soc. Amer. 78 Suppl. 1, S6 (1985).


     Coker, C.H., "A dictionary-intensive letter-to-sound
     program," J. Acoust. Soc. Amer. 78 Suppl. 1, S7 (1985). A
     43,000 morpheme lexicon. (K)


     Church, K.W., "Stress assignment in letter-to-sound rules
     for speech synthesis," Proc. 23rd Meeting Assoc. Comp.
     Ling., 246-253 (1985).  (K)


     SSSHP 32.34 Tape: Demo to accompany "Review of text-to-speech
          conversion for English," D.H. Klatt, JASA 82.3, 9/87.
          (syn: "This paper will give a brief overview of recent
          text-to-speech work at Bell Laboratories.  Starting
          about a year ago, we have completed a new set of
          computer programs that translate English text into
          sound.  This system constructs speech sounds by
          concatenating elements from an inventory of about 900
          units, stored in terms of multipulse LPC coding.")
          Cassette, Klatt MIT A/D and D/A
                  ****  need copy of master  ****

     SSSHP 93.7 Tape: "MIT - DEMO TAPE 3, 10/90". Olive and
          Liberman, no date.
          (syn, 5 sen:  "I am a talking computer at Bell Labs.
          Many computers can talk.  However, most of them can only
          say words stored in their memory.  I have an unlimited
          vocabulary because I can read text, translate it into
          sound rules, and connect these sounds into sentences.
          Researchers are improving my speech by correcting
          mistakes that I still make and trying to make me sound
          better.")
          7" reel, 7.5 ips, good quality, copy of Klatt tape
                 ****  when was this tape created?  ****


1988 Improvement in pronunciation of foreign names and other
     languages (example of Mandarin Chinese).  Example of
     man/machine interaction using a speech recognition program.

     SSSHP 48 Tape: "Text-to-Speech Synthesis Using Dyads", M.Y.
          Liberman & J.P. Olive, 12-88
          (syn, 2:11 min: "Hello, I am a system for real time
          translation of unrestricted text into speech, developed
          in the Information Principles Laboratory at AT&T Bell
          Labs.  My input is ordinary text, and ... can pronounce
          proper names of many nationalities correctly, for
          instance ...  easy for us to produce new voices,
          accents, or languages ...  a system for Mandarin Chinese
          that sounds like this ...  ...  This is the Bell
          Laboratories Flight Information System.  May I help
          you? ... (dialog with user) ... available.")
          Cassette, good quality, copy of BTL tape
                 ****  use for master  ****


-------------------------------------------------------------
PROJECT: SYNTHESIS USING DEMISYLLABLE CONCATENATION


Assembly and modification of Linear Predictive Coded segments 
from human speech.
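
Illustration (not taken from the cited papers): in the usual
description of demisyllables, a syllable is divided within its
vowel into an initial half (onset plus vowel) and a final half
(vowel plus coda), and speech is assembled by joining the halves
again. The sketch below shows only that split. The toy vowel set,
the phoneme symbols and the function name are assumptions, and
phonetic affixes are not modelled.

```python
VOWELS = {"a", "e", "i", "o", "u"}  # toy vowel set, an assumption


def split_demisyllables(syllable):
    """Split a syllable (list of phoneme symbols) at its first vowel.

    Returns (initial, final): the onset plus the vowel, and the
    vowel plus the coda. The vowel appears in both halves.
    """
    for i, phoneme in enumerate(syllable):
        if phoneme in VOWELS:
            return syllable[: i + 1], syllable[i:]
    raise ValueError("syllable has no vowel")


initial, final = split_demisyllables(["s", "t", "a", "n", "d"])
print(initial, final)  # ['s', 't', 'a'] ['a', 'n', 'd']
```

Because onsets and codas are stored separately, the inventory
grows with the number of distinct halves rather than with the
number of whole syllables.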


1977 Fujimura, O., M.J. Macchi, and J.B. Lovins, "Demisyllables
     and affixes for speech synthesis," Proc. of the 9th ICA,
     Madrid, Spain, July 4-9, Vol. 1, 513, 1977.

     Tape?


1978 Fujimura, O., and J. Lovins, "Syllables as concatenative
     phonetic elements," in SYLLABLES AND SEGMENTS, ed. by A.
     Bell and J.B. Hooper, North-Holland, New York, xx-xx (1978).
     (K)

     Tape?


1979 Lovins, J.B., M.J. Macchi, and O. Fujimura, "A demisyllable
     inventory for speech synthesis," Speech Communication
     Papers, 97th meeting of the Acoust. Soc. Amer., MIT,
     Cambridge MA, June 12-16, 519-522, 1979. 834 demisyllables,
     5 phonetic affixes, and 100 reduced-vowel units (CV or VC
     syllables) in LPC parameters. First attempts to make phrases
     and sentences.

     Tape?  (150 English monosyllable words, "The United States
         and other locations are divided into telephone areas,
         each of which is identified by a three-digit area code
         number")


1980 Browman, C.P., "Rules for demisyllable synthesis using
     Lingua, a language interpreter," Proc. ICASSP-80, 561-564
     (1980).  (K)

     SSSHP 93.8 Tape: "MIT - DEMO TAPE 3, 10/90"
          (syn, 10 sen, 1:57 min: "Hello, I am a language
          interpreter named Lingua.  I was created at Bell
          Laboratories by Cathy Browman, with a lot of help from a
          number of people ... type the sentences you want me to
          say into the computer, indicating how you want me to
          pronounce the sentences, and which words you want to
          emphasize ... linguistic research can be useful for all
          of us.")

     SSSHP 32.23 Tape: Demo to accompany "Review of text-to-speech
          conversion for English," D.H. Klatt, JASA 82.3, 9/87.
          (5 sen: "Hello, I am a language ... should be")
          Cassette, Klatt MIT A/D and D/A of SSSHP 93.8

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution