NMAH | Smithsonian Speech Synthesis History Project

ELECTROTECHNICAL LABORATORY (ETL)


Ministry of International Trade and Industry
1-1-4 Umezono, Sakura-mura, Niihari-gun
Ibaraki 305, Japan


CONTENTS:

HISTORY

STAGE 1: STATIC VOWELS AND NASAL MURMURS (1963-1966)

STAGE 2: VOCAL TRACT SIMULATOR MODEL 1 (1965-1967)

STAGE 3: TEXT-TO-SPEECH IN ENGLISH, MODEL 2 (1967-1968)

STAGE 4: COMPUTER SIMULATION OF VOCAL ORGAN SYSTEM (1968-1969)

BIOGRAPHIES


-------------------------------------------------------------
HISTORY OF THE ORGANIZATION

ETL was founded by the Japanese government more than a hundred
years ago for fundamental research in electricity and electronics.
Its original charter directed it to provide leadership in the
research field, with researchers having great freedom to choose
their own objectives. The speech synthesis project was undertaken
because it posed a research challenge and because of its potential
usefulness for man-machine communication technology.

The speech synthesis project was carried out by the Speech
Synthesis Group in the Acoustics Section headed by Eiichi Matsui,
who provided the group's research direction.  The supervisor of
the Speech Synthesis Group was Ryunen Teranishi.

The Speech Synthesis Group had two objectives:

   - Synthesis of continuous speech using a vocal tract
     analog simulator controlled by digital computer

   - Completely automatic English speech synthesis from
     English orthography

The project can be described in four stages. The speech
synthesizer used in each stage was:

Stage 1: An acrylic acoustic model of the vocal tract (the Plastic
         Vocal Tract Model), used to obtain basic data for
         designing the later electronic vocal tract simulator

Stage 2: A composite (hybrid) speech synthesis system consisting of
         an analog-computer vocal tract simulator controlled by a
         digital computer

Stage 3: An improved version of the Stage 2 system

Stage 4: Software simulation of the vocal tract


-------------------------------------------------------------
STAGE 1: SYNTHESIS OF STATIC VOWELS AND NASAL MURMURS (1963-1966)


In this period, Teranishi and Umeda experimented with a plastic
model of the human vocal tract and nasal tract to obtain design
data for the later electronic simulation.

     Artifact: Plastic Vocal Tract Model (see ETL News photos),
	       in possession of R. Teranishi in 1989


1964 Teranishi, R., "Speech synthesis with an acoustic vocal
     tract model", ETL News, No. 171, April 1964. In Japanese.
     Experiments with the mechanical model and plans for the
     analog computer simulation. Photograph of plastic model.
     (SSSHP 25 reports)


1965 Teranishi, R., "Normalizing the voice production factors
     about the VTM (Vocal Tract Model)", ETL News, No. 181, Feb.
     1965. In Japanese. Photograph of plastic model. Model can
     produce male, female, or child's voice. (SSSHP 25 reports)


1966 Umeda, N., and R. Teranishi, "Phonemic feature and vocal
     feature - synthesis of speech sounds, using an acoustic
     model of vocal tract", J. Acoust. Soc. Japan, 22, 4, 195-203,
     1966.

     SSSHP 26.1 Diskette: "Speech Synthesis by Rule", Electro-
	  technical Laboratory, 1969.
	  (Japanese vowels "a, i, u, e, o", various voices)
	  Plastic diskette, 33 1/3 rpm

     SSSHP 39.1 Tape: (R. Teranishi's copy of SSSHP 26 Diskette)
	  Cassette, good quality

     SSSHP 69 Tape: "Tape 3, Speech sounds of the ETL acoustic
	  model in 1965", copied from master by Hiroshi Omura,
	  ETL, June 27, 1989.
	  (Japanese vowels "a, i, u, e, o", male, female, child;
	  nasalized vowels)
	  Cassette, good quality, use for master



-------------------------------------------------------------
STAGE 2: SYNTHESIS OF CONTINUOUS JAPANESE SPEECH BY VOCAL TRACT
         SIMULATOR MODEL 1 (1965 - 1967)


The first synthesizer was an analog computer simulation of a
17-section vocal tract and a fixed nasal tract.  It was realized
as a ladder network of inverse-L-type LC sections.  Its analog
multipliers were updated every five milliseconds from digital
control codes on a tape prepared separately on an IBM 7090
computer.  The Hitachi-built synthesizer used 71 operational
amplifiers and 22 multipliers.  The multipliers were
fixed-resistor networks switched by photoconductor/neon-lamp
pairs, controlled from the digital tape. (For Hitachi speech
synthesis, see SSSHP JAPAN Hitachi file.)
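To illustrate how a chain of LC sections can stand in for the vocal tract, here is a minimal Python sketch; it is not a reconstruction of the ETL/Hitachi circuit. It assumes the usual acoustic-tube analogy (pressure as voltage, volume velocity as current), takes each of the 17 sections to be a series inertance L followed by a shunt compliance C (the exact ordering inside an "inverse-L" unit is an assumption), ignores losses, the nasal branch and lip radiation (the lips are treated as a short circuit), and uses hypothetical constants, area values and function names (`section_matrix`, `chain_matrix`, `volume_velocity_gain`, `resonances`).

```python
import numpy as np

# Assumed physical constants (CGS units); not taken from the ETL design.
RHO = 1.14e-3      # density of air, g/cm^3
C_SOUND = 3.5e4    # speed of sound, cm/s


def section_matrix(omega, area, length):
    """Chain (ABCD) matrix of one assumed inverse-L section: series L, then shunt C."""
    inertance = RHO * length / area                    # acoustic "inductance"
    compliance = area * length / (RHO * C_SOUND ** 2)  # acoustic "capacitance"
    series = np.array([[1, 1j * omega * inertance], [0, 1]], dtype=complex)
    shunt = np.array([[1, 0], [1j * omega * compliance, 1]], dtype=complex)
    return series @ shunt


def chain_matrix(freq_hz, areas, tract_length=17.5):
    """Cascade sections from glottis to lips; maps (p, U) at the lips to the glottis."""
    omega = 2 * np.pi * freq_hz
    dx = tract_length / len(areas)
    m = np.eye(2, dtype=complex)
    for area in areas:
        m = m @ section_matrix(omega, area, dx)
    return m


def volume_velocity_gain(freq_hz, areas):
    """U_lips / U_glottis with the lips short-circuited (lip pressure = 0)."""
    return 1 / chain_matrix(freq_hz, areas)[1, 1]


def resonances(areas, fmax=4000.0, step=1.0):
    """Pole frequencies of the lossless ladder: sign changes of the (real) D entry."""
    freqs = np.arange(step, fmax, step)
    d = np.array([chain_matrix(f, areas)[1, 1].real for f in freqs])
    idx = np.nonzero(np.sign(d[:-1]) != np.sign(d[1:]))[0]
    return freqs[idx]


# Hypothetical uniform tract: 17 sections of 5 cm^2 each.
uniform = [5.0] * 17
print(np.round(resonances(uniform)))
```

For a uniform 17.5 cm tract the resonances land close to the quarter-wave values of a tube closed at the glottis and open at the lips (roughly 500, 1500, 2500 and 3500 Hz); the lumped ladder places them slightly off those ideal values. Changing the entries of `areas` frame by frame is the software analogue of what the digital tape did to the analog multipliers.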

With computer control, the synthesizer could produce arbitrary
sound sequences, in contrast to the plastic vocal tract model,
which could produce only sustained sounds. The synthesizer was
designed by Matsui and Teranishi, and the FORTRAN control program
was designed by Matsui, Suzuki, Umeda, and Omura.  The linguistic,
phonetic, and articulatory specifications were prepared by Matsui,
Suzuki, Umeda, and Omura.


1965 Teranishi, R., "Dynamic analog speech synthesizer", ETL
     News, No. 189, Oct. 1965. In Japanese.  Produces speech
     sequences and sounds more natural than the plastic model.
     Photograph of synthesizer, vowel waveforms.  (SSSHP 25
     reports)


1966 Matsui, E., "Speech synthesizer controlled by computer",
     ETL News, No. 197, June 1966. In Japanese. First results.
     Photograph of synthesizer, synthesizer circuit diagram, list
     of control data, list of sample words, problems to be
     solved. (SSSHP 25 reports)

     SSSHP 67 Tape: "Tape 1, Synthesized speech reported in the
	  1966 ETL News #197", copied from master by Hiroshi
	  Ohmura, ETL, June 27, 1989.
	  (Japanese vowels, 15 CV's, 4 words, 9 phrases)
	  Cassette, good quality, samples from paper


1967 "Robot Raconteur", Electronics, Feb. 6, 1967. Photograph.

     SSSHP 26.2-3 Diskette: "Speech Synthesis by Rule", Electro-
	  technical Laboratory, 1969.
	  (Japanese story, The Peach Boy (Momotaro no ohanashi),
	  several sentences, Japanese "tongue twisters")
	  Plastic diskette, 33 1/3 rpm

     SSSHP 39 Tape: (R. Teranishi's copy of SSSHP 26 Diskette)
	  Cassette, good quality, use instead of diskette

     SSSHP 71 Tape: "Tape 5, Japanese 'tongue twisters' copied
	  from the original tape", copied from master by Hiroshi
	  Ohmura, ETL, June 27, 1989.
	  (four phrases, three tempos each: "tonari no kyaku ...
	  tokkyo kyokakyoku")
	  Cassette, good quality, *** use for master ***


-------------------------------------------------------------
STAGE 3: TEXT-TO-SPEECH IN ENGLISH USING VOCAL TRACT SIMULATOR
         MODEL 2 (1967-1968)


This was the first demonstrated text-to-speech system for
English.  Some of its rules were later used in the Bell
Laboratories text-to-speech system (K) (see SSSHP USA Bell
Telephone Laboratories file).

This second model was an improved synthesizer, also built by
Hitachi, that used high-speed analog multipliers to eliminate the
moisture sensitivity of Model 1's photoconductors and to reduce
noise.  An extended high-frequency range allowed simulation of
female and child voices.  A pi-network, rather than the earlier
inverse-L network, was used, allowing exact simulation of
mid-vocal-tract closures.  Control data, changed every 5
milliseconds, fed D/A converters that supplied the control
voltages for the analog multipliers.  A pronouncing dictionary
with a 1500-word vocabulary was sufficient for children's fairy
tales.  About 30 computer runs were made in preparing the
following demonstration tape.
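The 5-millisecond control update described above behaves like a zero-order hold: each frame of control values stays constant until the next frame arrives, which is what a D/A converter driving an analog multiplier does. The Python sketch below is illustrative only; the frame layout (one row per 5-ms frame, one column per control channel), the 10 kHz viewing rate and the name `hold_controls` are hypothetical.

```python
import numpy as np

FRAME_PERIOD_S = 0.005  # control data changed every 5 ms (from the text)


def hold_controls(frames, sample_rate_hz):
    """Zero-order hold: repeat each 5-ms control frame for every sample it spans.

    frames: array of shape (n_frames, n_channels), e.g. hypothetical
    section areas or multiplier gains, one row per frame.
    """
    frames = np.asarray(frames, dtype=float)
    samples_per_frame = int(round(FRAME_PERIOD_S * sample_rate_hz))
    return np.repeat(frames, samples_per_frame, axis=0)


# Hypothetical example: 3 frames x 2 channels viewed at 10 kHz (50 samples/frame).
frames = [[1.0, 0.5],
          [2.0, 0.5],
          [3.0, 0.0]]
held = hold_controls(frames, 10_000)
print(held.shape)  # (150, 2)
```

The stepwise output is deliberate: a real D/A stage holds its last value, and any smoothing would have come from the analog circuitry that followed it.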

The synthesizer was designed by Matsui, Suzuki, and Omura, and
the control program by Matsui, Teranishi, Suzuki, and Omura.  The
linguistic, phonetic, and articulatory specifications were prepared
by Teranishi and Umeda, who also developed the dictionary and
parser.


1968 Teranishi, R., "Read aloud English story", ETL News, No.
     222, July 1968. In Japanese. Reports that the speech
     synthesis project using the hardware vocal tract model is
     finished.  Photograph of Model II, discussion of the
     pronouncing dictionary and the syntactic analysis, prospects
     for speech research, listing of sample input sentences and
     the intermediate string of phonetic symbols (in alphanumeric
     computer characters).  Sonagram of "Once upon a time".
     (SSSHP 25 reports)

     Program Listing:  PL/I source code for converting input
     sentences into control signal sequences for ETL's analog
     vocal tract simulator Model 2.  (SSSHP 36)

     Program Listing:  Sample input and output for the PL/I
     program.  (SSSHP 37)


1968 Teranishi, R., and N. Umeda, "Use of pronouncing dictionary in
     speech synthesis experiments", Proc. Sixth Intern. Congr.
     Acoust., Tokyo, Japan, Aug. 1968, Paper B-5-2, B155-B158.
     (B,K)


1968 Matsui, E., T. Suzuki, N. Umeda, and H. Omura, "Synthesis of
     fairy tales using an analog vocal tract", Proc. Sixth
     Intern. Congr. Acoust., Tokyo, Japan, Aug. 1968, Paper
     B-5-3, B159-B162. (K)


     Reprint:  "Grimm's Fairy Tales to Read Aloud", Wonder Books,
     Inc., 1963.  Pages 122 and 123 synthesized.  (SSSHP 27)


1968 Umeda, N., and R. Teranishi, "The parsing program for
     automatic text-to-speech synthesis developed at the
     Electrotechnical Laboratory in 1968", IEEE Trans. ASSP-23,
     183-188 (1975). (K)

     SSSHP 70 Tape: "Tape 4, 'Sleeping Beauty' copied from the
	  original tape", copied from master by Hiroshi Ohmura,
	  ETL, June 27, 1989.
	  (English, male voice, 12 sen: Sleeping Beauty, "Once
	  upon a time ...  King and Queen ...  and fall down
	  dead."; child voice, 4 sen: "Once upon ... a daughter")
	  Cassette, good quality, *** use for master ***

     SSSHP 26.4 Diskette: "Speech Synthesis by Rule", Electro-
	  technical Laboratory, 1969.
	  (English, male voice, 4 sen: Sleeping Beauty, "Once
	  upon a time...King and Queen...have a child - a
	  daughter")
	  Plastic diskette, 33 1/3 rpm

     SSSHP 39 Tape: (R. Teranishi's tape of SSSHP 26 Diskette)
	  Cassette, better quality than diskette

     SSSHP 38 Tape: "Tape 4, 'Sleeping Beauty, synthesized by
	  rule, Electrotechnical Laboratory, Tokyo, August,
	  1968", copy of copy, R. Teranishi 3/2/89.
	  (English, male voice, 12 sen: Sleeping Beauty, "Once
	  upon a time ...  King and Queen ...  and fall down
	  dead."; child voice, 4 sen: "Once upon ... a daughter")
	  3" reel, good quality

     SSSHP 32.24 Tape: "Text-to-Speech History, D. Klatt, ASA
	  Demo, Copy 2, 2/87". Demo to accompany "Review of
	  Text-to-speech conversion for English," D.H. Klatt,
	  JASA 82.3, Sept. 1987.
	  (English, male voice, 3 sen: Sleeping Beauty, "Once
	  upon a time ... King and Queen ... your wish shall be
	  fulfilled")
	  Cassette, good quality, Klatt MIT A/D and D/A

     SSSHP 137 Tape: "Sleeping Beauty", E. Matsui et al., ETL,
          Aug 1968. Another copy.



-------------------------------------------------------------
STAGE 4: COMPUTER SIMULATION OF VOCAL ORGAN SYSTEM (1968-1969)


A software-only simulation by Matsui that assembled "sound
elements" synchronized to pitch periods.  Each sound element was
the impulse-response waveform of the simulated vocal tract,
computed using the Fourier transform.  The program was written in
PL/I on an IBM System/360 Model 75 computer, and synthesis took
about 20 times real time.
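The pitch-synchronous assembly described above can be sketched roughly as follows. This is not Matsui's PL/I program: here the vocal-tract spectrum is a hypothetical all-pole (formant) response rather than one derived from simulated vocal organs, and the sampling rate, FFT size, formant and bandwidth values, fixed pitch and the names `sound_element` and `assemble` are all assumptions. The frequency response is sampled on an FFT grid and turned into an impulse response (the "sound element") with an inverse FFT; copies are then overlap-added at the start of each pitch period.

```python
import numpy as np

FS = 10_000  # output sampling rate in Hz (assumed, not from the source)


def sound_element(formants_hz, bandwidths_hz, n_fft=1024):
    """Impulse response of a hypothetical all-pole vocal-tract spectrum.

    The frequency response is evaluated on the FFT grid and converted to a
    time waveform with the inverse real FFT.
    """
    f = np.fft.rfftfreq(n_fft, d=1 / FS)
    z_inv = np.exp(-2j * np.pi * f / FS)          # z^-1 on the unit circle
    h = np.ones_like(z_inv)
    for fc, bw in zip(formants_hz, bandwidths_hz):
        r = np.exp(-np.pi * bw / FS)              # pole radius from bandwidth
        theta = 2 * np.pi * fc / FS               # pole angle from frequency
        h /= (1 - r * np.exp(1j * theta) * z_inv) * (1 - r * np.exp(-1j * theta) * z_inv)
    return np.fft.irfft(h, n=n_fft)


def assemble(element, pitch_hz, duration_s):
    """Overlap-add one copy of the element at the start of every pitch period."""
    n_out = int(duration_s * FS)
    out = np.zeros(n_out + len(element))
    for start in np.arange(0, n_out, FS / pitch_hz).astype(int):
        out[start:start + len(element)] += element
    return out[:n_out]


# Hypothetical /a/-like formants and bandwidths, 100 Hz pitch, 0.5 s of sound.
element = sound_element([700, 1200, 2600], [60, 90, 120])
wave = assemble(element, pitch_hz=100, duration_s=0.5)
```

Repeating a single element at a fixed period gives a steady vowel; a time-varying version would compute a new element for each pitch period from the current vocal-tract configuration.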

1968 Matsui, E., "Computer-simulated vocal organs", Proc. 6th
     Int. Congr. Acoust., Tokyo, Japan, August 1968, Paper B-5-1,
     B151-B154. (I)

1968 "Speech Researches in the Electrotechnical Laboratory",
     handout at 6th ICA, Tokyo, Japan, 1968. (SSSHP 14)

     SSSHP 68 Tape: "Tape 2, Dr. Matsui's demo in 1968", copied
	  from master by Hiroshi Ohmura, ETL, June 27, 1989.
	  (syn, English:"Computer simulated vocal organs.")
	  Cassette, good quality, tape was played at ICA


-------------------------------------------------------------
BIOGRAPHIES


EIICHI MATSUI

1944 Graduated, Tohoku Univ., Sendai
1951 Research assistant, Tohoku Univ., Sendai
1953 ETL Japan (except 1962-63 NBS Washington, D.C., USA)
1960 Ph.D. in Electroacoustics, Tohoku Univ., Sendai
1962-63 National Bureau of Standards, Washington, D.C., USA
1972 Professor, Shizuoka Univ.
1984 Professor, Fukui Institute of Technology


HIROSHI OMURA

1963 B.A. in Elect. Engineering, Nippon University
1965 M.A. in Elect. Engineering, Nippon University
1965 ETL, Japan


TORAZO SUZUKI

1949 B.A. in Electrical Engineering, Musashi Inst. of Technology
1949 ETL, Japan
1988 retired from ETL


RYUNEN TERANISHI

1950 B.A. and M.A. in Psychology, The Univ. of Tokyo
     Research assistant, Hokkaido Univ., Sapporo
1958 ETL Japan (except 1965-66 P.D. Fellow, NRC Ottawa)
1962 Ph.D. in Psychology, The Univ. of Tokyo
1965-66 P.D. Fellow, National Research Council of Canada, Ottawa
1973 Professor, Kyushu Inst. of Design, Fukuoka
1975 Biography, photo in IEEE Trans. ASSP, April 1975
1988 Fellow, Acoust. Soc. of America


NORIKO UMEDA

1957 B.A. in Linguistics, Univ. of Tokyo
1959 M.A. in Linguistics, Univ. of Tokyo
1962 Ph.D. in Linguistics, Univ. of Tokyo
     ETL Japan
1969 Bell Labs, Murray Hill, NJ
1975 Biography, photo in IEEE Trans. ASSP, April 1975
1983 Prof. and Chairman, Dept. of Linguistics, 719 Broadway,
     New York University, New York, NY 10003
1987 Director, Institute for Speech and Language Sciences,
     New York University.


-------------------------------------------------------------
CONTRIBUTIONS AND REVIEW BY:

Dr. Takayuki Nakajima, Director
Machine Understanding Division
Electrotechnical Laboratory
1-1-4 Umezono
Tsukuba Science City 305, Japan

Prof. Ryunen Teranishi
Kyushu Institute of Design
4-9-1 Shiobaru, Minami-ku
Fukuoka-shi, 815 Japan

(History details were obtained from T. Nakajima and R. Teranishi,
in personal communications to H.D. Maxey in 1988/89. See SSSHP 
JAPAN Electrotechnical Lab. file.)

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution