NMAH | Smithsonian Speech Synthesis History Project (ss

NIPPON TELEGRAPH AND TELEPHONE CORPORATION (NTT)


Electrical Communications Laboratories
Musashino-shi, Tokyo
 
 
CONTENTS:

HISTORY

SPEECH ANALYSIS/SYNTHESIS:

   MAXIMUM LIKELIHOOD (1966-1971)

   PARTIAL AUTOCORRELATION (PARCOR) (1968-1979)

   LINE SPECTRUM PAIR (LSP) (1976-1981)

SPEECH SYNTHESIS BY RULE:

   TERMINAL ANALOG - VCV SYNTHESIS (1966-1969)

   PARCOR - VCV SYNTHESIS (1974-1977)

   TEXT-TO-SPEECH, LSP/CV SYNTHESIS (1981-1984)

   TEXT-TO-SPEECH, LSP/CVC SYNTHESIS (1981-1984)

BIOGRAPHIES


-------------------------------------------------------------
HISTORY:  DIGITAL SPEECH PROCESSING, SYNTHESIS, AND RECOGNITION,
          Sadaoki Furui, Marcel Dekker, Inc., New York, p. 408,
          1989.

"Speech synthesis research and development at NTT Laboratories
has a history of more than 20 years. The technological details of
this research are explained in (the above book) . . . The speech
synthesis activities performed at the NTT laboratories, as
described in the book, can be divided into two fields, speech
analysis/synthesis and speech synthesis by rule.


-------------------------------------------------------------
SPEECH ANALYSIS/SYNTHESIS


"Speech analysis/synthesis is the technology of efficiently
extracting important features from speech waves and precisely
reproducing original speech sounds using these features. The
maximum likelihood method, the PARCOR method, and the LSP method,
all of which are known generally as linear predictive coding
(LPC) methods, have achieved speech transmission and storage
using small amounts of information, such as 1/100 to 1/300 of
that necessary for simply representing speech waves. The maximum
likelihood method broke new ground in the area of LPC analysis/
synthesis, and was developed into the PARCOR method.  The LSP
method represents speech waves most efficiently.  The PARCOR and
LSP methods are widely used in equipment for transmitting speech
waves at very low bit rates such as 2.4 - 4.8 kbit/s.  Speech
synthesis devices based on the principles of these methods have
been created using Large Scale Integration, and are widely used
in voice guidance services, voice response equipment, toys, and
so on.  Speech analysis devices also based on these principles
are widely used for feature extraction in speech recognition
systems.  The LPC methods have had a very large impact on every
aspect of speech research throughout the world."  (S. Furui,
personal communication to H.D. Maxey, 1989, in SSSHP Japan NTT
file.)

The following three projects investigated techniques for
analyzing and reconstructing human speech using variations of
linear predictive coding (LPC).  If the intermediate parameters
are not modified, these systems are simply a class of vocoders
for reduced-data transmission of human speech.  They are included
in this history because of their later use in speech synthesis by
rule.
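
As a rough illustration of the analysis step these LPC variants
share, the sketch below estimates predictor coefficients for one
frame with the textbook Levinson-Durbin recursion.  It is not taken
from the NTT papers listed here; the function names and the
autocorrelation formulation are assumptions made for illustration.
The reflection coefficients it returns correspond to PARCOR
coefficients up to a sign convention.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] for one analysis frame (biased estimate)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, k, err): a[j-1] is the predictor coefficient for lag j
    in x[n] ~ sum_j a_j * x[n-j], k holds the reflection coefficients,
    and err is the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)   # a[0] is unused padding
    k = [0.0] * (order + 1)   # k[0] is unused padding
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:        # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        err *= 1.0 - k[i] ** 2
    return a[1:], k[1:], err


# Hypothetical autocorrelation values that decay by one half per lag.
r = [1.0, 0.5, 0.25]
a, k, err = levinson_durbin(r, 2)
print(a, k, err)
```

For this sequence one predictor tap already explains the data, so
the second reflection coefficient is zero.  A vocoder of the kind
described above would quantize such per-frame parameters and
transmit them in place of the waveform.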


-------------------------------------------------------------
PROJECT: MAXIMUM LIKELIHOOD METHOD (1966 - 1971)


1968 Itakura, F., S. Saito, "Analysis synthesis telephony based on
     the maximum likelihood method", Proc. 6th Intern. Congr.
     Acoust. C-5-5, Tokyo, Japan, August 1968.  (B,K) (SSSHP 28
     reprints)

     SSSHP 44 Tape: "NTT-3, Sadaoki Furui (NTT), Feb 15, 1989"
          (English, 10 sen, American male/Japanese female: "Hello
           how are you? ... accomplishment"; original, vocoder,
           time-expanded vocoder)
           7" reel, good quality


1970 Itakura, F., S. Saito, "A statistical method for estimation
     of speech spectral density and formant frequencies", Trans.
     Electronics and Communications in Japan 53A, 36-43 (1970).
     (B) (SSSHP 28 reprints)


1971 Itakura, F., S. Saito, "Speech information compression based
     on the maximum likelihood spectral estimation", J. Acoust.
     Soc. Japan, 27, 463-472 (1971). (SSSHP 28 reprints)


-------------------------------------------------------------
PROJECT: PARTIAL AUTOCORRELATION (PARCOR) METHOD (1968 - 1979)


1971 Itakura, F., S. Saito, "Digital filtering techniques for
     speech analysis and synthesis," Proc. 7th Intern. Congr.
     Acoust., Budapest, Hungary, (1971).  (SSSHP 28 reprints)


1972 Itakura, F., S. Saito, "On the optimum quantization of feature
     parameters in the PARCOR speech synthesizer", Proc. 1972
     Conf. Speech Commun. Process.  434-437 (1972).  (B) (SSSHP
     28 reprints)


1972 Itakura, F., S. Saito, T. Koike, H. Sawabe, M. Nishikawa,
     "An audio response unit based on partial autocorrelation,"
     IEEE Trans. on Commun., COM-20, 792-797 (1972). (SSSHP 28
     reprints)


1978 Tohkura, Y., F. Itakura, S. Hashimoto, "Spectral smoothing
     technique in PARCOR speech analysis-synthesis," IEEE Trans.
     on ASSP, ASSP-26, 587-596 (1978). (SSSHP 28 reprints)

     SSSHP 29.1 Tape: "NTT-1, NTT Human Interface Laboratories,
          Sadaoki Furui, Jan 13, 1989", English sentences by
          PARCOR, three data rates each.
          (9 sen, male and fem syn:  "Isn't it a fine day?  ...
          at ten o'clock.")
           7" reel, good quality


1979 Tohkura, Y., F. Itakura, "Spectral sensitivity analysis of
     PARCOR parameters for speech data compression," IEEE Trans.
     on ASSP, ASSP-27, 273-280 (1979). (SSSHP 28 reprints)


-------------------------------------------------------------
PROJECT: LINE SPECTRUM PAIR (LSP) METHOD (1976 - 1981)


1981 Sugamura, N., F. Itakura, "Speech data compression by LSP
     speech analysis-synthesis technique", Trans. Electronics and
     Communications in Japan, 64A, 599-606 (1981). (SSSHP 28
     reprints)

     SSSHP 29.2 Tape: NTT-1, NTT Human Interface Laboratories,
          Sadaoki Furui, Jan 13, 1989.  English sentences by
          PARCOR and LSP, analog and three data rates each, male
          and female voices.
          (4 sen: "The ship was torn ... like fast music")
           7" reel, good quality


-------------------------------------------------------------
SPEECH SYNTHESIS BY RULE (1966 -    )


"Speech synthesis by rule is the technology of automatically
converting any kind of written sentence into speech using
computers.  It is achieved through the combination of highly
sophisticated linguistic analysis techniques and high quality
speech production techniques.  The latter were created by
concatenating VCV (vowel-consonant-vowel), CV (consonant-vowel), or
CVC (consonant-vowel-consonant) units produced by the terminal
analog, the PARCOR, or the LSP method.  Highly intelligible and
comprehensible voices, which are similar to human voices, have
been synthesized by these methods.  These methods are now used in
various kinds of commercial systems developed by NTT, such as the
'ANSER' banking service systems using speech recognition and
speech synthesis technologies, the 'Voice-Twin' text-to-speech
conversion system for newspaper proofreading, and the
'Petit-ANSER' compact voice response equipment."  (S. Furui,
personal communication to H.D. Maxey, 1989)
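
To make the idea of concatenation units concrete, the toy sketch
below cuts a romanized word into vowel-to-vowel (VCV-style) units in
which neighbouring units share a vowel.  The five-vowel set, the "#"
boundary symbol, and the function name are assumptions for
illustration only; the sketch ignores long vowels, geminates, and the
moraic nasal, and it does not reproduce the unit inventories used in
the NTT systems.

```python
VOWELS = {"a", "i", "u", "e", "o"}  # assumption: five romanized vowels


def vcv_units(phonemes, boundary="#"):
    """Split a phoneme list into units running from one vowel to the next.

    Each unit starts at a vowel (or the word boundary) and ends at the
    following vowel (or the boundary), so adjacent units share a vowel.
    """
    seq = [boundary] + list(phonemes) + [boundary]
    # Anchor positions: the two boundaries and every vowel.
    anchors = [i for i, p in enumerate(seq) if p == boundary or p in VOWELS]
    return ["".join(seq[start:end + 1])
            for start, end in zip(anchors, anchors[1:])]


# Two of the test words named on the NTT-2 tape listing.
print(vcv_units(list("sakura")))
print(vcv_units(list("amai")))
```

Because each unit begins and ends in the relatively steady part of a
vowel, a synthesizer can join stored units there; the parameters of
each unit would come from the terminal analog, PARCOR, or LSP
analysis named in the quotation.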


-------------------------------------------------------------
PROJECT: TERMINAL ANALOG - VCV SYNTHESIS (1966 - 1969)


Simulation of terminal analog synthesizer (formant-type) with
on-line computer. Synthesis based on Vowel-Consonant-Vowel
segments.

1968 Saito, S., S. Hashimoto, "Speech synthesis system based
     on interphoneme transition unit", Proc. 6th Intern. Congr.
     Acoust. B-5-12, Tokyo, Japan, August 1968. (I) (SSSHP 28
     reprints)

     SSSHP 30.1 Tape: "NTT-2, NTT Human Interface Laboratories,
          Hirokazu Sato & Sadaoki Furui, Dec 27, 1988"
          (Japanese words: "sakura, kaigi, saiban, amai, issai",
          4 Japanese sen: "Karewa sakanawo ... namiga tachimasu.")
           7" reel, good quality


-------------------------------------------------------------
PROJECT: PARCOR - VCV SYNTHESIS (1974 - 1977)


1978 Sato, H., "Speech synthesis on the basis of PARCOR-VCV
     concatenation units", Trans. Electronics and Communications
     in Japan, 61D, 858-865 (1978). (SSSHP 28 reprints)


1978 Sato, H., K. Hakoda, "Speech synthesis based on stored
     speech segments and rules", Electrical Communications
     Laboratories R&D Report 27, 2251-2566 (1978). (SSSHP 28
     reprints)

     SSSHP 30.2 Tape: "NTT-2, NTT Human Interface Laboratories,
          Hirokazu Sato & Sadaoki Furui, Dec 27, 1988"
          (Female Japanese sen: "Tsubamega tonda. ... yamani
           noborimashita")
           7" reel, good quality


-------------------------------------------------------------
PROJECT: TEXT-TO-SPEECH CONVERSION, LSP/CV SYNTHESIS (1981-1984)


1983 Sato, H., Y. Sagisaka, K. Kogure, S. Sagayama, "Unrestricted
     Japanese text conversion to speech", Electrical
     Communications Laboratories R & D Report 32, 2243-2252
     (1983). (SSSHP 28 reprints)


1984 Sato, H., "Japanese text-to-speech conversion system",
     Review of Electrical Communication Laboratories 32, 179-187
     (1984). (SSSHP 28 reprints)

     SSSHP 61 Tape: "NTT-4, Text-to-speech conversion, LSP/CV
          synthesis, S. Furui, Apr. 12, 1989"
          (2 female Japanese sen: "Jishinnoyoona ...
           fukanoodearu")
          7" reel, good quality


-------------------------------------------------------------
PROJECT: TEXT-TO-SPEECH CONVERSION, LSP/CVC SYNTHESIS (1981-1984)


1984 Sato, H., "Speech synthesis using CVC concatenation units
     and excitation waveform elements", Trans. Committee on
     Speech Research, Acoust. Soc. of Japan, S83-69, 541-546
     (1984). (SSSHP 28 reprints)


1986 Sagisaka, Y., H. Sato, "Word identification method for
     Japanese text to speech conversion system", ICASSP86, Tokyo,
     paper 45.3.1, pp. 2411-2414 (1986).


1986 Sagisaka, Y., H. Sato, "Composite phoneme units for the
     speech synthesis of Japanese", Speech Communication 5,
     217-223 (1986). (SSSHP 28 reprints)

     SSSHP 30.3 Tape: "NTT-2, NTT Human Interface Laboratories,
          Hirokazu Sato & Sadaoki Furui, Dec 27, 1988"
          (4 Japanese sen: "Shinkansenno tabiwa ... yochiwa
           genritekini fukanoudearu.")
           7" reel, good quality


-------------------------------------------------------------
BIOGRAPHIES


SHIN'ICHIRO HASHIMOTO

1958 B.E. in Electrical Engineering, Keio Univ., Tokyo
1958 NTT Musashino Electrical Communication Laboratory
1966-70 speech analysis, synthesis, recognition, transmission in
     NTT Research Division
1969 Dr.Eng. in EE, Keio Univ., Tokyo
1975 Chief, Fourth Research Section (speech and image proc.,
     character recognition)
1986 Dir. of Engineering
     Secom Intelligent Systems Lab.
     6-11-23 Shimorenjaku Mitaka-shi
     Tokyo 181 Japan


FUMITADA ITAKURA

1963 B.E.E., Nagoya Univ., Nagoya
1965 M.E.E., Nagoya Univ., Nagoya, research on speech proc.
1968 NTT Musashino Electrical Communication Laboratory (speech
     signal processing)
1972 Dr. of Engineering, Nagoya Univ., Nagoya (thesis: "Speech
     analysis and synthesis based on a statistical method")
1973-75 Bell Labs, Murray Hill, NJ (speech recognition)
1976 NTT Staff Engineer (speech analysis/synthesis)
1981 Chief of Fourth Research Section (speech and acoustics
     research)
1984 Nagoya University professor (comm. theory/signal proc.)
     Dept of Electronics Engineering
     Furoucho Chikusa-ku
     Nagoya 464 Japan


SHUZO SAITO

1948 B.E.E., Nagoya Univ., Nagoya
1953 NTT Musashino Electrical Communication Laboratory (assessment
     of speech quality, speech signal processing)
1962 Dr. of Engineering, Nagoya Univ., Nagoya (thesis:
     "Fundamental research on transmission quality of Japanese
     phonemes")
1966 Chief of Fourth Research Section (speech and image proc.,
     character recognition)
1979 Tokyo Univ. professor (speech science)
1984 Kogakuin Univ. professor (speech processing)
     Dept. of Electronics Engineering   
     1-24-2 Nishi-shinjuku Shinjuku-ku
     Tokyo 160 Japan
1986 ICASSP86 Technical Program Chairman


HIROKAZU SATO

1967 B.E.E., Hokkaido Univ., Sapporo
1969 M.E.E., Hokkaido Univ., Sapporo
     NTT Musashino Electrical Communication Laboratory (speech
     synthesis by rule)
1978 NTT Yokosuka Electrical Communication Laboratory
     (development of audio response equipment)
1981 NTT Musashino Electrical Communication Laboratory (Japanese
     text-to-speech conversion)
1985 Dr. of Engineering, Hokkaido Univ., Sapporo (thesis:
     "Research on speech synthesis by rule")
1987 Senior Research Engineer, Supervisor, NTT Human Interface
     Laboratories


NOBORU SUGAMURA

1974 B.E.E., Osaka Univ., Osaka
1976 M.E.E., Osaka Univ., Osaka
     NTT Musashino Electrical Communication Laboratory (speech
     analysis/synthesis, speech recognition)
1985 Dr. of Engineering, Osaka Univ., Osaka (thesis: "Speech
     signal coding using line spectrum parameters")
1986-87 Univ. of Maryland, MD (speech analysis/synthesis)
1989 Senior Research Engineer, NTT Human Interface Laboratories


YOH'ICHI TOHKURA

1970 B.E.E., Tokyo Univ., Tokyo
1972 M.E.E., Tokyo Univ., Tokyo
1972 NTT Musashino Electrical Communication Laboratory (speech
     analysis/synthesis, speech perception)
1980 Dr. of Engineering, Tokyo Univ., Tokyo (thesis: "Speech
     quality improvement in PARCOR analysis/synthesis system")
1984-85 Bell Telephone Laboratories, Murray Hill, NJ (speech
     recognition)
1986 Head, Hearing and Speech Perception Department,
     ATR Auditory & Visual Perception Research Laboratories
     Twin 21 MID Tower
     2-1-61 Shiromi Higashi-ku
     Osaka 540 Japan


-------------------------------------------------------------
CONTRIBUTIONS AND REVIEW BY:

Dr. Sadaoki Furui
Speech Information Laboratory
NTT Human Interface Laboratories
NTT Corporation

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution