NIPPON TELEGRAPH AND TELEPHONE CORPORATION (NTT)
Electrical Communications Laboratories
Musashino-shi, Tokyo
CONTENTS:
HISTORY
SPEECH ANALYSIS/SYNTHESIS:
MAXIMUM LIKELIHOOD (1966-1971)
PARTIAL AUTOCORRELATION (PARCOR) (1968-1979)
LINE SPECTRUM PAIR (LSP) (1976-1981)
SPEECH SYNTHESIS BY RULE:
TERMINAL ANALOG - VCV SYNTHESIS (1966-1969)
PARCOR - VCV SYNTHESIS (1974-1977)
TEXT-TO-SPEECH, LSP/CV SYNTHESIS (1981-1984)
TEXT-TO-SPEECH, LSP/CVC SYNTHESIS (1981-1984)
BIOGRAPHIES
-------------------------------------------------------------
HISTORY: DIGITAL SPEECH PROCESSING, SYNTHESIS, AND RECOGNITION,
Sadaoki Furui, Marcel Dekker, Inc., New York, p. 408,
1989.
"Speech synthesis research and development at NTT Laboratories
has a history of more than 20 years. The technological details of
this research are explained in (the above book) . . . The speech
synthesis activities performed at the NTT laboratories, as
described in the book, can be divided into two fields, speech
analysis/synthesis and speech synthesis by rule.
-------------------------------------------------------------
SPEECH ANALYSIS/SYNTHESIS
Speech analysis/synthesis is the technology of efficiently
extracting important features from speech waves and precisely
reproducing original speech sounds using these features. The
maximum likelihood method, the PARCOR method, and the LSP method,
all of which are known generally as linear predictive coding
(LPC) methods, have achieved speech transmission and storage
using small amounts of information, such as 1/100 to 1/300 of
that necessary for simply representing speech waves. The maximum
likelihood method broke new ground in the area of LPC analysis/
synthesis, and was developed into the PARCOR method. The LSP
method represents speech waves most efficiently. The PARCOR and
LSP methods are widely used in equipment for transmitting speech
waves at very low bit rates such as 2.4 - 4.8 kbit/s. Speech
synthesis devices based on the principles of these methods have
been created using Large Scale Integration, and are widely used
in voice guidance services, voice response equipment, toys, and
so on. Speech analysis devices also based on these principles
are widely used for feature extraction in speech recognition
systems. The LPC methods have had a very large impact on every
aspect of speech research throughout the world." (S. Furui,
personal communication to H.D. Maxey, 1989, in SSSHP Japan NTT
file.)
The following three projects investigated techniques for
analyzing and reconstructing human speech using variations of
Linear Predictive Coding (LPC). If the intermediate parameters
are not modified, these methods are simply a class of vocoders
for reduced-data transmission of human speech. They are included
in this history because of their later use in speech synthesis
by rule.
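As a rough illustration of the analysis step shared by these LPC
methods, the sketch below computes linear prediction coefficients
and the associated reflection (PARCOR-type) coefficients from one
frame of samples using the textbook Levinson-Durbin recursion. It
is not taken from the NTT publications listed here: the function
names, the sign convention (prediction error e[n] = x[n] + sum of
a[i]*x[n-i]), and the frame handling are assumptions made for this
example, and published PARCOR definitions may use the opposite
sign.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by Levinson-Durbin recursion.

    r     -- autocorrelation values r[0..order], with r[0] > 0
    order -- prediction order

    Returns (a, k, err):
      a   -- predictor polynomial a[0..order], a[0] = 1
      k   -- reflection coefficients k[0..order-1]
      err -- remaining prediction error energy
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for m in range(1, order + 1):
        # Correlation between the current prediction error and the
        # sample m steps in the past.
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        km = -acc / err
        k.append(km)
        # Update the predictor from order m-1 to order m.
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + km * prev[m - i]
        a[m] = km
        # Each stage reduces the error energy by the factor 1 - km^2.
        err *= (1.0 - km * km)
    return a, k, err
```

A frame would be analyzed with, for example,
levinson_durbin(autocorrelation(frame, 10), 10). With the
autocorrelation estimate used above, every reflection coefficient
of a nonzero frame has magnitude below 1, which keeps the
corresponding all-pole synthesis filter stable and makes these
coefficients convenient to quantize for low bit rate transmission.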
-------------------------------------------------------------
PROJECT: MAXIMUM LIKELIHOOD METHOD (1966 - 1971)
1968 Itakura, F., S. Saito, "Analysis synthesis telephony based on
the maximum likelihood method", Proc. 6th Intern. Congr.
Acoust. C-5-5, Tokyo, Japan, August 1968. (B,K) (SSSHP 28
reprints)
SSSHP 44 Tape: "NTT-3, Sadaoki Furui (NTT), Feb 15, 1989"
(English, 10 sen, American male/Japanese female: "Hello
how are you? ... accomplishment"; original, vocoder,
time-expanded vocoder)
7" reel, good quality
1970 Itakura, F., S. Saito, "A statistical method for estimation
of speech spectral density and formant frequencies", Trans.
Electronics and Communications in Japan 53A, 36-43 (1970).
(B) (SSSHP 28 reprints)
1971 Itakura, F., S. Saito, "Speech information compression based
on the maximum likelihood spectral estimation", J. Acoust.
Soc. Japan, 27, 463-472 (1971). (SSSHP 28 reprints)
-------------------------------------------------------------
PROJECT: PARTIAL AUTOCORRELATION (PARCOR) METHOD (1968 - 1979)
1971 Itakura, F., S. Saito, "Digital filtering techniques for
speech analysis and synthesis," Proc. 7th Intern. Congr.
Acoust., Budapest, Hungary, (1971). (SSSHP 28 reprints)
1972 Itakura, F., S. Saito, "On the optimum quantization of feature
parameters in the PARCOR speech synthesizer", Proc. 1972
Conf. Speech Commun. Process. 434-437 (1972). (B) (SSSHP
28 reprints)
1972 Itakura, F., S. Saito, T. Koike, H. Sawabe, M. Nishikawa,
"An audio response unit based on partial autocorrelation,"
IEEE Trans. on Commun., COM-20, 792-797 (1972). (SSSHP 28
reprints)
1978 Tohkura, Y., F. Itakura, S. Hashimoto, "Spectral smoothing
technique in PARCOR speech analysis-synthesis," IEEE Trans.
on ASSP, ASSP-26, 587-596 (1978). (SSSHP 28 reprints)
SSSHP 29.1 Tape: "NTT-1, NTT Human Interface Laboratories,
Sadaoki Furui, Jan 13, 1989", English sentences by
PARCOR, three data rates each.
(9 sen, male and fem syn: "Isn't it a fine day? ...
at ten o'clock.")
7" reel, good quality
1979 Tohkura, Y., F. Itakura, "Spectral sensitivity analysis of
PARCOR parameters for speech data compression," IEEE Trans.
on ASSP, ASSP-27, 273-280 (1979). (SSSHP 28 reprints)
-------------------------------------------------------------
PROJECT: LINE SPECTRUM PAIR (LSP) METHOD (1976 - 1981)
1981 Sugamura, N., F. Itakura, "Speech data compression by LSP
speech analysis-synthesis technique", Trans. Electronics and
Communications in Japan, 64A, 599-606 (1981). (SSSHP 28
reprints)
SSSHP 29.2 Tape: "NTT-1, NTT Human Interface Laboratories,
Sadaoki Furui, Jan 13, 1989", English sentences by
PARCOR and LSP, analog and three data rates each, male
and female voices.
(4 sen: "The ship was torn ... like fast music")
7" reel, good quality
-------------------------------------------------------------
SPEECH SYNTHESIS BY RULE (1966 - )
"Speech synthesis by rule is the technology of automatically
converting any kind of written sentence into speech using
computers. It is achieved through the combination of highly
sophisticated linguistic analysis techniques and high quality
speech production techniques. The latter were created by conca-
tenating VCV (vowel-consonant-vowel), CV (consonant-vowel), or
CVC (consonant-vowel-consonant) units produced by the terminal
analog, the PARCOR, or the LSP method. Highly intelligible and
comprehensible voices, which are similar to human voices, have
been synthesized by these methods. These methods are now used in
various kinds of commercial systems developed by NTT, such as the
'ANSER' banking service systems using speech recognition and
speech synthesis technologies, the 'Voice-Twin' text-to-speech
conversion system for newspaper proofreading, and the
'Petit-ANSER' compact voice response equipment." (S. Furui,
personal communication to H.D. Maxey, 1989)
-------------------------------------------------------------
PROJECT: TERMINAL ANALOG - VCV SYNTHESIS (1966 - 1969)
Simulation of terminal analog synthesizer (formant-type) with
on-line computer. Synthesis based on Vowel-Consonant-Vowel
segments.
1968 Saito, S., S. Hashimoto, "Speech synthesis system based
on interphoneme transition unit", Proc. 6th Intern. Congr.
Acoust. B-5-12, Tokyo, Japan, August 1968. (I) (SSSHP 28
reprints)
SSSHP 30.1 Tape: "NTT-2, NTT Human Interface Laboratories,
Hirokazu Sato & Sadaoki Furui, Dec 27, 1988"
(Japanese words: "sakura, kaigi, saiban, amai, issai",
4 Japanese sen: "Karewa sakanawo ... namiga tachimasu.")
7" reel, good quality
-------------------------------------------------------------
PROJECT: PARCOR - VCV SYNTHESIS (1974 - 1977)
1978 Sato, H., "Speech synthesis on the basis of PARCOR-VCV
concatenation units", Trans. Electronics and Communications
in Japan, 61D, 858-865 (1978). (SSSHP 28 reprints)
1978 Sato, H., K. Hakoda, "Speech synthesis based on stored
speech segments and rules", Electrical Communications
Laboratories R&D Report 27, 2251-2566 (1978). (SSSHP 28
reprints)
SSSHP 30.2 Tape: "NTT-2, NTT Human Interface Laboratories,
Hirokazu Sato & Sadaoki Furui, Dec 27, 1988"
(Female Japanese sen: "Tsubamega tonda. ... yamani
noborimashita")
7" reel, good quality
-------------------------------------------------------------
PROJECT: TEXT-TO-SPEECH CONVERSION, LSP/CV SYNTHESIS (1981-1984)
1983 Sato, H., Y. Sagisaka, K. Kogure, S. Sagayama, "Unrestricted
Japanese text conversion to speech", Electrical
Communications Laboratories R & D Report 32, 2243-2252
(1983). (SSSHP 28 reprints)
1984 Sato, H., "Japanese text-to-speech conversion system",
Review of Electrical Communication Laboratories 32, 179-187
(1984). (SSSHP 28 reprints)
SSSHP 61 Tape: "NTT-4, Text-to-speech conversion, LSP/CV
synthesis, S. Furui, Apr. 12, 1989"
(2 female Japanese sen: "Jishinnoyoona ...
fukanoodearu")
7" reel, good quality
-------------------------------------------------------------
PROJECT: TEXT-TO-SPEECH CONVERSION, LSP/CVC SYNTHESIS (1981-1984)
1984 Sato, H., "Speech synthesis using CVC concatenation units
and excitation waveform elements", Trans. Committee on
Speech Research, Acoust. Soc. of Japan, S83-69, 541-546
(1984). (SSSHP 28 reprints)
1986 Sagisaka, Y., H. Sato, "Word identification method for
Japanese text to speech conversion system", ICASSP86, Tokyo,
paper 45.3.1, pp. 2411-2414 (1986).
1986 Sagisaka, Y., H. Sato, "Composite phoneme units for the
speech synthesis of Japanese", Speech Communication 5,
217-223 (1986). (SSSHP 28 reprints)
SSSHP 30.3 Tape: "NTT-2, NTT Human Interface Laboratories,
Hirokazu Sato & Sadaoki Furui, Dec 27, 1988"
(4 Japanese sen: "Shinkansenno tabiwa ... yochiwa
genritekini fukanoudearu.")
7" reel, good quality
-------------------------------------------------------------
BIOGRAPHIES
SHIN'ICHIRO HASHIMOTO
1958 B.E. in Electrical Engineering, Keio Univ., Tokyo
1958 NTT Musashino Electrical Communication Laboratory
1966-70 speech analysis, synthesis, recognition, transmission in
NTT Research Division
1969 Dr.Eng. in EE, Keio Univ., Tokyo
1975 Chief, Fourth Research Section (speech and image proc, char-
acter recognition)
1986 Dir. of Engineering
Secom Intelligent Systems Lab.
6-11-23 Shimorenjaku Mitaka-shi
Tokyo 181 Japan
FUMITADA ITAKURA
1963 B.E.E., Nagoya Univ., Nagoya
1965 M.E.E., Nagoya Univ., Nagoya, research on speech proc.
1968 NTT Musashino Electrical Communication Laboratory (speech
signal processing)
1972 Dr. of Engineering, Nagoya Univ., Nagoya (thesis: "Speech
analysis and synthesis based on a statistical method")
1973-75 Bell Labs, Murray Hill, NJ (speech recognition)
1976 NTT Staff Engineer (speech analysis/synthesis)
1981 Chief of Fourth Research Section (speech and acoustics
research)
1984 Nagoya University professor (comm. theory/signal proc.)
Dept of Electronics Engineering
Furoucho Chikusa-ku
Nagoya 464 Japan
SHUZO SAITO
1948 B.E.E., Nagoya Univ., Nagoya
1953 NTT Musashino Electrical Communication Laboratory (assessment
of speech quality, speech signal processing)
1962 Dr. of Engineering, Nagoya Univ., Nagoya (thesis:
"Fundamental research on transmission quality of Japanese
phonemes")
1966 Chief of Fourth Research Section (speech and image proc.,
character recognition)
1979 Tokyo Univ. professor (speech science)
1984 Kogakuin Univ. professor (speech processing)
Dept. of Electronics Engineering
1-24-2 Nishi-shinjuku Shinjuku-ku
Tokyo 160 Japan
1986 ICASSP86 Technical Program Chairman
HIROKAZU SATO
1967 B.E.E., Hokkaido Univ., Sapporo
1969 M.E.E., Hokkaido Univ., Sapporo
NTT Musashino Electrical Communication Laboratory (speech
synthesis by rule)
1978 NTT Yokosuka Electrical Communication Laboratory
(development of audio response equipment)
1981 NTT Musashino Electrical Communication Laboratory (Japanese
text-to-speech conversion)
1985 Dr. of Engineering, Hokkaido Univ., Sapporo (thesis:
"Research on speech synthesis by rule")
1987 Senior Research Engineer, Supervisor, NTT Human Interface
Laboratories
NOBORU SUGAMURA
1974 B.E.E., Osaka Univ., Osaka
1976 M.E.E., Osaka Univ., Osaka
NTT Musashino Electrical Communication Laboratory (speech
analysis/synthesis, speech recognition)
1985 Dr. of Engineering, Osaka Univ., Osaka (thesis: "Speech
signal coding using line spectrum parameters")
1986-87 Univ. of Maryland, MD (speech analysis/synthesis)
1989 Senior Research Engineer, NTT Human Interface Laboratories
YOH'ICHI TOHKURA
1970 B.E.E., Tokyo Univ., Tokyo
1972 M.E.E., Tokyo Univ., Tokyo
1972 NTT Musashino Electrical Communication Laboratory (speech
analysis/synthesis, speech perception)
1980 Dr. of Engineering, Tokyo Univ., Tokyo (thesis: "Speech
quality improvement in PARCOR analysis/synthesis system")
1984-85 Bell Telephone Laboratories, Murray Hill, NJ (speech
recognition)
1986 Head, Hearing and Speech Perception Department,
ATR Auditory & Visual Perception Research Laboratories
Twin 21 MID Tower
2-1-61 Shiromi Higashi-ku
Osaka 540 Japan
-------------------------------------------------------------
CONTRIBUTIONS AND REVIEW BY:
Dr. Sadaoki Furui
Speech Information Laboratory
NTT Human Interface Laboratories
NTT Corporation