HITACHI, LTD.
Speech Research Group
Central Research Laboratory
1-280 Higashi-Koigakubo
Kokubunji, Tokyo 185, JAPAN
CONTENTS:
HISTORY
PART I: SPEECH SYNTHESIS BY RULE
VOCAL TRACT ANALOGUE SYNTHESIS (1964-1972)
SYNTHESIS RULES FOR EXPRESSION OF EMOTIONS (1967-1969)
MULTIPLEXED SPEECH SYNTHESIZER (1968-1973)
MULTIPLEXED SPEECH SYNTHESIS METHOD (1973-1976)
SYNTHESIS BY RULE FOR JAPANESE WORDS (1980-1984)
SYNTHESIS BY RULE FOR JAPANESE SENTENCES (1984- )
PART II: EFFICIENT SPEECH CODING AND ITS APPLICATIONS
LSI SPEECH ANALYZER AND SYNTHESIZER (1978-1980)
LPC CODEC (1981-1983)
TOR CODEC (1982-1987)
BIOGRAPHIES
-------------------------------------------------------------
HISTORY
Starting in 1910 as an electrical repair shop for a copper
mining company, Hitachi has grown over the decades into one of
the largest industrial corporations in the world. Hitachi
is unusually diversified, providing a broad range of products,
such as nuclear power plants, home appliances, computers, heavy
manufacturing equipment, electric cable, metals and chemicals.
Hitachi's principle of achieving technological autonomy encourages
significant investment in research for all of its products.
Research on speech synthesis was motivated by Hitachi's
application for voice-output for computers and an agreement to
build a vocal tract analog synthesizer for the Electrotechnical
Laboratory (see SSSHP JAPAN Electrotechnical Laboratory file).
The quoted portions in the following project descriptions are
from "Outline of Speech Synthesis Research of Hitachi", S.
Takeda and A. Ichikawa, personal communication to H.D. Maxey,
June 21, 1989. (SSSHP JAPAN Hitachi, Ltd. file)
-------------------------------------------------------------
PART I: SPEECH SYNTHESIS BY RULE
PROJECT: VOCAL TRACT ANALOGUE SYNTHESIS (1964 - 1972)
"Speech synthesis by rule using DAVO* type vocal tract analogue
synthesizer (CASSY**) of an ultra high speed analogue computer
technology. It was also built for the Electrotechnical Laboratory
of the Ministry of International Trade and Industry, Japan. Vocal
tract loss and configuration estimation methods were developed.
This synthesis system was exhibited at the 6th International
Conference on Acoustics."
* DAVO : Dynamic Analog of Vocal Tract (see SSSHP USA MIT file)
** CASSY: Configurational Analogue Speech Synthesizer (nickname
of the synthesizer)
1967 Ichikawa, A., Y. Nakano, and K. Nakata, "Control rule of
vocal-tract configuration", J. Acoust. Soc. Amer., 42, 1163
(A) (1967). (I)
1968 Ichikawa, A., and Nakata, K., "Speech synthesis by rule",
Proc. 6th Int. Congr. Acoust., Tokyo, Japan, paper B-5-6,
B171-174, August. (1968) (I)
Nakata, K., A. Ichikawa, and T. Miura, "Vocal tract analog
speech synthesis by rule", Preprints of Speech Symposium
Kyoto, August 29-30, paper A-5, 1-5. (1968) In SSSHP 75
Reprints.
SSSHP 65.1 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
Ltd, Japan", compiled by A. Ichikawa and S. Takeda
6/89.
(A: Male, female, child, short sentences; B: song
"Ue o muite arukou" or "Sukiyaki", 1:45 min)
("heard by the Prince, the present Emperor of Japan")
Cassette, good quality, copy of DAT copy of master
1971 Ichikawa, A., and K. Nakata, "Vocal tract resonances with
losses", 7th Int. Congr. Acoust., paper 23C9, 161-164.
(1971)
-------------------------------------------------------------
PROJECT: SYNTHESIS RULES FOR EXPRESSION OF EMOTIONS (1967-1969)
Vocal tract analogue synthesis. "Intonation rules for expression
of sentence types (declarative and interrogative sentences),
emphasis, feelings of joy, anger, sorrow, etc., and intonation
conversion rules from adult female to adult male voice. The
pioneer research in the world on expression of emotions and voice
conversion by intonation."
1968 Nakayama, T., A. Ichikawa, and T. Miura, "Generalization of
control rule of buzz source to improve the naturalness of
synthetic speech," 6th Int. Congr. Acoust., paper B-5-5,
B167-170, August. (1968) In SSSHP 75 Reprints.
SSSHP 65.2 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
Ltd, Japan", compiled by A. Ichikawa and S. Takeda
6/89.
(9 variations of emphasis, "Aoi ie o uru", to sell a
blue house)
Cassette, good quality, copy of DAT copy of master
-------------------------------------------------------------
PROJECT: MULTIPLEXED SPEECH SYNTHESIZER FOR SPEECH SYNTHESIS
BY RULE (1968 - 1973)
"Speech synthesis hardware by compilation of recorded acoustical
elements (damped sinusoidal waveforms) for a multiplexed audio
response system. It was built as a multiplexed audio response
unit for an experimental seat-reservation system of Japan
National Railways, and led to the development of the telephone
reservation system now commonly used in Japan."
Speech synthesis by compilation of recorded acoustical elements
from a magnetic drum to allow concurrent messages to many
telephone lines without providing a separate synthesizer per
telephone line.
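As a rough illustration of what "compilation of recorded acoustical
elements (damped sinusoidal waveforms)" can mean in signal terms, the
Python sketch below builds one pitch period as a sum of exponentially
damped sinusoids and concatenates stored periods into an output signal.
The sample rate, resonance frequencies, decay rates, and function names
are all illustrative assumptions; none of them are taken from the
Hitachi hardware.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed telephone-band rate, not from the source


def damped_sinusoid(freq_hz, decay_per_s, amplitude, n_samples):
    """One stored element: amplitude * exp(-decay*t) * sin(2*pi*freq*t)."""
    return [
        amplitude
        * math.exp(-decay_per_s * n / SAMPLE_RATE)
        * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(n_samples)
    ]


def pitch_period(components, n_samples):
    """Sum several damped sinusoids (formant-like resonances) into one period."""
    period = [0.0] * n_samples
    for freq_hz, decay_per_s, amplitude in components:
        element = damped_sinusoid(freq_hz, decay_per_s, amplitude, n_samples)
        period = [p + e for p, e in zip(period, element)]
    return period


def compile_elements(periods):
    """Concatenate stored periods in order to form the output waveform."""
    output = []
    for period in periods:
        output.extend(period)
    return output


# Hypothetical vowel-like element: three resonances, 80 samples = 10 ms period.
vowel = pitch_period([(700, 300.0, 1.0), (1200, 400.0, 0.5), (2500, 500.0, 0.25)], 80)
signal = compile_elements([vowel] * 5)
```

Because the elements are precomputed and only read out and concatenated,
one shared store can in principle feed many output channels, which is the
motivation the project description gives for the multiplexed design.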
1969 Nakata, K., and T. Miura, "A method of speech synthesis for
multiplexed audio response", Transactions of the Institute
of Electronics and Communication Engineers of Japan, C,
52-C, 10, 579-586, October, in Japanese. (1969) Also,
Electron. Commun. Japan, 52-C, 126-134, in English. (1969)
(B)
1971 Kimura, Y., A. Ichikawa, K. Nakata, T. Hyodo, and T. Aso,
"Development of audio response unit," Information Processing,
12, 7, 397-405, July, in Japanese. (1971)
Nakata, K., A. Ichikawa, and Y. Nakano, "Audio response
unit," 7th Int. Congr. Acoust., 24H8, 493-496. (1971)
1972 Kimura, Y., A. Ichikawa, K. Nakata, T. Hyodo, and T. Aso,
"Development of audio response unit", Information Processing
in Japan, 12, 1-7, December. (1972) In SSSHP 75 Reprints.
SSSHP 65.3 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
Ltd, Japan", compiled by A. Ichikawa and S. Takeda
6/89.
(A: sentences for seat reservation application;
B: words for seat reservation application)
Cassette, low level -20VU, copy of DAT copy of master
-------------------------------------------------------------
PROJECT: MULTIPLEXED SPEECH SYNTHESIS METHOD FOR SPEECH
SYNTHESIS BY RULE (1973-1976)
"Speech synthesis method by compilation of recorded acoustical
elements (LPC impulse response waveforms) for a multiplexed audio
response system. Production rules for fundamental frequency and
syllable duration for unrestricted Japanese words were developed.
Accent type estimation rules for corporation and person's names
were also developed. Later, the idea of using syllable units led
to the concept of new speech processing units as demi-syllables
proposed by Dr. Fujimura at BTL (see SSSHP USA Bell Telephone
Laboratories file). Furthermore, accent type estimation
technology, which was developed for Japanese words for the first
time, formed the basis for future work in this area."
1974 Nakata, K., and A. Ichikawa, "Speech synthesis for an
unlimited vocabulary," Speech Communication Seminar
Stockholm, 2, 261-266, August. (1974) In SSSHP 75 Reprints.
1975 Ichikawa, A., and K. Nakata, "A method of a speech segments
generation in the speech synthesis of mono-syllables
edition," Transactions of the Institute of Electronics and
Communication Engineers of Japan, D, 58-D, 9, 522-529,
September, in Japanese. (1975)
SSSHP 65.4 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
Ltd, Japan", compiled by A. Ichikawa and S. Takeda
6/89.
(A: synthesis by rule sentences; B: Corporation names
synthesized and inserted in natural speech sentences)
Cassette, left chan -10VU, copy of DAT copy of master
-------------------------------------------------------------
PROJECT: SPEECH SYNTHESIS BY RULE FOR JAPANESE WORDS (1980-1984)
"Speech synthesis using PARCOR synthesizer. Fundamental frequency
contours were produced based on Fujisaki's Model (2nd order
linear system). Female voice was synthesized by computer
synthesizer. Synthesis system was implemented on boards for male
voice."
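For readers unfamiliar with Fujisaki's model, the sketch below shows its
commonly published form: the logarithm of the fundamental frequency is a
base value plus phrase components (impulse responses of a critically
damped 2nd-order linear system) and accent components (clipped step
responses of a second such system). The parameter values (alpha = 3/s,
beta = 20/s, gamma = 0.9) are typical textbook figures, and the command
timings are hypothetical; the source does not state the values or rules
Hitachi used.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the critically damped 2nd-order phrase filter."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the 2nd-order accent filter, clipped at gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(t, base_hz, phrases, accents):
    """F0 in Hz at time t (seconds).

    phrases: list of (onset_time, amplitude)
    accents: list of (start_time, end_time, amplitude)
    """
    log_f0 = math.log(base_hz)
    for onset, amp in phrases:
        log_f0 += amp * phrase_response(t - onset)
    for start, end, amp in accents:
        log_f0 += amp * (accent_response(t - start) - accent_response(t - end))
    return math.exp(log_f0)


# Hypothetical word: one phrase command at 0 s, one accent from 0.2 s to 0.5 s,
# sampled every 10 ms over one second. The 120 Hz base is an assumed value.
contour = [
    f0_contour(i * 0.01, 120.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)])
    for i in range(100)
]
```

Synthesis rules in this framework amount to choosing the command timings
and amplitudes from the text (for example, from the accent type of each
word); the filters then produce a smooth contour.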
1984 SSSHP 65.5 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
Ltd, Japan", compiled by A. Ichikawa and S. Takeda
6/89.
(male: names of places, people, and corporations)
Cassette, -5VU, copy of DAT copy of master
-------------------------------------------------------------
PROJECT: SPEECH SYNTHESIS BY RULE FOR JAPANESE SENTENCES (1984- )
"An LPC synthesizer using prediction residual, called 'the TOR
synthesizer', is developed to enhance speech quality. Fundamental
frequency contours are produced based on Fujisaki's Model. A fine
fluctuation component (phoneme component) of fundamental
frequency is newly introduced in the model and its production
rules are developed. Female voice is synthesized by computer
system, in which a technique for reducing speech quality
degradation inherent in female voice is introduced. Roles of
prosody are also clarified."
1985 Takeda, S., Y. Asakawa, and A. Ichikawa, "A study of speech
synthesis method utilizing residual information,"
Transactions of the Committee on Speech Research, Acoust.
Soc. of Japan, S84-75, 589-596, January, in Japanese. (1985)
1988 Takeda, S., "Speech synthesis system for unrestricted
Japanese sentences and several methods for improving speech
quality," Inst. of Elect. Info. and Comm. Engineers
Technical Report (Speech), SP88-43, 47-54, July, in
Japanese. (1988)
1989 Takeda, S., "A study of methods for improving quality of
female speech produced by residual-excited, rule-based
synthesis system," Inst. of Elect. Info. and Comm.
Engineers Technical Report (Speech), SP89-3, 17-24, May, in
Japanese. (1989)
SSSHP 65.6 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
Ltd, Japan", compiled by A. Ichikawa and S. Takeda
6/89.
(sample scientific sentence)
Cassette, -10VU, copy of DAT copy of master
-------------------------------------------------------------
PART II: EFFICIENT SPEECH CODING AND ITS APPLICATIONS
The addition of automatic analysis techniques to the above LPC
synthesis methods provided for low bit-rate speech transmission
and storage.
PROJECT: LSI SPEECH ANALYZER AND SYNTHESIZER (1978-1980)
"PARCOR analyzer and synthesizer. Possible to select a bit rate
among either 2.4, 4.8, or 9.6 kbps. The synthesizer was composed
of 3 chips, the first speech synthesis chips in Japan."
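A PARCOR synthesizer is, in general terms, an all-pole lattice filter
whose stage coefficients are the PARCOR (partial autocorrelation, or
reflection) coefficients sent by the analyzer. The Python sketch below
shows one common formulation of that filter. The sign convention and the
function name are assumptions made for illustration, and nothing here
describes the internal design of the Hitachi chips.

```python
def lattice_synthesize(excitation, k):
    """All-pole lattice synthesis filter.

    excitation: list of input samples (pulse train or noise in a vocoder)
    k: reflection coefficients k[0]..k[p-1]; |k[m]| < 1 keeps the filter stable

    Per sample, for m = p down to 1:
        f[m-1] = f[m] - k[m] * b[m-1] (previous sample)
        b[m]   = b[m-1] (previous sample) + k[m] * f[m-1]
    The output is f[0], which also becomes the new b[0].
    """
    p = len(k)
    b = [0.0] * p  # b[m] holds the backward signal of stage m, one sample ago
    out = []
    for e in excitation:
        f = e
        new_b = [0.0] * p
        for m in range(p, 0, -1):
            f = f - k[m - 1] * b[m - 1]
            if m < p:
                new_b[m] = b[m - 1] + k[m - 1] * f
        if p > 0:
            new_b[0] = f
        b = new_b
        out.append(f)
    return out


# Impulse response of a one-stage filter: y[n] = e[n] + 0.5 * y[n-1].
impulse_response = lattice_synthesize([1.0, 0.0, 0.0, 0.0], [-0.5])
```

One reason this structure suits low bit-rate coding is that each
coefficient is bounded in magnitude by 1 for a stable filter, so it can
be quantized with a small, fixed number of bits per frame.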
1980 Sampei, T., A. Asada, and K. Nakata, "High quality PARCOR
speech synthesizer," IEEE Trans. CE., 26, 8, 353-358,
August. (1980) In SSSHP 75 Reprints.
Sato, H., N. Miyahara, K. Nakata, K. Nomiya, T. Sampei, and
A. Suehiro, "LSI PARCOR speech synthesizer," IECE* Technical
Report [Semiconductor], SSD79-122, 25-30, March. (1980) In
Japanese.
*IECE: Institute of Electronics and Communication Engineers
of Japan (Old name of IEICE)
1981 Asada, A., Y. Ohta, T. Saito, K. Nakata, and A. Ichikawa,
"PARCOR speech analysis & recognition apparatus by Le Roux
method," IECE Technical Report [Electroacoustics], EA80-81,
23-30, February. (1981) In Japanese.
SSSHP 65.7 Tape: "Side B: Efficient Speech Coding, Hitachi,
Ltd, Japan", compiled by A. Ichikawa and S. Takeda
6/89.
(English and Japanese samples at various data rates)
Cassette, -10VU, copy of DAT copy of master
-------------------------------------------------------------
PROJECT: LPC CODEC (1981-1983)
"PARCOR analyzer and synthesizer. Possible to select a bit rate
among either 2.4, 4.8, or 9.6 kbps. A board type using Hitachi's
DSP chips (called 'HSP')."
1983 Miyamoto, T., H. Inada, and K. Nakata, "A real time PARCOR
analysis of speech by high-performance signal processors,"
Trans. Inst. of Elect. and Comm. Engineers of Japan, A,
J66-A, 7, 625-632, July. (1983) In Japanese.
Nakata, K. and T. Miyamoto, "An implementation of real time
PARCOR analysis by high speed signal processors," 11th Int.
Congr. Acoust., Paris, 125-128 (1983). In SSSHP 75 Reprints.
SSSHP 65.8 Tape: "Side B: Efficient Speech Coding, Hitachi,
Ltd, Japan", compiled by A. Ichikawa and S. Takeda 6/89.
(Japanese sentences, male and female, two speakers.
Left channel: Input speech. Right channel: LPC CODEC
speech at 9.6 kbps)
Cassette, -10VU, copy of DAT copy of master
-------------------------------------------------------------
PROJECT: TOR CODEC (1982-1987)
"A speech coding method called 'the Thinned-Out Residual Method
(TOR)' was developed to enhance speech quality at bit rates
between 8 kbps and 16 kbps. The CODEC was implemented in a single
5-MOPS DSP chip. One of the world's first commercial 8 kbps CODEC
for TDM."
1985 Takeda, S., Y. Asakawa, and A. Ichikawa, "A study of speech
synthesis method utilizing residual information," Trans.
Committee on Speech Research, the Acoust. Soc. of Japan,
S84-75, 589-596, January. (1985) In Japanese.
Ichikawa, A., S. Takeda, and Y. Asakawa, "A speech coding
method using thinned-out residual," ICASSP85, 25.7, 961-974,
March. (1985)
1986 Miyamoto, T., K. Kondo, T. Suzuki, Y. Asakawa, and A.
Ichikawa, "Single DSP 8 kbps speech codec," ICASSP86, 33.10,
1717-1720, April. (1986) In SSSHP 75 Reprints.
SSSHP 65.9 Tape: "Side B: Efficient Speech Coding, Hitachi,
Ltd, Japan", compiled by A. Ichikawa and S. Takeda
6/89.
(Japanese sentences, male and female, three data rates;
English sentences, male and female, 8 kbps ICASSP86 demo)
Cassette, -10VU, copy of DAT copy of master
-------------------------------------------------------------
BIOGRAPHIES
YOSHIAKI ASAKAWA
1977 B.S. in physical engineering, Tokyo Inst. of Tech., Tokyo
1979 M.S. in physical engineering, Tokyo Inst. of Tech., Tokyo
CRL, Hitachi, Ltd., speech recognition, speech coding
1988 Member, Speech Section, Committee of Standardization of
Japanese Language, Japan Electronic Industry Development
Association
AKIRA ICHIKAWA
1964 B.S. in electrical engineering, Keio University, Tokyo
CRL, Hitachi, Ltd., Senior Researcher of CRL, speech
synthesis, speech coding, speaker identification,
Chinese character recognition, development of push
button signal receiver, speech recognition, and
conversational speech understanding
1981 Ph.D., Keio University, Tokyo
1982 Head of Speech Research Group
1984-86 Member, Transaction Editorial Board, Institute of
Electronics, Information and Communication Engineers,
Japan
1986-88 Director, Institute of Electronics, Information and
Communication Engineers, Japan
YOSHINORI KITAHARA
1979 B.A. in information and behavioral sciences, Hiroshima
Univ., Hiroshima
1981 M.S. in environmental studies, Hiroshima Univ., Hiroshima
CRL, Hitachi, Ltd., speech perception, speech interface
evaluation
1986-89 Visiting researcher, ATR Auditory and Visual Perception
Research Laboratories, Kyoto
1989 CRL, Hitachi, Ltd.
KAZUHIRO KONDO
1982 B.S. in electronics and communications engineering, Waseda
Univ., Tokyo
1984 M.S. in electronics and communications engineering, Waseda
Univ., Tokyo
CRL, Hitachi, Ltd., speech coding, speech packet
TANETOSHI MIURA
1943 B.S. in electrical engineering, Tohoku Univ., Sendai
1943-62 Electrical Communication Laboratory, N.T.T., development
of telephone transmission standard
1957 Ph.D., Tohoku Univ., Sendai
1962-77 Central Research Laboratory, Hitachi, Ltd.
Chief Researcher of CRL, speech quality, electroacoustic
conversion, underwater acoustics, indoor acoustics,
stereophonic systems
1963-65 Head of Speech Research Group, CRL
1977 Prof., Dept. of Electronic Engineering
Tokyo Denki University
2-2 Nishiki-cho, Kanda
Chiyoda-ku, Tokyo 101, Japan
1986 President of Acoust. Soc. of Japan
TAKANORI MIYAMOTO
1978 B.S. in communications engineering, Osaka Univ., Osaka
1980 M.S. in communications engineering, Osaka Univ., Osaka
CRL, Hitachi, Ltd., speech coding, MODEM, baseband digital
transmission
AKIRA NAKAJIMA
1969 B.S. in electronic engineering, Tokyo Inst. of Tech., Tokyo
1971 M.S. in electronic engineering, Tokyo Inst. of Tech., Tokyo
1971-83 CRL, Hitachi, Ltd.
Speech synthesis by rule, speaker identification, and
word processors
1983 Microelectronics Product Development Laboratory,
Hitachi, Ltd.
YASUAKI NAKANO
1961 B.S. in applied physics, Univ. of Tokyo, Tokyo
1963 M.S. in mathematical engineering, Univ. of Tokyo, Tokyo
Hitachi CRL: PCM transmission quality, sonar, analysis/
synthesis of speech, recognition of Chinese characters
KAZUO NAKATA
1950 B.S. in electrical engineering, Nagoya University, Nagoya
1950-53 Radio Regulatory Bureau of Japan
1953-65 Radio Research Laboratories, speech synthesis, recog.,
and transmission
1957-58 Invited researcher at Res. Lab. Elect., Mass. Inst. of
Tech., Cambridge, Mass (K.N. Stevens)
1962 Ph.D. in Elect. Engr., Tohoku Univ., Sendai
1965-82 Central Research Lab., Hitachi, Chief Researcher of
CRL, speech synthesis, coding, and recog., Chinese char
recog.
1966-82 Head of Speech Research Group, CRL
1982 Prof., Department of Applied Physics
Tokyo Univ. of Agriculture & Technology
2-24-16 Naka-machi
Koganei-shi, Tokyo, 184 JAPAN
TAKESHI NAKAYAMA
1958 B.A. in literature, Waseda Univ., Tokyo
1963 finished doctor's course in literature, Waseda Univ.
1963-87 CRL, Hitachi, Ltd., Senior Researcher of CRL,
Speech quality evaluation, voice quality conversion,
image quality evaluation, methods of designing human-
interface evaluation
Director, Acoustical Society of Japan
1970 Ph.D., Waseda University, Tokyo
1987 Prof., Dept. of Information Engineering
Toyama University
3190 Gofuku, Toyama-shi 930, Japan
TOSHIRO SUZUKI
1970 B.S. in electrical engineering, Tokyo Metropolitan
University, Tokyo
1972 M.S. in electrical engineering, Tokyo Metropolitan
University, Tokyo
CRL, Hitachi, Ltd., Senior Researcher of CRL
1989 Head of Speech Transmission Group, PCM CODEC, digital
subscriber transmission, speech coding
SHOICHI TAKEDA
1970 B.S. in mechanical engineering, Tokyo Inst. of Tech., Tokyo
1970-71 Toshiba Electric Co., Ltd.
1974 M.S. in mechanical engineering for production, Univ. of
Tokyo, Tokyo
1974 CRL, Hitachi, Ltd.
1974-75 Artificial Intelligence Lab., Stanford Univ., California
(Supported by Rotary Foundation)
Tactile sensors for robots
1975-80 Machine vision, object recognition, image processing
1980 Speech synthesis by rule, speech analysis-synthesis
-------------------------------------------------------------
CONTRIBUTIONS AND REVIEW BY:
Dr. Akira Ichikawa (Head of Speech Research Group)
Mr. Shoichi Takeda