NMAH | Smithsonian Speech Synthesis History Project (ss

HITACHI, LTD.


Speech Research Group
Central Research Laboratory
1-280 Higashi-Koigakubo
Kokubunji, Tokyo 185,  JAPAN


CONTENTS:

HISTORY

PART I: SPEECH SYNTHESIS BY RULE

   VOCAL TRACT ANALOGUE SYNTHESIS (1964-1972)

   SYNTHESIS RULES FOR EXPRESSION OF EMOTIONS (1967-1969)

   MULTIPLEXED SPEECH SYNTHESIZER (1968-1973)

   MULTIPLEXED SPEECH SYNTHESIS METHOD (1973-1976)

   SYNTHESIS BY RULE FOR JAPANESE WORDS (1980-1984)

   SYNTHESIS BY RULE FOR JAPANESE SENTENCES (1984-   )

PART II: EFFICIENT SPEECH CODING AND ITS APPLICATIONS

   LSI SPEECH ANALYZER AND SYNTHESIZER (1978-1980)

   LPC CODEC (1981-1983)

   TOR CODEC (1982-1987)

BIOGRAPHIES


-------------------------------------------------------------
HISTORY

Starting in 1910 as an electrical repair shop for a copper
mining company, Hitachi has grown over the decades into one of
the largest industrial corporations in the world. Hitachi is
unusually diversified, providing a broad range of products,
such as nuclear power plants, home appliances, computers, heavy
manufacturing equipment, electric cable, metals, and chemicals.
Hitachi's principle of achieving technological autonomy encourages
significant investment in research for all of its products.

Research on speech synthesis was motivated by Hitachi's
applications for computer voice output and by an agreement to
build a vocal tract analog synthesizer for the Electrotechnical
Laboratory (see SSSHP JAPAN Electrotechnical Laboratory file).

The quoted portions in the following project descriptions are
from "Outline of Speech Synthesis Research of Hitachi", S.
Takeda and A. Ichikawa, personal communication to H.D. Maxey,
June 21, 1989. (SSSHP JAPAN Hitachi, Ltd. file)


-------------------------------------------------------------
PART I: SPEECH SYNTHESIS BY RULE



PROJECT: VOCAL TRACT ANALOGUE SYNTHESIS (1964-1972)


"Speech synthesis by rule using DAVO* type vocal tract analogue
synthesizer (CASSY**) of an ultra high speed analogue computer
technology. It was also built for the Electrotechnical Laboratory
of the Ministry of International Trade and Industry, Japan. Vocal
tract loss and configuration estimation methods were developed.
This synthesis system was exhibited at the 6th International
Conference on Acoustics."

*  DAVO : Dynamic Analog of Vocal Tract (see SSSHP USA MIT file)
** CASSY: Configurational Analogue Speech Synthesizer (nickname
          of the synthesizer)


1967 Ichikawa, A., Y. Nakano, and K. Nakata, "Control rule of
     vocal-tract configuration", J. Acoust. Soc. Amer., 42, 1163
     (A) (1967). (I)


1968 Ichikawa, A., and Nakata, K., "Speech synthesis by rule",
     Proc. 6th Int. Congr. Acoust., Tokyo, Japan, paper B-5-6,
     B171-174, August. (1968)   (I)


     Nakata, K., A. Ichikawa, and T. Miura, "Vocal tract analog
     speech synthesis by rule", Preprints of Speech Symposium
     Kyoto, August 29-30, paper A-5, 1-5. (1968) In SSSHP 75
     Reprints.

     SSSHP 65.1 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
          Ltd, Japan", compiled by A. Ichikawa and S. Takeda
          6/89.
          (A: Male, female, child, short sentences; B: song
          "Ue o muite arukou" or "Sukiyaki", 1:45 min)
          ("heard by the Prince, the present Emperor of Japan")
          Cassette, good quality, copy of DAT copy of master


1971 Ichikawa, A., and K. Nakata, "Vocal tract resonances with
     losses", 7th Int. Congr. Acoust., paper 23C9, 161-164.
     (1971)


-------------------------------------------------------------
PROJECT: SYNTHESIS RULES FOR EXPRESSION OF EMOTIONS (1967-1969)


Vocal tract analogue synthesis. "Intonation rules for expression
of sentence types (declarative and interrogative sentences),
emphasis, feelings of joy, anger, sorrow, etc., and intonation
conversion rules from adult female to adult male voice. The
pioneer research in the world on expression of emotions and voice
conversion by intonation."


1968 Nakayama, T., A. Ichikawa, and T. Miura, "Generalization of
     control rule of buzz source to improve the naturalness of
     synthetic speech," 6th Int. Congr. Acoust., paper B-5-5,
     B167-170, August. (1968) In SSSHP 75 Reprints.

     SSSHP 65.2 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
          Ltd, Japan", compiled by A. Ichikawa and S. Takeda
          6/89.
          (9 variations of emphasis, "Aoi ie o uru", to sell a
           blue house)
          Cassette, good quality, copy of DAT copy of master


-------------------------------------------------------------
PROJECT: MULTIPLEXED SPEECH SYNTHESIZER FOR SPEECH SYNTHESIS
         BY RULE (1968-1973)


"Speech synthesis hardware by compilation of recorded acoustical
elements (damped sinusoidal waveforms) for a multiplexed audio
response system. It was built as a multiplexed audio response
unit for an experimental seat-reservation system of Japan
National Railways, and led to the development of the telephone
reservation system now commonly used in Japan."

Speech synthesis by compilation of recorded acoustical elements
from a magnetic drum to allow concurrent messages to many
telephone lines without providing a separate synthesizer per
telephone line.


1969 Nakata, K., and T. Miura, "A method of speech synthesis for
     multiplexed audio response", Transactions of the Institute
     of Electronics and Communication Engineers of Japan, C,
     52-C, 10, 579-586, October, in Japanese. (1969) Also,
     Electron. Commun. Japan, 52-C, 126-134, in English. (1969)
     (B)


1971 Kimura, Y., A. Ichikawa, K. Nakata, T. Hyodo, and T. Aso,
     "Development of audio response unit," Information Processing,
     12, 7, 397-405, July, in Japanese. (1971)


     Nakata, K., A. Ichikawa, and Y. Nakano, "Audio response
     unit," 7th Int. Congr. Acoust., 24H8, 493-496. (1971)


1972 Kimura, Y., A. Ichikawa, K. Nakata, T. Hyodo, and T. Aso,
     "Development of audio response unit", Information Processing
     in Japan, 12, 1-7, December. (1972) In SSSHP 75 Reprints.

     SSSHP 65.3 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
          Ltd, Japan", compiled by A. Ichikawa and S. Takeda
          6/89.
          (A: sentences for seat reservation application;
           B: words for seat reservation application)
          Cassette, low level -20VU, copy of DAT copy of master


-------------------------------------------------------------
PROJECT: MULTIPLEXED SPEECH SYNTHESIS METHOD FOR SPEECH
         SYNTHESIS BY RULE (1973-1976)


"Speech synthesis method by compilation of recorded acoustical
elements (LPC impulse response waveforms) for a multiplexed audio
response system. Production rules for fundamental frequency and
syllable duration for unrestricted Japanese words were developed.
Accent type estimation rules for corporation and person's names
were also developed. Later, the idea of using syllable units led
to the concept of new speech processing units as demi-syllables
proposed by Dr. Fujimura at BTL (see SSSHP USA Bell Telephone
Laboratories file). Furthermore, accent type estimation
technology, which was developed for Japanese words for the first
time, formed the basis for future work in this area."


1974 Nakata, K., and A. Ichikawa, "Speech synthesis for an
     unlimited vocabulary," Speech Communication Seminar
     Stockholm, 2, 261-266, August. (1974) In SSSHP 75 Reprints.


1975 Ichikawa, A., and K. Nakata, "A method of a speech segments
     generation in the speech synthesis of mono-syllables
     edition," Transactions of the Institute of Electronics and
     Communication Engineers of Japan, D, 58-D, 9, 522-529,
     September, in Japanese.  (1975)

     SSSHP 65.4 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
          Ltd, Japan", compiled by A. Ichikawa and S. Takeda
          6/89.
          (A: synthesis by rule sentences; B: Corporation names
           synthesized and inserted in natural speech sentences)
          Cassette, left chan -10VU, copy of DAT copy of master


-------------------------------------------------------------
PROJECT: SPEECH SYNTHESIS BY RULE FOR JAPANESE WORDS (1980-1984)


"Speech synthesis using PARCOR synthesizer. Fundamental frequency
contours were produced based on Fujisaki's Model (2nd order
linear system). Female voice was synthesized by computer
synthesizer. Synthesis system was implemented on boards for male
voice."
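
As an illustration only, the contour generation mentioned in the
description above can be sketched with the commonly published
form of Fujisaki's model, in which the logarithm of the
fundamental frequency is a base value plus the responses of two
critically damped second-order linear systems (phrase and accent
components). The function names and the default constants
(alpha, beta, gamma) are illustrative assumptions, not values
taken from the Hitachi system.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the critically damped phrase component."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the critically damped accent component,
    limited to the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, base_hz, phrases, accents,
                alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    phrases: list of (onset_time, amplitude) phrase commands.
    accents: list of (onset_time, offset_time, amplitude) accent commands.
    All constants here are assumed, illustrative values.
    """
    log_f0 = math.log(base_hz)
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)
```

For example, fujisaki_f0(0.3, 120.0, [(0.0, 0.5)],
[(0.2, 0.5, 0.4)]) evaluates the contour 0.3 s after a phrase
command at time zero, during an accent command that lasts from
0.2 s to 0.5 s.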

1984 SSSHP 65.5 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
          Ltd, Japan", compiled by A. Ichikawa and S. Takeda
          6/89.
          (male: names of places, people, and corporations)
          Cassette, -5VU, copy of DAT copy of master


-------------------------------------------------------------
PROJECT: SPEECH SYNTHESIS BY RULE FOR JAPANESE SENTENCES (1984-   )


"An LPC synthesizer using prediction residual, called 'the TOR
synthesizer', is developed to enhance speech quality. Fundamental
frequency contours are produced based on Fujisaki's Model. A fine
fluctuation component (phoneme component) of fundamental
frequency is newly introduced in the model and its production
rules are developed. Female voice is synthesized by computer
system, in which a technique for reducing speech quality
degradation inherent in female voice is introduced. Roles of
prosody are also clarified."


1985 Takeda, S., Y. Asakawa, and A. Ichikawa, "A study of speech
     synthesis method utilizing residual information,"
     Transactions of the Committee on Speech Research, Acoust.
     Soc. of Japan, S84-75, 589-596, January, in Japanese. (1985)


1988 Takeda, S., "Speech synthesis system for unrestricted
     Japanese sentences and several methods for improving speech
     quality," Inst. of Elect. Info. and Comm. Engineers
     Technical Report (Speech), SP88-43, 47-54, July, in
     Japanese. (1988)


1989 Takeda, S., "A study of methods for improving quality of
     female speech produced by residual-excited, rule-based
     synthesis system," Inst. of Elect. Info. and Comm.
     Engineers Technical Report (Speech), SP89-3, 17-24, May, in
     Japanese. (1989)

     SSSHP 65.6 Tape: "Side A: Speech Synthesis by Rule, Hitachi,
          Ltd, Japan", compiled by A. Ichikawa and S. Takeda
          6/89.
          (sample scientific sentence)
          Cassette, -10VU, copy of DAT copy of master


-------------------------------------------------------------
PART II: EFFICIENT SPEECH CODING AND ITS APPLICATIONS


The addition of automatic analysis techniques to the above LPC
synthesis methods provided for low bit-rate speech transmission
and storage.



PROJECT: LSI SPEECH ANALYZER AND SYNTHESIZER (1978-1980)


"PARCOR analyzer and synthesizer. Possible to select a bit rate
among either 2.4, 4.8, or 9.6 kbps. The synthesizer was composed
of 3 chips, the first speech synthesis chips in Japan."
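
PARCOR (partial autocorrelation) coefficients are the reflection
coefficients of an all-pole lattice filter. The following is a
minimal textbook sketch of lattice synthesis, not a description
of the Hitachi chips. The function name and the sign convention
(analysis polynomial built as A_i(z) = A_(i-1)(z) +
k_i z^-i A_(i-1)(1/z)) are assumptions chosen for illustration.

```python
def parcor_lattice_synthesis(excitation, k):
    """Filter an excitation sequence through an all-pole lattice.

    excitation: iterable of input samples (for example a pulse train).
    k: list of reflection (PARCOR) coefficients k_1..k_p, each with
       magnitude below 1 for a stable filter.
    Sign convention is an assumption; other texts negate k.
    """
    p = len(k)
    # b[i] holds the backward prediction error of stage i
    # from the previous sample time.
    b = [0.0] * (p + 1)
    out = []
    for e in excitation:
        f = e  # forward error entering the highest stage
        for i in range(p, 0, -1):
            f = f - k[i - 1] * b[i - 1]         # forward error of stage i-1
            b[i] = k[i - 1] * f + b[i - 1]      # updated backward error of stage i
        b[0] = f  # the output sample feeds stage 0 at the next step
        out.append(f)
    return out
```

With a single coefficient k = [-0.5], a unit impulse yields the
decaying sequence 1, 0.5, 0.25, ..., which is the response of the
one-pole filter 1 / (1 - 0.5 z^-1).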


1980 Sampei, T., A. Asada, and K. Nakata, "High quality PARCOR
     speech synthesizer," IEEE Trans. CE., 26, 8, 353-358,
     August. (1980) In SSSHP 75 Reprints.


     Sato, H., N. Miyahara, K. Nakata, K. Nomiya, T. Sampei, and
     A. Suehiro, "LSI PARCOR speech synthesizer," IECE* Technical
     Report [Semiconductor], SSD79-122, 25-30, March. (1980) In
     Japanese.

     *IECE: Institute of Electronics and Communication Engineers
            of Japan (Old name of IEICE)


1981 Asada, A., Y. Ohta, T. Saito, K. Nakata, and A. Ichikawa,
     "PARCOR speech analysis & recognition apparatus by Le Roux
     method," IECE Technical Report [Electroacoustics], EA80-81,
     23-30, February. (1981) In Japanese.

     SSSHP 65.7 Tape: "Side B: Efficient Speech Coding, Hitachi,
          Ltd, Japan", compiled by A. Ichikawa and S. Takeda
          6/89.
          (English and Japanese samples at various data rates)
          Cassette, -10VU, copy of DAT copy of master


-------------------------------------------------------------
PROJECT: LPC CODEC (1981-1983)

"PARCOR analyzer and synthesizer. Possible to select a bit rate
among either 2.4, 4.8, or 9.6 kbps. A board type using Hitachi's
DSP chips (called 'HSP')."


1983 Miyamoto, T., H. Inada, and K. Nakata, "A real time PARCOR
     analysis of speech by high-performance signal processors,"
     Trans. Inst. of Elect. and Comm. Engineers of Japan, A,
     J66-A, 7, 625-632, July. (1983) In Japanese.


     Nakata, K. and T. Miyamoto, "An implementation of real time
     PARCOR analysis by high speed signal processors," 11th Int.
     Congr. Acoust., Paris, 125-128 (1983). In SSSHP 75 Reprints.

     SSSHP 65.8 Tape: "Side B: Efficient Speech Coding, Hitachi,
          Ltd, Japan", compiled by A. Ichikawa and S. Takeda 6/89.
          (Japanese sentences, male and female, two speakers.
           Left channel: Input speech. Right channel: LPC CODEC
           speech at 9.6 kbps)
          Cassette, -10VU, copy of DAT copy of master


-------------------------------------------------------------
PROJECT: TOR CODEC (1982-1987)


"A speech coding method called 'the Thinned-Out Residual Method
(TOR)' was developed to enhance speech quality at bit rates
between 8 kbps and 16 kbps. The CODEC was implemented in a single
5-MOPS DSP chip. One of the world's first commercial 8 kbps CODEC
for TDM."


1985 Takeda, S., Y. Asakawa, and A. Ichikawa, "A study of speech
     synthesis method utilizing residual information," Trans.
     Committee on Speech Research, the Acoust. Soc. of Japan,
     S84-75, 589-596, January. (1985) In Japanese.


     Ichikawa, A., S. Takeda, and Y. Asakawa, "A speech coding
     method using thinned-out residual," ICASSP85, 25.7, 961-974,
     March. (1985)


1986 Miyamoto, T., K. Kondo, T. Suzuki, Y. Asakawa, and A.
     Ichikawa, "Single DSP 8 kbps speech codec," ICASSP86, 33.10,
     1717-1720, April. (1986) In SSSHP 75 Reprints.

     SSSHP 65.9 Tape: "Side B: Efficient Speech Coding, Hitachi,
          Ltd, Japan", compiled by A. Ichikawa and S. Takeda
          6/89.
          (Japanese sentences, male and female, three data rates;
           English sentences, male and female, 8 kbps ICASSP86 demo)
          Cassette, -10VU, copy of DAT copy of master


-------------------------------------------------------------
BIOGRAPHIES


YOSHIAKI ASAKAWA

1977 B.S. in physical engineering, Tokyo Inst. of Tech., Tokyo
1979 M.S. in physical engineering, Tokyo Inst. of Tech., Tokyo
     CRL, Hitachi, Ltd., speech recognition, speech coding
1988 Member, Speech Section, Committee of Standardization of
     Japanese Language, Japan Electronic Industry Development
     Association


AKIRA ICHIKAWA

1964 B.S. in electrical engineering, Keio University, Tokyo
     CRL, Hitachi, Ltd., Senior Researcher of CRL, speech
     synthesis, speech coding, speaker identification,
     Chinese character recognition, development of push
     button signal receiver, speech recognition, and
     conversational speech understanding
1981 Ph.D., Keio University, Tokyo
1982 Head of Speech Research Group
1984-86 Member, Transactions Editorial Board, Institute of
     Electronics, Information and Communication Engineers,
     Japan
1986-88 Director, Institute of Electronics, Information and
     Communication Engineers, Japan


YOSHINORI KITAHARA

1979 B.A. in information and behavioral sciences, Hiroshima
     Univ., Hiroshima
1981 M.S. in environmental studies, Hiroshima Univ., Hiroshima
     CRL, Hitachi, Ltd., speech perception, speech interface
     evaluation
1986-89 Visiting researcher, ATR Auditory and Visual Perception
     Research Laboratories, Kyoto
1989 CRL, Hitachi, Ltd.


KAZUHIRO KONDO

1982 B.S. in electronics and communications engineering, Waseda
     Univ., Tokyo
1984 M.S. in electronics and communications engineering, Waseda
     Univ., Tokyo
     CRL, Hitachi, Ltd., speech coding, speech packet


TANETOSHI MIURA

1943 B.S. in electrical engineering, Tohoku Univ., Sendai
1943-62 Electrical Communication Laboratory, N.T.T., development
     of telephone transmission standard
1957 Ph.D., Tohoku Univ., Sendai
1962-77 Central Research Laboratory, Hitachi, Ltd.
     Chief Researcher of CRL, speech quality, electroacoustic
     conversion, underwater acoustics, indoor acoustics,
     stereophonic systems
1963-65 Head of Speech Research Group, CRL
1977 Prof., Dept. of Electronic Engineering
     Tokyo Denki University
     2-2 Nishiki-cho, Kanda
     Chiyoda-ku, Tokyo 101, Japan
1986 President of Acoust. Soc. of Japan


TAKANORI MIYAMOTO

1978 B.S. in communications engineering, Osaka Univ., Osaka
1980 M.S. in communications engineering, Osaka Univ., Osaka
     CRL, Hitachi, Ltd., speech coding, MODEM, baseband digital
     transmission


AKIRA NAKAJIMA

1969 B.S. in electronic engineering, Tokyo Inst. of Tech., Tokyo
1971 M.S. in electronic engineering, Tokyo Inst. of Tech., Tokyo
1971-83 CRL, Hitachi, Ltd.
     Speech synthesis by rule, speaker identification, and
     word processors
1983 Microelectronics Product Development Laboratory,
     Hitachi, Ltd.


YASUAKI NAKANO

1961 B.S. in applied physics, Univ. of Tokyo, Tokyo
1963 M.S. in mathematical engineering, Univ. of Tokyo, Tokyo
     Hitachi CRL: PCM transmission quality, sonar, analysis/
     synthesis of speech, recognition of Chinese characters


KAZUO NAKATA

1950 B.S. in electrical engineering, Nagoya University, Nagoya
1950-53 Radio Regulatory Bureau of Japan
1953-65 Radio Research Laboratories, speech synthesis, recog.,
     and transmission
1957-58 Invited researcher at Res. Lab. Elect., Mass. Inst. of
     Tech., Cambridge, Mass (K.N. Stevens)
1962 Ph.D. in Elect. Engr., Tohoku Univ., Sendai
1965-82 Central Research Lab., Hitachi, Chief Researcher of
     CRL, speech synthesis, coding, and recog., Chinese char
     recog.
1966-82 Head of Speech Research Group, CRL
1982 Prof., Department of Applied Physics
     Tokyo Univ. of Agriculture & Technology
     2-24-16 Naka-machi
     Koganei-shi, Tokyo, 184 JAPAN


TAKESHI NAKAYAMA

1958 B.A. in literature, Waseda Univ., Tokyo
1963 finished doctor's course in literature, Waseda Univ.
1963-87 CRL, Hitachi, Ltd., Senior Researcher of CRL,
     Speech quality evaluation, voice quality conversion,
     image quality evaluation, methods of designing human-
     interface evaluation
     Director, Acoustical Society of Japan
1970 Ph.D., Waseda University, Tokyo
1987 Prof., Dept. of Information Engineering
     Toyama University
     3190 Gofuku, Toyama-shi 930, Japan


TOSHIRO SUZUKI

1970 B.S. in electrical engineering, Tokyo Metropolitan
     University, Tokyo
1972 M.S. in electrical engineering, Tokyo Metropolitan
     University, Tokyo
     CRL, Hitachi, Ltd., Senior Researcher of CRL
1989 Head of Speech Transmission Group, PCM CODEC, digital
     subscriber transmission, speech coding


SHOICHI TAKEDA

1970 B.S. in mechanical engineering, Tokyo Inst. of Tech., Tokyo
1970-71 Toshiba Electric Co., Ltd.
1974 M.S. in mechanical engineering for production, Univ. of
     Tokyo, Tokyo
1974 CRL, Hitachi, Ltd.
1974-75 Artificial Intelligence Lab., Stanford Univ., California
     (Supported by Rotary Foundation)
     Tactile sensors for robots
1975-80 Machine vision, object recognition, image processing
1980 Speech synthesis by rule, speech analysis-synthesis

-------------------------------------------------------------
CONTRIBUTIONS AND REVIEW BY:

Dr. Akira Ichikawa  (Head of Speech Research Group)
Mr. Shoichi Takeda

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution