NMAH | Smithsonian Speech Synthesis History Project (ss

CORNELL UNIVERSITY


Cornell Phonetics Laboratory
Morrill Hall
Ithaca NY 14853


CONTENTS:

History of the Cornell Phonetics Laboratory

The Speech Research System (SRS) (1974-1983)

Multi-Language Speech Synthesis Research with SRS (1980)

SRS Text-to-Speech Rule Development for English (1983)

SRS Rule Development for Japanese (1983)

The Phone-and-Transition Model of Speech Timing (1990-1992)

Biographies


-------------------------------------------------------------
HISTORY OF THE CORNELL PHONETICS LABORATORY


"The seeds for the Cornell Phonetics Laboratory were planted when 
the Division of Modern Languages and Linguistics (now Department 
of Linguistics) in Morrill Hall was first established in 1946.  
J. Milton Cowan was the Director of the Division as well as head 
of the laboratory, and taught a course in acoustic phonetics.  A 
particularly notable piece of equipment used in the course was 
the spectrograph used in the 1940s by Dr. Cowan's Wisconsin 
colleague Martin Joos when he was writing the first monograph on 
the subject, which was published as a supplement to the journal 
LANGUAGE.  In the 1950s a classroom-style language learning 
laboratory employing the latest technology was installed, but 
phonetics research and instruction languished for a time.

When Morrill Hall was renovated in 1971, the laboratories for 
language learning and phonetics were rebuilt and expanded under 
the directorship of Professor Richard L. Leed, and the study of 
phonetics was reinvigorated by Professor Joseph Grimes.  A Digital 
Equipment Corporation PDP-11/40 computer with an OVE IIId speech 
synthesizer was installed in the Phonetics Laboratory, along with 
a new spectrograph and an up-to-date recording studio.

Speech synthesis research in the Cornell Phonetics Laboratory 
began in earnest in 1974, when Susan R. Hertz started developing 
a tool called SRS (Speech Research System) for her Ph.D. thesis.  
With SRS, linguists, even those without programming background, 
could interactively express phonological and phonetic rules that 
produced speech using the OVE IIId speech synthesizer.  SRS was 
subsequently used by Hertz and her collaborators for research into 
rule-based synthesis of a variety of languages, with the most 
extensive work done on Japanese and English.  For more details on 
the speech synthesis activities in the Cornell Phonetics Laboratory, 
see A Personal Narrative by Susan Hertz."

For the commercial development of these speech synthesis techniques, 
see the SSSHP USA Eloquent Technology, Inc. file.


-------------------------------------------------------------
PROJECT:  SPEECH RESEARCH SYSTEM (SRS) (1974-1983)


"SRS (Speech Research System) was a linguistically-oriented inter-
active system specifically designed for research in the area of 
speech synthesis by rule for any human language.  SRS's interactive 
facilities for expressing, editing, and testing text-to-speech 
rules could be used by programmers and non-programmers alike. The 
system was used for general research in multi-language speech 
synthesis, and to develop formant-based rule sets for English and 
Japanese, and more rudimentary rules for German, Dutch, and Spanish."


1979 Hertz, S.R., "An Interactive Speech Synthesis System for Lin-
     guistics", Ph.D. dissertation, Cornell University. (1979)


1982 Hertz, S.R., "From text to speech with SRS", Journal of the 
     Acoustical Society of America, 72, 1155-1170. (1982)  (K)
     (Copy in SSSHP USA Eloquent Technology, Inc. file.)

     SSSHP 171.1 Tape: "Cornell University speech synthesis samples"
          (syn  "The Cornell Speech Research System") (x2)
          (syn 0:10 "Far above Cayuga's waters, with its waves of 
           blue, stands our noble alma mater, glorious to view.")
          (syn "I like ice cream. I like ice cream. I like ice 
           cream.")
          Cassette, good quality.


-------------------------------------------------------------
PROJECT: MULTI-LANGUAGE SPEECH SYNTHESIS RESEARCH WITH SRS (1980)


"In 1980, SRS was used to develop rudimentary synthesis rule sets 
for English, German, Dutch, and Spanish. These rules operated on a 
phonetic symbol string with interspersed prosody annotations and 
produced acoustic parameter values for an OVE IIId speech 
synthesizer. The SRS rules that operated on the input were derived 
in part from an examination of the acoustic structure of these 
particular sentences; however, an attempt was made to express 
generalizations that would apply to other utterances as well."

     SSSHP 171.2 Tape: "Cornell University speech synthesis samples" 
          (syn 0:33 "Hello, I'm a machine.  I've been learning to 
           speak four different languages: Japanese, German, Dutch 
           and Spanish. Listen:  Nihongo-ga ... de español.")  
          Cassette, good quality.


-------------------------------------------------------------
PROJECT: SRS TEXT-TO-SPEECH RULE DEVELOPMENT FOR ENGLISH (1983)


"The SRS rules for English generated acoustic parameter values for 
an OVE IIId speech synthesizer on the basis of an input text ex-
pressed in terms of ordinary spelling.  The rules were formulated 
in terms of the four kinds of rules allowed by SRS.  First, a set 
of text-modification rules augmented the original text string, 
typically with morphological and stress-related information needed 
by subsequent rule types.  Second, a set of conversion rules pro-
duced a phonetic symbol string and associated feature matrix on the 
basis of the modified text string.  Third, a set of feature-modifi-
cation rules made context-dependent changes or added redundant 
features to the feature matrix produced by the conversion rules.  
Finally, a set of parameter rules produced a file of synthesizer 
parameter values on the basis of the features in the matrix."
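
The four rule stages described above can be pictured as a simple 
pipeline. The sketch below is illustrative only: it is written in 
Python rather than in SRS rule notation, and every name, rule, letter 
table, and number in it is a hypothetical stand-in, not taken from 
the actual SRS English rule set or from the OVE IIId parameter set.

```python
from dataclasses import dataclass, field

VOWEL_LETTERS = "aeiou"

# Hypothetical letter-to-phone table with a tiny feature set.
LETTER_TABLE = {
    "b": ("b", {"vowel": False, "voiced": True}),
    "d": ("d", {"vowel": False, "voiced": True}),
    "t": ("t", {"vowel": False, "voiced": False}),
    "i": ("i", {"vowel": True, "voiced": True}),
}


@dataclass
class Segment:
    symbol: str
    features: dict = field(default_factory=dict)


def text_modification(text):
    """Stage 1: augment the spelling; here, mark stress before the first vowel."""
    words = []
    for word in text.lower().split():
        for i, ch in enumerate(word):
            if ch in VOWEL_LETTERS:
                word = word[:i] + "'" + word[i:]
                break
        words.append(word)
    return " ".join(words)


def conversion(marked_text):
    """Stage 2: produce phonetic symbols plus an initial feature matrix."""
    segments, pending_stress = [], False
    for ch in marked_text:
        if ch == "'":
            pending_stress = True
        elif ch in LETTER_TABLE:
            symbol, feats = LETTER_TABLE[ch]
            seg = Segment(symbol, dict(feats))
            seg.features["stressed"] = pending_stress and feats["vowel"]
            pending_stress = False
            segments.append(seg)
    return segments


def feature_modification(segments):
    """Stage 3: context-dependent change; a vowel before a voiced consonant is long."""
    for i, seg in enumerate(segments):
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        seg.features["long"] = bool(
            seg.features["vowel"]
            and nxt is not None
            and not nxt.features["vowel"]
            and nxt.features["voiced"]
        )
    return segments


def parameter_rules(segments):
    """Stage 4: map features to (invented) synthesizer parameter values."""
    frames = []
    for seg in segments:
        duration = 100 if seg.features["vowel"] else 60
        if seg.features["long"]:
            duration += 50
        frames.append({
            "symbol": seg.symbol,
            "duration_ms": duration,
            "voicing": 1 if seg.features["voiced"] else 0,
        })
    return frames


def synthesize_parameters(text):
    """Run the four stages in the order the description gives."""
    return parameter_rules(feature_modification(conversion(text_modification(text))))
```

Calling synthesize_parameters("bid") and synthesize_parameters("bit") 
shows how a later stage depends on information added by an earlier 
one: in this toy rule set the vowel receives a longer duration only 
when the feature-modification stage has marked it as long.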


1982 Hertz, S.R., "From text to speech with SRS", Journal of the 
     Acoustical Society of America, 72, 1155-1170. (1982)  (K)
     (Copy in SSSHP USA Eloquent Technology, Inc. file.)

     SSSHP 171.3 Tape: "Cornell University speech synthesis samples" 
          (syn 0:31 "Today is August thirty first, nineteen eighty
           two. This tape is being played at ... an alternative way 
           to communicate with machines.")
          Cassette, good quality.


-------------------------------------------------------------
PROJECT:  SRS RULE DEVELOPMENT FOR JAPANESE (1983)


"The SRS speech synthesis rules for Japanese operated on a Romanized 
text, with unpredictable accents and phrase boundaries marked, and 
generated a set of acoustic parameter values for an OVE IIId speech 
synthesizer.  One of the main motivations of the Japanese rule 
development was to test the usefulness of a particular hierarchical 
phrase-structure analysis for generating Japanese pitch patterns."


1983 Beckman, M., Hertz, S., and Fujimura, O., "SRS Pitch Rules for 
     Japanese", Working Papers of the Cornell Phonetics Laboratory, 
     1, 1-16. (1983). (Copy in SSSHP USA Eloquent Technology, Inc. 
     file.)


1983 Hertz, S.R. and Beckman, M., "A look at the SRS synthesis rules 
     for Japanese", Proceedings of the IEEE International Conference 
     on Acoustics, Speech, and Signal Processing, 1336-1339. (1983).
     (Copy in SSSHP USA Eloquent Technology, Inc. file.)

     SSSHP 171.4 Tape: "Cornell University speech synthesis samples"
          (syn 0:13 "kyO'-wa; se'N kyU'-cyaku hatSidZU-sa'NneN; 
           Si'gatsu ... o-cirune-o nasa 'tte-wa ikemase'N.")
          Cassette, good quality.


-------------------------------------------------------------
PROJECT:  PHONE-AND-TRANSITION MODEL OF SPEECH TIMING (1990-1992)


"Building on our observations about formant timing patterns during 
our earlier rule-based multi-language speech synthesis research 
between 1978 and 1990, we developed a new model of speech timing, 
called the phone-and-transition model.  The phone-and-transition 
model is based on a segmentation of speech into independent phone 
and formant transition units, rather than abutted phoneme-sized 
units that incorporate the transitions, as in more conventional 
models for rule-based synthesis.  The separate phone and transition 
units are grouped into higher level units, such as phonemes, sylla-
ble nuclei, and syllables.  The model was derived from observations 
of the durational behavior of sonorants before voiced and voiceless 
obstruents, diphthongs in fast and slow speech, and the timing of 
aspiration. In addition to more straightforward expression of timing 
patterns in a particular language, the model has made possible the 
direct expression of a variety of acoustic universals, and it has 
helped us increase our understanding of the relationship between 
phonology and phonetics."
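
The segmentation described above can be pictured as a small data 
structure in which phones and transitions are separate timed units 
that are grouped into higher-level units. The Python sketch below is 
illustrative only. The class names, the labels, the durations, and in 
particular the choice to stretch only the phone units are assumptions 
made for the example, not rules or measurements from the published 
model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    kind: str            # "phone" or "transition"
    label: str           # hypothetical label, e.g. "i" or "b>i"
    duration_ms: float


@dataclass
class Group:
    kind: str            # higher-level unit, e.g. "nucleus" or "syllable"
    units: List[Unit]

    def total_ms(self):
        """Total duration of all phone and transition units in the group."""
        return sum(u.duration_ms for u in self.units)


def scale_phones(group, factor):
    """Return a copy with phone durations scaled and transitions untouched.

    Scaling only the phones is an assumption chosen for illustration.
    """
    scaled = [
        Unit(u.kind, u.label,
             u.duration_ms * factor if u.kind == "phone" else u.duration_ms)
        for u in group.units
    ]
    return Group(group.kind, scaled)


# A hypothetical syllable: phones and transitions are independent units.
syllable = Group("syllable", [
    Unit("phone", "b", 60),
    Unit("transition", "b>i", 40),
    Unit("phone", "i", 80),
    Unit("transition", "i>d", 40),
    Unit("phone", "d", 60),
])

slow = scale_phones(syllable, 1.5)
```

Because transitions are units in their own right, a timing rule can 
address them separately from the phones, which is not possible when 
each transition is folded into an abutted phoneme-sized segment.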


1991 Hertz, S.R., "Streams, phones, and transitions:  toward a 
     phonological and phonetic model of formant timing", Journal 
     of Phonetics, 19, Special Issue on Speech Synthesis and Pho-
     netics, edited by R. Carlson. (1991) (Copy in SSSHP USA 
     Eloquent Technology, Inc. file.)


1992 Hertz, S.R. and Huffman, M.K., "A nucleus-based timing model 
     applied to multi-dialect speech synthesis by rule", Proceed-
     ings of the International Conference on Spoken Language Pro-
     cessing, 2, 1171-1174. (1992) (Copy in SSSHP USA Eloquent 
     Technology, Inc. file.)


-------------------------------------------------------------
BIOGRAPHIES


MARY E. BECKMAN

1976 B.A., East Asian linguistics, Univ. of California, Berkeley
1978 M.A., East Asian linguistics, Univ. of California, Berkeley
1984 Ph.D., linguistics, Cornell University, Ithaca, NY
1984 Postdoctoral Researcher, AT&T Bell Labs, Murray Hill, NJ
1985 Professor, Departments of Linguistics and Speech and Hearing 
     Sciences, Ohio State University, Columbus, OH
1994 Visiting Scholar, ATR Interpreting Telephony Research Labora-
     tories, Kyoto, Japan
2000 Senior Research Fellow, Macquarie Centre for Cognitive 
     Sciences, Macquarie University, Sydney, Australia


OSAMU FUJIMURA  (See SSSHP University of Tokyo file)


SUSAN R. HERTZ  (See SSSHP USA Eloquent Technology, Inc.)


M. K. HUFFMAN  (See SSSHP USA Eloquent Technology, Inc.)


-------------------------------------------------------------
CONTRIBUTIONS AND REVIEW BY:

Dr. Susan R. Hertz
Cornell University and SpeechWorks International, Inc. 

(Quoted material is from a personal communication from Susan R. 
Hertz to H.D. Maxey, January 23, 2002, for this history. See 
SSSHP USA Cornell University file.)

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution