CORNELL UNIVERSITY
Cornell Phonetics Laboratory
Morrill Hall
Ithaca NY 14853
CONTENTS:
History of the Cornell Phonetics Laboratory
The Speech Research System (SRS) (1974-1983)
Multi-Language Speech Synthesis Research with SRS (1980)
SRS Text-to-Speech Rule Development for English (1983)
SRS Rule Development for Japanese (1983)
The Phone-and-Transition Model of Speech Timing (1990-1992)
Biographies
-------------------------------------------------------------
HISTORY OF THE CORNELL PHONETICS LABORATORY
"The seeds for the Cornell Phonetics Laboratory were planted when
the Division of Modern Languages and Linguistics (now Department
of Linguistics) in Morrill Hall was first established in 1946.
J. Milton Cowan was the Director of the Division as well as head
of the laboratory, and taught a course in acoustic phonetics. A
particularly notable piece of equipment used in the course was
the spectrograph used in the 1940s by Dr. Cowan's Wisconsin
colleague Martin Joos when he was writing the first monograph on
the subject, which was published as a supplement to the journal
LANGUAGE. In the 1950s a classroom-style language learning
laboratory employing the latest technology was installed, but
phonetics research and instruction languished for a time.
When Morrill Hall was renovated in 1971, the laboratories for
language learning and phonetics were rebuilt and expanded under
the directorship of Professor Richard L. Leed, and the study of
phonetics was reinvigorated by Professor Joseph Grimes. A Digital
Equipment Corporation PDP-11/40 computer with an OVE IIId speech
synthesizer was installed in the Phonetics Laboratory, along with
a new spectrograph and an up-to-date recording studio.
Speech synthesis research in the Cornell Phonetics Laboratory
began in earnest in 1974, when Susan R. Hertz started developing
a tool called SRS (Speech Research System) for her Ph.D. thesis.
With SRS, linguists, even those without programming background,
could interactively express phonological and phonetic rules that
produced speech using the OVE IIId speech synthesizer. SRS was
subsequently used by Hertz and her collaborators for research into
rule-based synthesis of a variety of languages, with the most
extensive work done on Japanese and English. For more details on
the speech synthesis activities in the Cornell Phonetics Laboratory,
see A Personal Narrative by Susan Hertz."
For commercial development of these speech synthesis techniques, see
the SSSHP USA Eloquent Technology, Inc. file.
-------------------------------------------------------------
PROJECT: SPEECH RESEARCH SYSTEM (SRS) (1974-1983)
"SRS (Speech Research System) was a linguistically-oriented inter-
active system specifically designed for research in the area of
speech synthesis by rule for any human language. SRS's interactive
facilities for expressing, editing, and testing text-to-speech
rules could be used by programmers and non-programmers alike. The
system was used for general research in multi-language speech
synthesis, and to develop formant-based rule sets for English and
Japanese, and more rudimentary rules for German, Dutch, and Spanish."
1979 Hertz, S.R., "An Interactive Speech Synthesis System for Lin-
guistics", Ph.D. dissertation, Cornell University. (1979)
1982 Hertz, S.R., "From text-to-speech with SRS", Journal of the
Acoustical Society of America, 72, 1155-1170. (1982) (K)
(Copy in SSSHP USA Eloquent Technology, Inc. file.)
SSSHP 171.1 Tape: "Cornell University speech synthesis samples"
(syn "The Cornell Speech Research System") (x2)
(syn 0:10 "Far above Cayuga's waters, with its waves of
blue, stands our noble alma mater, glorious to view.")
(syn "I like ice cream. I like ice cream. I like ice
cream.")
Cassette, good quality.
-------------------------------------------------------------
PROJECT: MULTI-LANGUAGE SPEECH SYNTHESIS RESEARCH WITH SRS (1980)
"In 1980, SRS was used to develop rudimentary synthesis rule sets
for English, German, Dutch, and Spanish. These rules operated on a
phonetic symbol string with interspersed prosody annotations and
produced acoustic parameter values for an OVE IIId speech
synthesizer. The SRS rules that operated on the input were derived
in part from an examination of the acoustic structure of these
particular sentences; however, an attempt was made to express
generalizations that would apply to other utterances as well."
SSSHP 171.2 Tape: "Cornell University speech synthesis samples"
(syn 0:33 "Hello, I'm a machine. I've been learning to
speak four different languages: Japanese, German, Dutch
and Spanish. Listen: Nihongo-ga ... de español.")
Cassette, good quality.
-------------------------------------------------------------
PROJECT: SRS TEXT-TO-SPEECH RULE DEVELOPMENT FOR ENGLISH (1983)
"The SRS rules for English generated acoustic parameter values for
an OVE IIId speech synthesizer on the basis of an input text ex-
pressed in terms of ordinary spelling. The rules were formulated
in terms of the four kinds of rules allowed by SRS. First, a set
of text-modification rules augmented the original text string,
typically with morphological and stress-related information needed
by subsequent rule types. Second, a set of conversion rules pro-
duced a phonetic symbol string and associated feature matrix on the
basis of the modified text string. Third, a set of feature-modifi-
cation rules made context-dependent changes or added redundant
features to the feature matrix produced by the conversion rules.
Finally, a set of parameter rules produced a file of synthesizer
parameter values on the basis of the features in the matrix."
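The four rule stages described above can be sketched as a simple
pipeline. This is only an illustrative sketch: the function names,
the toy letter table, the plural-suffix rule, and the parameter names
are all hypothetical. They are not SRS rule syntax and not OVE IIId
parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    symbol: str
    features: dict = field(default_factory=dict)


# Hypothetical letter table: phonetic symbol plus initial features.
LETTER_TABLE = {
    "c": ("k", {"voiced": False, "vowel": False}),
    "a": ("ae", {"voiced": True, "vowel": True}),
    "t": ("t", {"voiced": False, "vowel": False}),
    "d": ("d", {"voiced": True, "vowel": False}),
    "o": ("o", {"voiced": True, "vowel": True}),
    "g": ("g", {"voiced": True, "vowel": False}),
    "s": ("s", {"voiced": False, "vowel": False}),
}


def text_modification(text):
    """Stage 1: augment the spelling with a '+' before a final 's'."""
    text = text.lower()
    if len(text) > 1 and text.endswith("s"):
        return text[:-1] + "+" + "s"
    return text


def conversion(text):
    """Stage 2: build a phonetic symbol string with feature bundles."""
    segments = []
    after_boundary = False
    for ch in text:
        if ch == "+":
            after_boundary = True
            continue
        symbol, feats = LETTER_TABLE[ch]
        seg = Segment(symbol, dict(feats))
        seg.features["suffix"] = after_boundary
        after_boundary = False
        segments.append(seg)
    return segments


def feature_modification(segments):
    """Stage 3: add a redundant feature and apply one contextual change."""
    for i, seg in enumerate(segments):
        # Redundant feature: every vowel is also marked sonorant.
        seg.features["sonorant"] = seg.features["vowel"]
        # Contextual change: suffix 's' becomes 'z' after a voiced segment.
        prev_voiced = i > 0 and segments[i - 1].features["voiced"]
        if seg.features["suffix"] and seg.symbol == "s" and prev_voiced:
            seg.symbol = "z"
            seg.features["voiced"] = True
    return segments


def parameter_rules(segments):
    """Stage 4: turn features into (hypothetical) parameter frames."""
    return [
        {
            "symbol": seg.symbol,
            "voicing_amplitude": 1.0 if seg.features["voiced"] else 0.0,
            "duration_ms": 100 if seg.features["vowel"] else 60,
        }
        for seg in segments
    ]


def synthesize(text):
    """Run the four stages in the order given in the description."""
    modified = text_modification(text)
    return parameter_rules(feature_modification(conversion(modified)))
```

Calling synthesize("dogs") passes the spelling through all four
stages in order; each stage reads only the output of the one before
it, which mirrors the ordering given in the quoted description.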
1982 Hertz, S.R., "From text-to-speech with SRS", Journal of the
Acoustical Society of America, 72, 1155-1170. (1982) (K)
(Copy in SSSHP USA Eloquent Technology, Inc. file.)
SSSHP 171.3 Tape: "Cornell University speech synthesis samples"
(syn 0:31 "Today is August thirty first, nineteen eighty
two. This tape is being played at ... an alternative way
to communicate with machines.")
Cassette, good quality.
-------------------------------------------------------------
PROJECT: SRS RULE DEVELOPMENT FOR JAPANESE (1983)
"The SRS speech synthesis rules for Japanese operated on a Romanized
text, with unpredictable accents and phrase boundaries marked, and
generated a set of acoustic parameter values for an OVE IIId speech
synthesizer. One of the main motivations of the Japanese rule
development was to test the usefulness of a particular hierarchical
phrase-structure analysis for generating Japanese pitch patterns."
1983 Beckman, M., Hertz, S., and Fujimura, O., "SRS Pitch Rules for
Japanese", Working Papers of the Cornell Phonetics Laboratory,
1, 1-16. (1983). (Copy in SSSHP USA Eloquent Technology, Inc.
file.)
1983 Hertz, S.R. and Beckman, M., "A look at the SRS synthesis rules
for Japanese", Proceedings of the IEEE International Conference on
Acoustics, Speech, and Signal Processing, 1336-1339. (1983).
(Copy in SSSHP USA Eloquent Technology, Inc. file.)
SSSHP 171.4 Tape: "Cornell University speech synthesis samples"
(syn 0:13 "kyO'-wa; se'N kyU'-cyaku hatSidZU-sa'NneN;
Si'gatsu ... o-cirune-o nasa 'tte-wa ikemase'N.")
Cassette, good quality.
-------------------------------------------------------------
PROJECT: PHONE-AND-TRANSITION MODEL OF SPEECH TIMING (1990-1992)
"Building on our observations about formant timing patterns during
our earlier rule-based multi-language speech synthesis research
between 1978 and 1990, we developed a new model of speech timing,
called the phone-and-transition model. The phone-and-transition
model is based on a segmentation of speech into independent phone
and formant transition units, rather than abutted phoneme-sized
units that incorporate the transitions, as in more conventional
models for rule-based synthesis. The separate phone and transition
units are grouped into higher level units, such as phonemes, sylla-
ble nuclei, and syllables. The model was derived from observations
of the durational behavior of sonorants before voiced and voiceless
obstruents, diphthongs in fast and slow speech, and the timing of
aspiration. In addition to more straightforward expression of timing
patterns in a particular language, the model has made possible the
direct expression of a variety of acoustic universals, and it has
helped us increase our understanding of the relationship between
phonology and phonetics."
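The segmentation described above can be pictured as a flat sequence
of separate phone and transition units, with higher-level units
defined as spans over that sequence. The sketch below is a minimal
illustration only: the class names, the example word, all duration
values, and the choice of which units fall inside the nucleus are
hypothetical and are not taken from the published model.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    kind: str         # "phone" or "transition"
    label: str
    duration_ms: int  # hypothetical value, for illustration only


@dataclass
class Group:
    level: str  # e.g. "nucleus" or "syllable"
    label: str
    start: int  # index of the first unit in the span (inclusive)
    end: int    # index just past the last unit in the span (exclusive)


# Hypothetical timeline for one syllable: phones and transitions are
# separate units rather than being folded into phoneme-sized chunks.
UNITS = [
    Unit("phone", "b", 60),
    Unit("transition", "b>ae", 40),
    Unit("phone", "ae", 120),
    Unit("transition", "ae>d", 45),
    Unit("phone", "d", 70),
]

# Higher-level units are spans over the unit sequence. The span chosen
# for the nucleus here is an arbitrary illustrative assignment.
GROUPS = [
    Group("nucleus", "ae", 2, 3),
    Group("syllable", "bad", 0, 5),
]


def group_duration(units, group):
    """Duration of a higher-level unit: the sum over the units it spans."""
    return sum(u.duration_ms for u in units[group.start:group.end])


def kind_total(units, kind):
    """Total duration contributed by all units of one kind."""
    return sum(u.duration_ms for u in units if u.kind == kind)
```

Because transitions are units in their own right, a timing rule can
address them separately from the phones, and the duration of a
syllable or nucleus falls out as a sum over its span.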
1991 Hertz, S.R., "Streams, phones, and transitions: toward a
phonological and phonetic model of formant timing", Journal
of Phonetics, 19, Special Issue on Speech Synthesis and Pho-
netics, edited by R. Carlson. (1991) (Copy in SSSHP USA
Eloquent Technology, Inc. file.)
1992 Hertz, S.R. and Huffman, M.K., "A nucleus-based timing model
applied to multi-dialect speech synthesis by rule", Proceed-
ings of the International Conference on Spoken Language Pro-
cessing, 2, 1171-1174. (1992) (Copy in SSSHP USA Eloquent
Technology, Inc. file.)
-------------------------------------------------------------
BIOGRAPHIES
MARY E. BECKMAN
1976 B.A., East Asian linguistics, Univ. of California, Berkeley
1978 M.A., East Asian linguistics, Univ. of California, Berkeley
1984 Ph.D., linguistics, Cornell University, Ithaca, NY
1984 Postdoctoral Researcher, AT&T Bell Labs, Murray Hill, NJ
1985 Professor, Departments of Linguistics and Speech and Hearing
Sciences, Ohio State University, Columbus, OH
1994 Visiting Scholar, ATR Interpreting Telephony Research Labora-
tories, Kyoto, Japan
2000 Senior Research Fellow, Macquarie Centre for Cognitive
Sciences, Macquarie University, Sydney, Australia
OSAMU FUJIMURA (see SSSHP University of Tokyo file)
SUSAN R. HERTZ (see SSSHP USA Eloquent Technology, Inc. file)
M. K. HUFFMAN (see SSSHP USA Eloquent Technology, Inc. file)
-------------------------------------------------------------
CONTRIBUTIONS AND REVIEW BY:
Dr. Susan R. Hertz
Cornell University and SpeechWorks International, Inc.
(Quoted material is from a personal communication from Susan R.
Hertz to H.D. Maxey, January 23, 2002, for this history. See
SSSHP USA Cornell University file.)