NMAH | Smithsonian Speech Synthesis History Project (SSSHP)

ELOQUENT TECHNOLOGY, INC. (ETI)


1988 24 Highgate Circle, Ithaca, NY 14850
1995 2389 N. Triphammer Rd., Ithaca, NY  14850 

2001 SpeechWorks International, Inc.


CONTENTS:

History

General/Surveys

The Delta System (1983-1995)

Synthesis of American English Dialects (1989-1995)

The Phone-and-Transition Model of Speech Timing (1990-1992)

The Syllt Program for Generating Synthetic Stimuli (1993)

Multi-Voice Speech Synthesis (1993-1998)

Voice Quality (1997)

Multi-Dialect and Multi-Language Speech Synthesis (1997-2001)

Biographies


-------------------------------------------------------------
HISTORY

Eloquent Technology, Inc. (ETI), a company exclusively focused on 
the development and marketing of multi-voice, multi-language text-
to-speech systems, was an outgrowth of the research conducted by 
Susan R. Hertz at Cornell University between 1974 and 1983 (see 
SSSHP USA Cornell University and A Personal Narrative by Susan
Hertz).  In 1983, Hertz started doing business as "Eloquent Tech-
nology," focusing on the development of the Delta System, a sophis-
ticated research and development tool for expressing and testing 
programs that convert text to speech. Hertz worked on the Delta 
System with two part-time consultants. In 1988, Eloquent Technology
was incorporated and hired three full-time employees, who worked
out of Hertz's home.  In 1995, after the company had grown to six
employees, it moved from Hertz's home to an outside office. It
eventually grew to seventeen full-time employees.

Between 1988 and 2001, the company worked on a variety of research 
projects in the areas of multi-voice, multi-dialect, and
multi-language speech synthesis by rule.  The linguistic models it
developed, which include language-universal and dialect-universal
components, form the basis of the ETI-Eloquence text-to-speech
system. The system has been marketed since 1995 and, at the time of
this writing, is available for twelve languages on multiple computer
platforms, including small hand-held devices.

In August 1996, the company formed a strategic partnership with IBM, 
which acquired certain portions of the technology developed by ETI, 
and ultimately incorporated it into its ViaVoice line of speech pro-
ducts.  See A Personal Narrative by Susan Hertz for a more complete 
history of Eloquent Technology, Inc.  In January 2001, Eloquent Tech-
nology, Inc. merged with SpeechWorks International, Inc., which is 
continuing to market the ETI-Eloquence system, and is also combining 
portions of it with SpeechWorks' concatenative text-to-speech system, 
Speechify, and other speech technology.


-------------------------------------------------------------
GENERAL/SURVEYS 


1997 Hertz, S.R., "The Technology of Text-to-Speech", Speech Tech-
     nology, CI Publishing, 18-21.


2000 Hertz, S.R., Younes, R.J., and Hoskins, S.R., "Space, Speed, 
     Quality, and Flexibility: Advantages of Rule-Based Speech Syn-
     thesis", Conference Proceedings, AVIOS 2000, May 22-24, 2000, 
     San Jose, CA, 217-227. (2000) (Copy in SSSHP USA Eloquent 
     Technology, Inc. file.)


-------------------------------------------------------------
PROJECT:  THE DELTA SYSTEM (1983-1995)


The Delta System is a software tool for text-to-speech rule develop-
ment for any human language.  The Delta System includes a special 
programming language and interactive environment specifically de-
signed to build and manipulate a multi-tiered utterance represen-
tation called a delta.  In a delta, the relationships between all 
relevant (user-definable) abstract linguistic units (e.g., phrases, 
words, syllables, phonemes) as well as quantitative phonetic values 
(e.g., formant frequencies, amplitudes, durations) can be explicitly 
represented.  The DeltaTools interactive environment can be used to 
trace program (rule) execution and to experiment by listening to the 
synthetic speech output produced with different acoustic values.
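The multi-tiered structure of a delta can be illustrated with a small
data structure. The Python sketch below is not the Delta programming
language. For illustration, it assumes that each tier (stream) holds
tokens whose boundaries are shared, numbered synchronization points
("sync marks"). Under that assumption, a word, its syllable, its
phonemes and their durations are related simply by sharing
boundaries. The names `Delta`, `Token`, `new_sync`, `insert` and
`tokens_between` are hypothetical, and the durations are made-up
illustrative values.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Token:
    """A unit on one stream, bounded by a left and a right sync mark."""
    name: str
    start: int                     # index of the left sync mark
    end: int                       # index of the right sync mark
    value: Optional[float] = None  # optional quantitative value (e.g. duration in ms)


@dataclass
class Delta:
    """Hypothetical multi-stream utterance representation.

    Every stream (word, syllable, phoneme, duration, ...) refers to the
    same numbered sync marks, so units on different streams are related
    by sharing their boundaries.
    """
    n_syncs: int = 0
    streams: dict[str, list[Token]] = field(default_factory=dict)

    def new_sync(self) -> int:
        """Create a new sync mark and return its index."""
        self.n_syncs += 1
        return self.n_syncs - 1

    def insert(self, stream: str, name: str, start: int, end: int,
               value: Optional[float] = None) -> Token:
        """Add a token spanning sync marks [start, end] to a stream."""
        tok = Token(name, start, end, value)
        self.streams.setdefault(stream, []).append(tok)
        return tok

    def tokens_between(self, stream: str, start: int, end: int) -> list[Token]:
        """Return the tokens on `stream` that lie entirely within [start, end]."""
        return [t for t in self.streams.get(stream, [])
                if t.start >= start and t.end <= end]


# Build a delta for the word "cat" (illustrative durations, not measured data).
d = Delta()
s = [d.new_sync() for _ in range(4)]  # sync marks 0..3
d.insert("word", "cat", s[0], s[3])
d.insert("syllable", "syl", s[0], s[3])
for name, a, b, dur in [("k", 0, 1, 80.0), ("ae", 1, 2, 150.0), ("t", 2, 3, 90.0)]:
    d.insert("phoneme", name, s[a], s[b])
    d.insert("duration", f"{dur:.0f}ms", s[a], s[b], value=dur)

# Because streams share sync marks, the word's total duration can be read
# off the duration stream between the word's own boundaries.
word = d.streams["word"][0]
total = sum(t.value for t in d.tokens_between("duration", word.start, word.end))
print(total)  # 320.0
```

Because units on different streams are tied to the same boundaries, a
rule can move from an abstract unit (the word) to the phonetic values
inside it without a separate alignment step.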


1985 Hertz, S.R., Kadin, J. and Karplus, K., "The Delta rule develop-
     ment system for speech synthesis from text", Proceedings of the 
     IEEE, 73, Special Issue on Man-Machine Speech Communication, 
     1589-1601. (1985) (Copy in SSSHP USA Eloquent Technology, Inc. 
     file.)


1990 Hertz, S.R., "The Delta programming language: an integrated 
     approach to non-linear phonology, phonetics, and speech synthe-
     sis", in J. Kingston and M. Beckman (eds.), Papers in Labora-
     tory Phonology I: Between the Grammar and the Physics of Speech, 
     Cambridge University Press. (1990) (Copy in SSSHP USA Eloquent 
     Technology, Inc. file.)


-------------------------------------------------------------
PROJECT: SYNTHESIS OF AMERICAN ENGLISH DIALECTS (1989-1995)


Between 1989 and 1995, we worked on a variety of projects involving 
the synthesis of American English dialects, including a dialect of 
Black English, General American, Brooklyn, Boston, and Alabama.  In
this work, we aimed to further test the validity of our nucleus-based
phone-and-transition model of speech timing (see Phone-and-Transition
Model of Speech Timing, below) and to extract dialect-universal
generalizations for multi-dialect speech synthesis by rule.  Toward
these ends, we developed a multi-dialect relational database that
contained both higher-level linguistic information and detailed
spectral and durational information about formants, voicing,
frication, and aspiration for a variety of utterance types in each
dialect.  From the information in the database, multi-tiered
utterance representations could be generated automatically for
synthesis with the Delta System (above).


1990 Hertz, S.R., "A modular approach to multi-dialect and multi-
     language speech synthesis using the Delta System", Proceedings 
     of the Workshop on Speech Synthesis, European Speech Communica-
     tion Association, Autrans, France, 225-228. (Copy in SSSHP USA 
     Eloquent Technology, Inc. file.)


1994 Hertz, S.R., Zsiga, E.C., de Jong, K.J., Gries, P., and 
     Lockwood, K.E., "From database to speech: a multi-dialect 
     relational database integrated with the ETI-Eloquence synthesis 
     technology", Conference Proceedings of the Second ESCA/IEEE 
     Workshop on Speech Synthesis, 45-48. (Copy in SSSHP USA Eloquent 
     Technology, Inc. file.)

     SSSHP 172.1 Tape: "Eloquent Technology, Inc. speech synthesis
          samples"
          (syn "The colors of the rainbow are red, orange, yellow, 
           green, blue and violet.") Southern American English.
          Cassette, good quality.

	
-------------------------------------------------------------
PROJECT: PHONE-AND-TRANSITION MODEL OF SPEECH TIMING (1990-1992)


Building on our observations about formant timing patterns during 
our earlier rule-based multi-language speech synthesis research 
between 1978 and 1990, we developed a new model of speech timing, 
called the phone-and-transition model.  The phone-and-transition 
model is based on a segmentation of speech into independent phone 
and formant transition units, rather than abutted phoneme-sized 
units that incorporate the transitions, as in more conventional 
models for rule-based synthesis.  The separate phone and transition 
units are grouped into higher-level units, such as phonemes,
syllable nuclei, and syllables.  The model was derived from
observations of the durational behavior of sonorants before voiced and
voiceless obstruents, diphthongs in fast and slow speech, and the 
timing of aspiration.  In addition to more straightforward expres-
sion of timing patterns in a particular language, the model has 
made possible the direct expression of a variety of acoustic uni-
versals, and it has helped us increase our understanding of the 
relationship between phonology and phonetics.


1991 Hertz, S.R., "Streams, phones, and transitions: toward a 
     phonological and phonetic model of formant timing", Journal 
     of Phonetics, 19, Special Issue on Speech Synthesis and Pho-
     netics, edited by R. Carlson. (1991) (Copy in SSSHP USA 
     Eloquent Technology, Inc. file.)


1992 Hertz, S.R. and Huffman, M.K., "A nucleus-based timing model 
     applied to multi-dialect speech synthesis by rule", Proceed-
     ings of the International Conference on Spoken Language Pro-
     cessing, 2, 1171-1174. (1992) (Copy in SSSHP USA Eloquent 
     Technology, Inc. file.)


-------------------------------------------------------------
PROJECT: THE SYLLT PROGRAM FOR GENERATING SYNTHETIC STIMULI (1993)


Syllt (Syllable Tool) is a partial phone-to-speech program designed 
to be used in conjunction with the Delta System (above) for teaching 
and research in acoustic phonetics and speech synthesis. Written in 
the Delta programming language, Syllt takes as input a string of
phonetic symbols representing a CVC (consonant-vowel-consonant), CV,
VC, or VCV utterance. It produces as output a multi-tiered utterance
representation (delta) containing the phonological and acoustic
structure of the utterance.  From the acoustic values, parameter
values for a Klatt-style formant synthesizer are automatically 
derived.  The output deltas can be modified either interactively 
with simple Delta System commands, or automatically with built-in or 
user-defined Delta language procedures.  Syllt can also quickly 
implement stepwise changes to a delta to generate stimulus continua 
or matrices.
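Generating a continuum amounts to stepping one or more acoustic
values between two endpoints and synthesizing each intermediate
version. The sketch below shows only that arithmetic. It is written in
Python, not in the Delta language that Syllt uses, and it operates on a
flat dictionary of parameter values rather than on a delta. The
function names, parameter names and endpoint values are hypothetical.

```python
from __future__ import annotations
from itertools import product


def continuum(start: float, end: float, n_steps: int) -> list[float]:
    """Return n_steps equally spaced values from start to end, inclusive."""
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    step = (end - start) / (n_steps - 1)
    return [start + i * step for i in range(n_steps)]


def stimulus_matrix(base: dict[str, float],
                    axes: dict[str, list[float]]) -> list[dict[str, float]]:
    """Cross several continua, producing one parameter set per combination."""
    names = list(axes)
    stimuli = []
    for combo in product(*(axes[n] for n in names)):
        stim = dict(base)               # start from the base utterance's values
        stim.update(zip(names, combo))  # overwrite the manipulated parameters
        stimuli.append(stim)
    return stimuli


# Illustrative values only: a 7-step F2-onset continuum crossed with a
# 3-step voice-onset-time continuum gives a 7 x 3 stimulus matrix.
base = {"f1_onset_hz": 300.0, "vowel_dur_ms": 200.0}
f2_steps = continuum(900.0, 1800.0, 7)   # 150 Hz apart
vot_steps = continuum(0.0, 60.0, 3)      # 30 ms apart
matrix = stimulus_matrix(base, {"f2_onset_hz": f2_steps, "vot_ms": vot_steps})
print(len(matrix))  # 21
print(f2_steps)     # [900.0, 1050.0, 1200.0, 1350.0, 1500.0, 1650.0, 1800.0]
```

A single continuum is just `stimulus_matrix` with one axis. Adding a
second axis turns the series into a matrix without changing the
stepping logic.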


1995 Hertz, S.R. and Zsiga, E.C., "The Delta System with Syllt: 
     Increased capabilities for teaching and research in phonetics", 
     Proceedings ICPhS 95 Stockholm, 2, 322-325. (Copy in SSSHP USA 
     Eloquent Technology, Inc. file.)


-------------------------------------------------------------
PROJECT:  MULTI-VOICE SPEECH SYNTHESIS (1993-1998)


Between 1994 and 1998, we added a universal "voice filter" component
to our formant-based synthesis rule sets for different languages.
The filter operates on the acoustic parameter values produced by the
rules to generate the desired voice quality, including male, female,
and child voices.  A variety of parameters can be set to modify the
male, female, and child voices, producing a virtually unlimited range
of voices. These parameters include breathiness, roughness, speed,
volume, pitch baseline, and degree of pitch fluctuation.
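A universal voice filter can be pictured as a post-processing step.
It maps the rule-generated parameter frames for any language onto a
chosen voice. The sketch below is an illustration under stated
assumptions, not ETI's filter. The frame fields, the `VoiceSettings`
names and the formulas are all hypothetical. The formulas re-centre F0
on a new baseline, scale its excursions by a fluctuation factor and
scale all formants by a single factor. Roughness is omitted.

```python
from __future__ import annotations
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """One frame of rule-generated parameter values (hypothetical subset)."""
    dur_ms: float
    f0_hz: float
    formants_hz: tuple[float, float, float]
    av_db: float  # voicing amplitude
    ah_db: float  # aspiration amplitude


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical user-settable voice parameters."""
    pitch_baseline_hz: float  # target mean F0
    pitch_fluctuation: float  # 1.0 keeps the rules' F0 excursions, 0.0 gives a monotone
    formant_scale: float      # > 1 raises all formants (shorter, female/child-like tract)
    speed: float              # > 1 shortens every frame
    volume_db: float          # added to both amplitudes
    breathiness_db: float     # added to the aspiration amplitude only


def voice_filter(frames: list[Frame], v: VoiceSettings) -> list[Frame]:
    """Map rule output for any language to a particular voice.

    The same filter is applied regardless of which language's rules
    produced `frames`, which is what makes it 'universal' in this sketch.
    """
    mean_f0 = sum(f.f0_hz for f in frames) / len(frames)  # unweighted mean
    return [
        replace(
            f,
            dur_ms=f.dur_ms / v.speed,
            # Re-centre the contour on the new baseline and scale its excursions.
            f0_hz=v.pitch_baseline_hz + v.pitch_fluctuation * (f.f0_hz - mean_f0),
            formants_hz=tuple(x * v.formant_scale for x in f.formants_hz),
            av_db=f.av_db + v.volume_db,
            ah_db=f.ah_db + v.volume_db + v.breathiness_db,
        )
        for f in frames
    ]


# A two-frame rising contour from the rules, re-voiced as a breathy child voice.
rules_out = [
    Frame(10.0, 110.0, (500.0, 1500.0, 2500.0), 60.0, 0.0),
    Frame(10.0, 130.0, (500.0, 1500.0, 2500.0), 60.0, 0.0),
]
child = VoiceSettings(pitch_baseline_hz=260.0, pitch_fluctuation=1.5,
                      formant_scale=1.2, speed=1.0, volume_db=0.0,
                      breathiness_db=6.0)
child_frames = voice_filter(rules_out, child)
print([f.f0_hz for f in child_frames])  # [245.0, 275.0]
```

Keeping the filter separate from the language rules means that a new
voice is a new set of settings, not a new rule set.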

     SSSHP 172.4 Tape: "Eloquent Technology, Inc. speech synthesis
          samples"
          (syn 3:11 "Hello. My name is Reed. I believe we may have 
           met before. Would you like to meet my sister Shelley? 
           ...  Goodbye, and thanks for listening.")
          Cassette, good quality.


-------------------------------------------------------------
PROJECT:  VOICE QUALITY (1997)


With the aim of improving the naturalness of our formant-based 
synthesis by rule, we conducted experiments in which we hand-
modified certain rule-generated parameter values related to voice 
quality and prosody, such as spectral tilt and fundamental 
frequency.  Highly natural-sounding speech was achieved for the 
target sentences. Through our experimentation, we determined that 
we could abstract away from certain details of the original model 
utterances, such as some formant target misalignments, without 
degrading the speech quality.  In particular, we were able to 
structure the synthetic utterances in accordance with the phone-
and-transition model underlying our rules, suggesting that rule-
based formant synthesis within this model has the potential to 
sound highly natural.
     
     SSSHP 172.2 Tape: "Eloquent Technology, Inc. speech synthesis
          samples"
          (syn 0:01 "Today's a spectacular day.")
          Cassette, good quality.


-------------------------------------------------------------
PROJECT: MULTI-DIALECT AND MULTI-LANGUAGE SPEECH SYNTHESIS 
            (1997-2001)


Between late 1996 and 2001, we used the Delta System (above) to 
develop text-to-speech synthesis rules for thirteen languages and
dialects: German, Parisian and Canadian French, Castilian and
Mexican Spanish, General American and British English, Finnish,
Brazilian Portuguese, Mandarin Chinese, Japanese, and Korean.
(The Romanization components for Chinese, Korean, and Japanese,
which generate a Romanized string from the original input
characters, were implemented in C++ rather than in Delta.)
Building on the linguistic models we developed between 1990 and 
1996 (see Synthesis of American English Dialects and The Phone-
and-Transition Model of Speech Timing, above), the rules for each 
language are divided into language-universal, language-specific 
(dialect-universal), and dialect-specific rule modules. 
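The three-way division of rule modules can be sketched as a pipeline
assembled for each dialect. In this hypothetical Python illustration,
each module is a list of rule functions over a phone list. The pipeline
applies universal rules first, then language-specific rules, then
dialect-specific ones. That ordering is an assumption, as are the data
format and the individual rules, which are greatly simplified English
examples. None of this is the actual Delta rule modules.

```python
from __future__ import annotations
from typing import Callable

Phones = list[str]
Rule = Callable[[Phones], Phones]

VOWELS = {"a", "e", "i", "o", "u"}  # simplified vowel inventory
VOICELESS_STOPS = {"p", "t", "k"}


# Language-universal (hypothetical): mark the utterance edges with silence.
def add_silence(p: Phones) -> Phones:
    return ["_"] + p + ["_"]


# English, dialect-universal (simplified): aspirate utterance-initial /p t k/.
def aspirate_initial_stops(p: Phones) -> Phones:
    return [ph + "h" if ph in VOICELESS_STOPS and (i == 0 or p[i - 1] == "_") else ph
            for i, ph in enumerate(p)]


# General American, dialect-specific (simplified, ignores stress): flap /t/ between vowels.
def flap_intervocalic_t(p: Phones) -> Phones:
    return ["D" if ph == "t" and 0 < i < len(p) - 1
            and p[i - 1] in VOWELS and p[i + 1] in VOWELS else ph
            for i, ph in enumerate(p)]


# British English (non-rhotic), dialect-specific: delete /r/ unless a vowel follows.
def drop_nonprevocalic_r(p: Phones) -> Phones:
    return [ph for i, ph in enumerate(p)
            if not (ph == "r" and (i + 1 == len(p) or p[i + 1] not in VOWELS))]


UNIVERSAL: list[Rule] = [add_silence]
LANGUAGE: dict[str, list[Rule]] = {"en": [aspirate_initial_stops]}
DIALECT: dict[tuple[str, str], list[Rule]] = {
    ("en", "US"): [flap_intervocalic_t],
    ("en", "GB"): [drop_nonprevocalic_r],
}


def build_pipeline(lang: str, dialect: str) -> list[Rule]:
    """Universal rules first, then language-specific, then dialect-specific."""
    return UNIVERSAL + LANGUAGE.get(lang, []) + DIALECT.get((lang, dialect), [])


def apply_rules(phones: Phones, lang: str, dialect: str) -> Phones:
    for rule in build_pipeline(lang, dialect):
        phones = rule(phones)
    return phones


print(apply_rules(["w", "o", "t", "e", "r"], "en", "US"))  # ['_', 'w', 'o', 'D', 'e', 'r', '_']
print(apply_rules(["w", "o", "t", "e", "r"], "en", "GB"))  # ['_', 'w', 'o', 't', 'e', '_']
```

In this arrangement, adding a dialect means supplying only its
dialect-specific module. The universal and language-specific modules
are reused unchanged.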

 
1990 Hertz, S.R., "A modular approach to multi-dialect and multi-
     language speech synthesis using the Delta System", Proceedings 
     of the Workshop on Speech Synthesis, European Speech Communi-
     cation Association, Autrans, France, 225-228. (Copy in SSSHP 
     USA Eloquent Technology, Inc. file.)


1999 Hertz, S.R., Younes, R.J., and Zinovieva, N., "Language-uni-
     versal and language-specific components in the multi-language 
     ETI-Eloquence text-to-speech system", Proceedings of the XIV 
     International Congress of Phonetic Sciences, San Francisco, 
     CA, Aug. 1-7, 2283-2286. (1999) (Copy in SSSHP USA Eloquent 
     Technology, Inc. file.)

     SSSHP 172.3 Tape: "Eloquent Technology, Inc. speech synthesis
          samples"
          (syn 1:46 "Hi, my name is Reed. I'm an American. I speak 
           English ... \Vce=Speaker=Antti\ Hei. Nimeni on Antti.  
           Olen suomalainen. Puhutko suomea?  1, 2, 3, 4, 5, 6, 7, 
           8, 9, 10.")
          Cassette, good quality.


-------------------------------------------------------------
BIOGRAPHIES


KENNETH DE JONG

1984 B.A., English, Calvin College, Grand Rapids, MI
1987 M.A., linguistics, Ohio State University, Columbus, OH
1991 Ph.D., linguistics, Ohio State University, Columbus, OH
1991 Postdoctoral Fellow, Phonetics Lab, Univ. of California, 
     Los Angeles, CA
1992 Visiting Asst. Professor, Department of Linguistics, 
     University of California, Los Angeles, CA
1993 Visiting Scholar, Department of Modern Languages and 
     Linguistics, Cornell University, Ithaca, NY
1993 Research Linguist, Eloquent Technology, Inc., Ithaca, NY
1994 Visiting Asst. Professor, Department of Linguistics, 
     Indiana University, Bloomington, IN
1995 Asst. Professor, Department of Linguistics, Indiana 
     University, Bloomington, IN


PAUL GRIES


SUSAN R. HERTZ

1972 B.A., linguistics and German, Univ. of California, Davis
1974 SRS tool and rule development
1975 M.A., general linguistics, computer science, and Germanic 
     linguistics, Cornell University, Ithaca, NY
1979 Ph.D., linguistics, Cornell University, Ithaca, NY
1979 Acting Assistant Professor, Department of Modern Languages 
     and Linguistics, Cornell University, Ithaca, NY
1983 Delta System tool and rule development; ETI-Eloquence 
     product development
1983 President and CTO, Eloquent Technology, Inc.
1985 Acting Assistant Professor, Department of Modern Languages 
     and Linguistics, Cornell University, Ithaca, NY
1986 Senior Research Associate, Department of Modern Languages 
     and Linguistics, Cornell University, Ithaca, NY
1996 Adjunct Associate Professor, Department of Linguistics, 
     Cornell University, Ithaca, NY
2001 Director and Lead Scientist, Text-to-Speech Technologies, 
     SpeechWorks International, Inc., Ithaca, NY


STEVE R. HOSKINS

1982 B.E.E., Univ. of Delaware, Newark, DE
1982 Electrical/Software Engineer for Raytheon Equipment Div., 
     Fenwal Electronics, Allen Bradley
1991 Research/Teaching Assistant, Linguistics Department, Univ. 
     of Delaware, Newark
1993 M.A., linguistics, Univ. of Delaware, Newark
1994 Research Assistant, Applied Sciences and Engineering 
     Laboratories, Wilmington, DE
1997 Ph.D., linguistics, Univ. of Delaware, Newark
1997 Postdoctoral Researcher, Applied Sciences and Engineering 
     Laboratories, Wilmington, DE
1999 Computational Linguist, Eloquent Technology, Inc., Ithaca
1999 Visiting Scholar, Linguistics Department, Cornell Univ., 
     Ithaca, NY
2001 Text-to-Speech Scientist and Software Developer, SpeechWorks 
     International, Inc.


MARIE K. HUFFMAN

1982 B.A., linguistics, Univ. of California, Riverside, CA
1985 M.A., linguistics, Univ. of California, Riverside, CA
1989 Ph.D., linguistics, Univ. of California, Los Angeles, CA
1989 Postdoctoral Fellow, Speech Communication group, Massa-
     chusetts Institute of Technology, MA
1991 Visiting Scholar, Dept. of Modern Languages and Linguis-
     tics, Cornell University, Ithaca, NY
1991 Research Linguist, Eloquent Technology, Inc., Ithaca, NY
1993 Asst. Professor, Dept. of Linguistics, State University of 
     New York, Stony Brook, NY
1999 Assoc. Professor, Dept. of Linguistics, State University of 
     New York, Stony Brook, NY


JAMES A. KADIN

1978 B.A., mathematics, Ithaca College, Ithaca, NY
1981 M.S., computer science, Cornell University, Ithaca, NY
1983 Delta System development, Eloquent Technology, Inc., Ithaca
1988 Ph.D., computer science, Cornell University, Ithaca, NY
1989 Asst. Professor, Dept. of Computer Science, University of 
     Maine, Orono, ME
1994 Software Engineering Manager, Alacare Home Health Services, 
     Inc., Birmingham, AL
1996 Research Scientist, Mouse Genome Informatics, The Jackson 
     Laboratory, Bar Harbor, ME


KEVIN KARPLUS

1974 B.A., mathematics, Michigan State University, MI
1976 M.S., mathematics, Stanford University, CA
1983 Ph.D., computer science, Stanford University, CA


KATHERINE E. LOCKWOOD

1993 B.A., linguistics, Cornell University, Ithaca, NY
1993 English dialect rule development and English dialect 
     research, Eloquent Technology, Inc., Ithaca, NY
2000 M.S., speech-language pathology, Ithaca College, Ithaca, NY
2001 Speech Therapist, Franziska Racker Centers, Ithaca, NY


REBECCA J. YOUNES

1976 Princeton University
1979 B.A., linguistics, University of Minnesota, Minneapolis
1982 M.A., linguistics, The University of Texas at Austin, TX
1982 Center for Arabic Study Abroad, Cairo, Egypt
1982 Instructor, Dept. of English, Birzeit University, Birzeit, 
     The West Bank
1996 Computational Linguist, Eloquent Technology, Inc., Ithaca
2001 Text-to-Speech Scientist and Software Developer, SpeechWorks 
     International, Inc., Ithaca


NINA ZINOVIEVA

1976 M.A., linguistics, Philological Faculty, Moscow State 
     Lomonosov University 
1976 Researcher, Laboratory of Phonetics and Speech Communication, 
     Philological Faculty, Moscow State Lomonosov University
1986 Ph.D., linguistics, Philological Faculty, Moscow State 
     Lomonosov University
1994 Senior Researcher, Accent Inc., CA
1997 Computational Linguist, Eloquent Technology, Inc., Ithaca
1999 Computational Linguist, Lernout & Hauspie, Burlington, MA
2001 Senior Voice User Interface Engineer, Comverse Technology, 
     Cambridge, MA


ELIZABETH ZSIGA

1986 B.A., linguistics, Wesleyan University, Middletown, CT
1988 M.A., linguistics, Yale University, New Haven, CT
1992 Lecturer, Department of Linguistics, Yale University, 
     New Haven, CT
1993 Ph.D., linguistics, Yale University, New Haven, CT
1993 Research Linguist, Eloquent Technology, Inc., Ithaca
1994 Asst. Professor, Dept. of Linguistics, Georgetown 
     University, Washington, DC
1999 Assoc. Professor, Dept. of Linguistics, Georgetown 
     University, Washington, DC


-------------------------------------------------------------
CONTRIBUTIONS AND REVIEW BY:

Dr. Susan R. Hertz (2001)
Cornell University and SpeechWorks International, Inc. 

(Quoted material is from a personal communication from Susan R. 
Hertz to H.D. Maxey, January 23, 2002, for this history. See 
SSSHP USA Eloquent Technology, Inc. file.)

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution