NMAH | Smithsonian Speech Synthesis History Project (dk

certain syntactic boundaries, but can also be specified by the user, (5) one permitted special level of phrasal emphasis and two levels of lexical stress are introduced (plus the additional alternatives unstressed and reduced), (6) syllable boundary and morpheme boundary symbols are provided, but are usually not required since rules assign consonants to syllables correctly in most cases anyway, (7) the compound noun symbol is introduced in order to be able to force stress reduction in the second element of the compound, (8) a very limited inventory of syntactic symbols is provided, and (9) the new paragraph symbol is defined and used to realize prosodic marking of new paragraphs. A clear deficiency of the present symbol inventory is the lack of any ability to approximate non-English sounds in foreign words, although this limitation is shared by many English-speaking individuals.

Historically, English and most other languages employing an alphabetic spelling system began with spellings that were close to the way words were pronounced (Venezky, 1965, 1970; Chomsky and Halle, 1968; Henderson, 1982). Over time, pronunciation habits changed, sometimes dramatically, so that the spelling now reflects an underlying historical antecedent of current pronunciation more nearly than it reflects the synchronic phonemes. Thus, rules for pronunciation of English words depend on complex conventions involving, e.g., remote silent "e," the number of consonants following a vowel, and the grouping together of special letter pairs, such as "ch" and "gh," which normally function like a single letter (Wijk, 1969), but not if the two letters fall in separate morphemes. English has also borrowed words from other languages, so that Latin, French, German, and other patterns, somewhat Anglicized, are fairly common.

A selected survey of the literature on derivation of phonemes and stress from orthography is presented in Fig. 31 as a block diagram. Interconnections indicate how fundamental theoretical analyses of English have been incorporated in laboratory programs for text analysis and, finally, in commercial text-to-speech systems. As indicated in the figure, methods used in most commercial systems for deriving a phonemic representation of a word involve the use of letter-to-sound rules and an exceptions dictionary. An attractive alternative, as we will see, is to develop a large morpheme dictionary and try to decompose each input word into its constituent morphemes (where morphemes are the minimal meaningful subparts of words).
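The division of labor between an exceptions dictionary and letter-to-sound rules can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the method of any system surveyed here: the names `EXCEPTIONS`, `RULES`, `letter_to_sound`, and `pronounce` are hypothetical, the rule set is a toy, and the ARPAbet-style phoneme symbols are chosen purely for readability.

```python
# Toy illustration only: the entries below are hypothetical and are not
# taken from any commercial or laboratory system.

# Words whose pronunciation the rules would get wrong.
EXCEPTIONS = {
    "of": ["AH", "V"],
    "one": ["W", "AH", "N"],
}

# Letter-to-sound rules as (letters, phonemes) pairs. Letter pairs are
# listed before single letters so that "ch" is tried before "c".
RULES = [
    ("ch", ["CH"]),
    ("sh", ["SH"]),
    ("ee", ["IY"]),
    ("c", ["K"]),
    ("h", ["HH"]),
    ("i", ["IH"]),
    ("p", ["P"]),
    ("s", ["S"]),
    ("e", ["EH"]),
    ("o", ["AA"]),
    ("f", ["F"]),
    ("n", ["N"]),
    ("t", ["T"]),
]


def letter_to_sound(word):
    """Scan left to right, applying the first rule that matches."""
    phonemes = []
    i = 0
    while i < len(word):
        for letters, sounds in RULES:
            if word.startswith(letters, i):
                phonemes.extend(sounds)
                i += len(letters)
                break
        else:
            # No rule covers this letter in the toy set; skip it.
            i += 1
    return phonemes


def pronounce(word):
    """Consult the exceptions dictionary first, then fall back to rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    return letter_to_sound(word)
```

In this sketch, `pronounce("chip")` is handled entirely by the rules, whereas `pronounce("of")` is answered from the dictionary; calling `letter_to_sound("of")` directly shows the error that the dictionary entry is there to prevent.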

Several initial attempts to predict word pronunciation just from the spelling (Ainsworth, 1973; McIlroy, 1974; Hunnicutt, 1976; Elovitz et al., 1976; Carlson and Granström, 1976) started from the assumption that a letter or letter pair could be converted to the appropriate phoneme if just the right amount of adjacent letter context was examined. Based on this view, a set of conversion rules was devised to take care of letter pairs such as "ch" and "ea," and
