
 KLATT 1987, p. 739 

locations of these types of boundaries. Each syllable of a word in a sentence can be assigned a strength or stress level. Differences in assigned stress make some syllables stand out from the others. The stress pattern has an effect on the durations of sounds and on the pitch changes over an utterance (fundamental frequency of vocal cord vibrations, or f0). The phonological component of the grammar converts phonemic representations and information about stress and boundary types into (1) a string of phonetic segments plus (2) a superimposed pattern of timing, intensity, and f0 motions -- the latter three aspects being known as sentence prosody.

In mapping phonemes into sound, traditional linguists recognize a second intermediate level of representation that has been termed the phonetic segment or allophone. For an extreme example, the phoneme /t/ may be replaced by one of six distinctly different allophones, which will be described later in Fig. 27. The phonological component of the grammar includes rules to make these substitutions, either by replacing one symbol by another, or by changing the feature representation of a phoneme. The theoretical status of a phonetic level of representation (can it adequately describe individual languages and speaker behavior while simultaneously being capable of representing details in all human languages?) is in some dispute, but since the text-to-speech algorithms follow allophonic substitution with other rules that make graded changes to segments, these theoretical questions are of less concern.
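The two kinds of substitution rule mentioned above, symbol replacement and feature change, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Klattalk rule set: the `Segment` class, the ARPAbet-like symbol names (`T`, `DX`, `AH`, and so on), the small vowel inventory, and the simplified contexts for flapping and aspiration are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, replace

# Hypothetical, deliberately tiny vowel inventory (ARPAbet-like labels).
VOWELS = {"AA", "AE", "AH", "ER", "IY", "UW", "EY", "OW"}


@dataclass(frozen=True)
class Segment:
    symbol: str                       # phoneme or allophone label
    stressed: bool = False            # only meaningful for vowels
    features: frozenset = frozenset() # extra phonetic features


def is_vowel(seg):
    return seg.symbol in VOWELS


def flap_rule(segs):
    """Symbol replacement: T becomes DX (a flap) between a vowel
    and a following unstressed vowel. Simplified context."""
    out = list(segs)
    for i in range(1, len(out) - 1):
        if (out[i].symbol == "T"
                and is_vowel(out[i - 1])
                and is_vowel(out[i + 1])
                and not out[i + 1].stressed):
            out[i] = replace(out[i], symbol="DX")
    return out


def aspiration_rule(segs):
    """Feature change: T keeps its symbol but gains the feature
    'aspirated' before a stressed vowel, unless it follows S."""
    out = list(segs)
    for i in range(len(out) - 1):
        after_s = i > 0 and out[i - 1].symbol == "S"
        if (out[i].symbol == "T"
                and is_vowel(out[i + 1])
                and out[i + 1].stressed
                and not after_s):
            out[i] = replace(out[i], features=out[i].features | {"aspirated"})
    return out


def apply_rules(segs):
    # Rules are applied in a fixed order; each sees the previous rule's output.
    return aspiration_rule(flap_rule(segs))


# "butter": the T sits between a stressed and an unstressed vowel.
butter = [Segment("B"), Segment("AH", stressed=True), Segment("T"), Segment("ER")]
# "attack": the T precedes a stressed vowel.
attack = [Segment("AH"), Segment("T"), Segment("AE", stressed=True), Segment("K")]

butter_out = apply_rules(butter)
attack_out = apply_rules(attack)
```

In this sketch the first rule rewrites the label itself, while the second leaves the label alone and annotates the segment, which mirrors the distinction between replacing a symbol and changing a feature representation. Later rules that make graded changes could then read either the symbol or the feature set.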

Unfortunately, most generative linguists have concentrated their efforts on developing rules and representational systems for the upper components of Fig. 2, and have left much of the detail concerned with articulation (feature implementation) and conversion to sound unspecified. Nevertheless, text-to-speech systems have benefited from attempts to follow this schema, and incorporate as many published phonetic details as possible within their algorithms, as we will see.

I. PHONEMES-TO-SPEECH CONVERSION

As suggested by Fig. 2, many steps are required in order to convert a phoneme string -- supplemented by lexical stress, syntactic, and semantic information -- into an acoustic waveform. An overview of these transformations is most easily provided by describing examples taken directly from the Klattalk algorithms (Klatt, 1982a). For example, the phonemes, stress, and syntactic symbols shown at the top in Fig. 3 for the utterance "Joe ate his soup" are first converted into allophones. Following the usual convention, Fig. 3 representations surround phonemes by slashes, and place square brackets around a phonetic string. Phonological rules modify three of the phonemes in this example. The /h/ of "his," being unstressed, is deleted, which then causes the /t/ of "ate" to become a flap. Finally, the postvocalic
 

Smithsonian Speech Synthesis History Project