SSSHP Contents | Labs

 KLATT 1987, p. 753 
C. Segmental synthesis
 

three-formant synthesizer that was excited by either an impulse train or a noise source, and so were somewhat limited in their ability to control formant amplitudes or to approximate voiced fricatives. Nevertheless, surprisingly good speech quality was produced by rule (example 16 of the Appendix), with the caveats that durations and fundamental frequency contour were copied from natural speech, some hand-editing of rule output was permitted, and a familiar passage was spoken. Details of the program were never published, but the rules appear to have been based on Gerstman's considerable experience with the Haskins Laboratories group (Mattingly, 1968, pp. 40-42).

Shortly thereafter, another system, both elegant in its simplicity and remarkable in its performance, was created by Holmes et al. (1964). This publication contains a description of a parallel formant synthesizer and a complete listing of the rules and tables for synthesizing British English phonemes. The authors used a fairly simple parameter generation algorithm, whose operation was determined entirely by values in tables. A ranking procedure implemented a version of the locus theory, and allowed consonantal formant transitions to impinge on vowel target frequencies in such a way that formant undershoot of the target occurred for short vowels, as illustrated in Fig. 19. The speech quality and intelligibility of this pioneering program are remarkably good, probably better than those of many of the inexpensive products now on the market (example 17 of the Appendix). Unfortunately, intelligibility data for the system were never collected.
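
The undershoot effect can be illustrated with a minimal sketch. It is not the Holmes et al. algorithm: their ranking procedure and tables are not reproduced here, and the function name, the parameters, and the piecewise-linear transition shape are all assumptions made for illustration. The sketch only shows why a formant that needs a fixed time to travel from a consonant locus to a vowel target fails to reach that target when the vowel is short.

```python
def formant_track(locus_hz, target_hz, transition_ms, vowel_ms, frame_ms=10):
    """Hypothetical piecewise-linear formant track across a vowel.

    The formant leaves the consonant locus at vowel onset, heads for the
    vowel target, and returns to the same locus at vowel offset. Each
    transition needs `transition_ms` to complete, so a vowel shorter than
    two transitions never reaches its target (undershoot).
    """
    track = []
    n_frames = int(vowel_ms // frame_ms) + 1
    for i in range(n_frames):
        t = i * frame_ms
        # Distance in time to the nearer consonant boundary.
        t_edge = min(t, vowel_ms - t)
        # Fraction of the locus-to-target movement completed at this frame.
        progress = min(1.0, t_edge / transition_ms)
        track.append(locus_hz + progress * (target_hz - locus_hz))
    return track


# Illustrative values only (not taken from the Holmes et al. tables).
long_vowel = formant_track(1800, 1200, transition_ms=50, vowel_ms=200)
short_vowel = formant_track(1800, 1200, transition_ms=50, vowel_ms=60)
```

With these illustrative values the long vowel reaches the target, while the short vowel turns back when it has covered only 60% of the distance from locus to target.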

An adaptation to American English, including rules for prediction of segment durations and fundamental frequency contours, was described by Mattingly (1966, 1968) (example 20 of the Appendix). Mattingly used formant transition curves that were "S-shaped" and thus more like natural data than are linear transitions, but he found there to be little if any perceptual difference between the two types of interpolation. Allophone rules were also added at this time to permit context-conditioned modifications to table values as needed.
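
The two types of interpolation can be compared in a short sketch. The exact curve Mattingly used is not given here, so a raised-cosine curve is assumed as one possible "S-shaped" transition; the function names and frequency values are hypothetical.

```python
import math


def linear(u):
    """Linear interpolation weight for u in [0, 1]."""
    return u


def s_shaped(u):
    """Raised-cosine weight: slow start, fast middle, slow finish.

    Assumed here as one example of an S-shaped curve, not Mattingly's own.
    """
    return 0.5 - 0.5 * math.cos(math.pi * u)


def transition(start_hz, end_hz, n_frames, shape):
    """Sample a formant transition of n_frames (>= 2) points."""
    return [
        start_hz + shape(i / (n_frames - 1)) * (end_hz - start_hz)
        for i in range(n_frames)
    ]


lin = transition(1800.0, 1200.0, 9, linear)
sig = transition(1800.0, 1200.0, 9, s_shaped)
```

Both tracks share the same endpoints and midpoint; they differ only in that the S-shaped track leaves the starting value and approaches the final value more gradually, which may help explain why the perceptual difference was found to be small.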

The Mattingly rules were combined with a set of letter-to-sound rules and a 140 000-word Kenyon and Knott phonemic dictionary, obtained from June Shoup of the Speech Communication Research Laboratory, to create an experimental Haskins text-to-speech system (Cooper et al., 1973; Nye et al., 1973). The system, intended to be part of a reading machine for the blind, was tested for intelligibility and optimal speaking rate (example 26 of the Appendix). The data will be discussed and compared with data for other systems in Sec. IV. Unfortunately, this pioneering effort was not pursued due to a funding lapse (Cooper et al., 1984), and the device was never produced in quantity for the intended users.

Synthesis-by-rule programs proliferated during the late 1960s and early 1970s. Rabiner (1968) and Liljencrants (1969) investigated the advantages of using a critically damped second-order smoothing filter to constrain formant frequencies to move continuously in time, as required by acoustic theory. The smoothing time constant was varied depending on segmental characteristics in order to approximate the various rates of formant motion observed in natural speech. Rabiner's rules were able to synthesize CV and VC nonsense syllables with consonantal intelligibility of about 75%. However, when listening to recordings of a human subject producing consonant-vowel nonsense syllables, listeners are capable of much higher recognition performance, better than 99% correct (Pisoni and Hunnicutt, 1980). This represents an upper bound or goal for all rule programs attempting to synthesize speech.
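
A critically damped second-order smoother of this kind can be sketched as two identical first-order low-pass sections in cascade, which gives a double real pole and therefore a step response with no overshoot. This is a minimal discrete-time illustration under stated assumptions, not the implementation of Rabiner or Liljencrants; the function name, frame interval, and time constant are hypothetical.

```python
import math


def smooth_critically_damped(targets_hz, tau_ms, frame_ms=5.0):
    """Smooth a stepwise target track with a critically damped filter.

    Two cascaded one-pole low-pass sections share the same pole
    a = exp(-frame_ms / tau_ms), so the overall second-order response
    is critically damped: continuous movement, no overshoot.
    """
    a = math.exp(-frame_ms / tau_ms)
    # Start both sections at rest on the first target value.
    s1 = s2 = targets_hz[0]
    out = []
    for x in targets_hz:
        s1 = a * s1 + (1.0 - a) * x   # first low-pass section
        s2 = a * s2 + (1.0 - a) * s1  # second, identical section
        out.append(s2)
    return out


# A stepwise target track: 500 Hz for 5 frames, then 1500 Hz for 40 frames.
targets = [500.0] * 5 + [1500.0] * 40
smoothed = smooth_critically_damped(targets, tau_ms=20.0)
```

A smaller `tau_ms` gives faster formant motion and a larger one gives slower motion, so choosing the value per segment is one way to approximate the varied rates of movement mentioned above.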

Klatt (1970) extended this earlier work by formulating rules for generating CVC syllables with greater fidelity to measured characteristics of English consonants. Using a hybrid cascade/parallel formant synthesizer (Klatt, 1980) and a rule program that allowed specification of targets and
 

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center