Potter et al. (1947) collected sets of spectrograms depicting all of
the vowels and consonants of English, and suggested ways in which to
interpret the patterns they observed. They created a terminology that
included terms in use today such as "stop gap" and "voice bar." In
attempting to extract a common property for a stop consonant before
different vowels, they defined the concept of the "hub." The "hub"
is the ideal value for the second formant in each consonant. According
to their observations, the second formant hub was quite useful in
distinguishing between consonants having different places of
articulation in English (e.g., /b/ vs /d/ vs /g/). The authors
observed a fairly constant hub for /b/ before different vowels (see
examples in Fig. 15) and for /d/, but they reported that the hub for
/g/ varied with vowel context.
The investigation of the perceptual importance of various acoustic
cues to a given phonetic contrast began with the use of the Pattern
Playback machine at Haskins Laboratories (Cooper et al., 1951).
Delattre, Liberman, Cooper, and their associates created stylized
versions of syllables in an effort to determine the acoustic cues
sufficient for the synthesis of selected phonetic contrasts. This
extensive line of research culminated in a publication suggesting
explicit rules for the synthesis of English speech sounds, in which
Frances Ingemann assembled a body of "synthesis-by-art"
knowledge that was based on experience with the Pattern Playback
(Liberman et al., 1959).
The research suggested the importance of formant frequencies,
formant frequency motions, spectral peaks in noise bursts, and
the relative timing of onsets in different frequency regions as
cues for voicing, manner, and place of articulation of consonants.
The researchers emphasized the encoded nature of speech (Liberman
et al., 1967) in that the acoustic cues to the identity of a phoneme
were spread out in time so as to overlap with cues for adjacent
phonemes, and the cues were context dependent -- for example, the same
plosive burst spectrum was heard as a different consonant depending
on the vowel pattern that followed (Cooper et al., 1952). There
appeared to be no one invariant acoustic cue signaling the presence
of a given stop consonant; rather the consonantal identity would
have to be inferred from the formant transitions into an adjacent
vowel. The most interesting descriptive solution to this perceptual
paradox was the locus theory (Delattre et al., 1955), which
characterized the onset frequency of the second formant motion for
a consonant-vowel transition in terms of an invisible consonant locus.
The locus was determined by extrapolating backward about 50 ms from
observed formant transitions for a given consonant before various
vowels (Fig. 16). The importance of a virtual