|KLATT 1987, p. 770|
then single letters were converted to phonemic form. For each letter, the rules were ordered so that the first rules handled special cases with complex environmental specifications, and the last rule was always a default phonemic correspondence. For example, a rule might state that the letter A => /e/ if followed by VE. Such a rule correctly handles words like "behave," but not "have." A slightly more complicated variant of the same idea was to convert consonants first (Hunnicutt, 1976). This allowed the phonemic representations of consonants to be used in the context specifications for the more difficult conversion of vowels. Systems of this kind may have more than 500 such rules for interpreting letter strings.
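The ordered-rule scheme described above can be sketched as follows. This is a minimal illustration, not the rule set of any system cited here: the `Rule` class, the tiny rule table, and the ARPAbet-style phoneme labels are all hypothetical. For each letter, rules are tried in order and the first one whose left and right contexts match wins; the last rule for each letter has empty contexts and therefore acts as the default. `#` marks a word boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Rewrite `target` as `phonemes` when `left` immediately precedes it
    and `right` immediately follows it ('#' marks a word boundary)."""
    left: str
    target: str
    right: str
    phonemes: str


# Hypothetical rule table, keyed by letter. Within each list the specific
# rules come first; the last rule has empty contexts, so it always matches
# and supplies the default correspondence.
RULES = {
    "a": [Rule("", "a", "ve", "EY"),   # A => /e/ (EY) if followed by VE
          Rule("", "a", "", "AE")],    # default
    "b": [Rule("", "b", "", "B")],
    "e": [Rule("v", "e", "#", ""),     # word-final E after V is silent
          Rule("", "e", "", "EH")],    # default
    "h": [Rule("", "h", "", "HH")],
    "v": [Rule("", "v", "", "V")],
}


def to_phonemes(word: str) -> list[str]:
    padded = "#" + word.lower() + "#"
    out: list[str] = []
    i = 1
    while i < len(padded) - 1:
        for rule in RULES.get(padded[i], []):
            # The first matching rule wins, so list order encodes priority.
            if (padded.startswith(rule.target, i)
                    and padded[:i].endswith(rule.left)
                    and padded.startswith(rule.right, i + len(rule.target))):
                if rule.phonemes:          # silent letters emit nothing
                    out.append(rule.phonemes)
                i += len(rule.target)
                break
        else:
            raise ValueError(f"no rule matches {padded[i]!r} in {word!r}")
    return out


print(to_phonemes("behave"))  # ['B', 'EH', 'HH', 'EY', 'V']
print(to_phonemes("have"))    # ['HH', 'EY', 'V']
```

With only the A => /e/ rule before VE, "have" receives /EY/ instead of /AE/, which is exactly the failure the text describes; fixing it would take an earlier, more specific rule or an exceptions entry.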
Several major problems were immediately apparent: (1) vowel conversion depended in part on stress pattern, (2) correct analysis often required detection of morpheme boundaries, and (3) letter contexts had structural properties, such as VC vs VCC, that one would rather refer to directly than capture by enumerating all possible letter sequences. Before discussing how these programs and the next generation of spelling conversion programs dealt with these issues, we consider a novel approach to the problem that has received considerable attention in the artificial intelligence community.
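Problem (3) can be made concrete with a small sketch. Assuming a rule notation in which upper-case `C` and `V` stand for any consonant or vowel letter, while lower-case letters and `#` (word boundary) are literal, a context such as `Ce#` can be compiled into a regular expression instead of being written out once per consonant. The helper names and the treatment of Y as a consonant are simplifying assumptions for illustration.

```python
import re

# Simplifying assumption: Y is treated as a consonant.
VOWELS = "aeiou"
CONSONANTS = "".join(c for c in "abcdefghijklmnopqrstuvwxyz" if c not in VOWELS)

# Upper-case class symbols expand to character sets; anything else is literal.
CLASSES = {"V": f"[{VOWELS}]", "C": f"[{CONSONANTS}]"}


def compile_context(pattern: str) -> str:
    """Turn a context like 'Ce#' into a regular-expression fragment."""
    return "".join(CLASSES.get(ch, re.escape(ch)) for ch in pattern)


def matches_right(word: str, index: int, pattern: str) -> bool:
    """True if the letters after word[index] match the right-context pattern."""
    padded = "#" + word.lower() + "#"
    # word[index] sits at padded[index + 1], so its right context starts at index + 2.
    return re.match(compile_context(pattern), padded[index + 2:]) is not None


print(matches_right("mate", 1, "Ce#"))    # True: A + one consonant + final E
print(matches_right("matte", 1, "Ce#"))   # False: two consonants intervene
print(matches_right("matte", 1, "CCe#"))  # True
```

Under these assumptions the single class-based pattern `Ce#` stands in for 21 literal contexts (`be#`, `ce#`, and so on through `ze#`), and it separates "mate" (VC before final E) from "matte" (VCC).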
The conversion of letters to phonemes might appear to be a
pattern-matching problem amenable to statistical learning strategies.
For example, Sejnowski and Rosenberg (1986) considered the problem of
creating a network, which they called NETtalk, that takes a
seven-letter window as input and outputs the phoneme corresponding to
the middle letter. A set of 120 "hidden" neuron-like threshold elements
mediated between input neurons, corresponding to 29 possible letters at
each of seven positions, and a set of output neurons representing about
40 phonemes and two degrees of stress. The weights on the input and
output connections of the hidden units were initially random, but were
adjusted through incremental training on a 20 000-word phonemic
dictionary. When evaluated on the words of this training set, the
network was correct for about 90% of the phonemes and stress patterns.
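The NETtalk input representation can be sketched as below. This is not Sejnowski and Rosenberg's code. The three non-letter symbols that bring the alphabet to 29, the `_` padding at word edges, the logistic activation, and the random initialization range are all assumptions. The sketch one-hot encodes the seven-letter window centered on a letter (29 × 7 = 203 input units) and passes it through one randomly initialized hidden layer of 120 units. The output layer and the incremental (back-propagation) training are omitted.

```python
import math
import random
import string

# 26 letters plus three extra symbols; which three NETtalk used is an
# assumption here ("_" = word boundary/padding, "." and "," = punctuation).
ALPHABET = string.ascii_lowercase + "_.,"
WINDOW = 7                             # seven-letter window
N_HIDDEN = 120                         # hidden units, as in the text
N_INPUT = len(ALPHABET) * WINDOW       # 29 * 7 = 203 input units


def encode_window(text: str, center: int) -> list[float]:
    """One-hot encode the 7-letter window centered on text[center].

    Each of the seven slots gets 29 input units, exactly one of which is 1.
    Slots that fall outside the text are padded with '_'.
    """
    vec = [0.0] * N_INPUT
    half = WINDOW // 2
    for slot in range(WINDOW):
        pos = center - half + slot
        ch = text[pos] if 0 <= pos < len(text) else "_"
        vec[slot * len(ALPHABET) + ALPHABET.index(ch)] = 1.0
    return vec


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> list[list[float]]:
    """Random initial weights; the last entry of each row is a bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer: list[list[float]], inputs: list[float]) -> list[float]:
    """For each unit: weighted sum of inputs plus bias, then logistic squashing."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in layer]


rng = random.Random(0)
hidden_layer = init_layer(N_INPUT, N_HIDDEN, rng)
x = encode_window("behave", 3)         # window b e h [a] v e _
hidden = forward(hidden_layer, x)      # 120 hidden activations in (0, 1)
```

In this layout the input-to-hidden connections alone number 203 × 120 = 24 360, which accounts for the bulk of the roughly 25 000 weights mentioned in the text.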
In some sense this is a surprisingly good result, in that so much
knowledge could be embedded in a moderate number of weights (about
25 000), but the performance is not nearly as accurate as that of a
good set of letter-to-sound rules (performing without an exceptions
dictionary, but with rules for recognizing common affixes).
A typical knowledge-based rule system (Bernstein and Pisoni, 1980)
is claimed to perform at about