then single letters were converted to phonemic form. For each letter,
rules were ordered so that the earliest rules handled special cases with
complex environmental specifications, and the last rule was always a
default phonemic correspondence. For example, a rule might say that
the letter A => /e/ if followed by VE. This rule correctly handles words
like "behave," but not "have." A slightly more complicated variant of
the same idea was to convert consonants first (Hunnicutt, 1976). This
permitted the phonemic representations of consonants to be used in the
context specifications for the more difficult conversion of vowels.
Systems of this kind may have more than 500 such rules for the
interpretation of letter strings.
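The ordering principle can be illustrated with a minimal sketch. The rule table below is hypothetical: it contains only the A => /e/ rule quoted above plus a placeholder default (the symbol "ae" is an assumption, not a value taken from any of the systems cited), and the function name convert_letter is invented for illustration.

```python
# Hypothetical ordered rules for the letter "a": (right context, phoneme).
# Specific contexts come first; the empty context acts as the default.
RULES_A = [
    ("ve", "e"),   # a => /e/ when followed by "ve", as in "behave"
    ("", "ae"),    # default correspondence (placeholder symbol, an assumption)
]


def convert_letter(word, i, rules):
    """Return the phoneme of the first rule whose right context matches."""
    rest = word[i + 1:]
    for context, phoneme in rules:
        if rest.startswith(context):
            return phoneme
    raise ValueError("rule list has no default rule")


print(convert_letter("behave", 3, RULES_A))  # special-case rule fires
print(convert_letter("have", 1, RULES_A))    # same rule fires, wrongly for "have"
print(convert_letter("cat", 1, RULES_A))     # falls through to the default
```

Because the first matching rule wins, "have" receives the same vowel as "behave", which is the kind of error noted above.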
Several major problems were immediately apparent: (1) vowel conversion
depended in part on stress pattern, (2) correct analysis often required
detection of morpheme boundaries, and (3) letter contexts had
structural properties, such as VC vs. VCC, that one would rather refer
to directly than enumerate as all possible letter sequences. Before
discussing how these programs and the next generation of spelling
conversion programs dealt with these issues, we consider a novel approach to the
problem that has received considerable attention in the artificial
intelligence community.
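Problem (3) can be made concrete with a small sketch that maps letters to consonant/vowel classes, so that a context can be tested as a pattern rather than as a list of letter strings. The helper names and the decision to count "y" as a consonant are assumptions made for illustration only.

```python
VOWELS = set("aeiou")  # simplifying assumption: "y" is treated as a consonant


def cv_skeleton(letters):
    """Map a letter string to its consonant/vowel pattern, e.g. 'pin' -> 'CVC'."""
    return "".join("V" if ch in VOWELS else "C" for ch in letters.lower())


def following_pattern(word, i, width=3):
    """Consonant/vowel pattern of up to `width` letters after position i."""
    return cv_skeleton(word[i + 1:i + 1 + width])


# The vowel "o" is followed by one consonant in "hoping", two in "hopping".
print(following_pattern("hoping", 1))   # CVC
print(following_pattern("hopping", 1))  # CCV
```

A single test such as "the following pattern starts with CC" then stands in for every pair of consonant letters that would otherwise have to be listed.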
The conversion of letters to phonemes might appear to be a pattern
matching problem amenable to statistical learning strategies. For
example, Sejnowski and Rosenberg (1986) considered the problem of
creating a network, which they called NETtalk, that takes a seven-letter
window as input and outputs the phoneme corresponding to the middle
letter. A set of 120 "hidden" neuron-like threshold elements mediated
between input neurons corresponding to 29 possible letters at each of
seven positions and an output set of neurons representing about 40
phonemes and two degrees of stress. The weights on the input and output
connections of the hidden units were initially random, but
were adjusted through incremental training on a 20 000-word phonemic
dictionary. When evaluated on the words of this training set, the
network was correct for about 90% of the phonemes and stress patterns.
In some sense, this is a surprisingly good result in that so much
knowledge could be embedded in a moderate number of weights (about
25 000), but the performance is not nearly as accurate as that of a
good set of letter-to-sound rules (performing without use of an
exceptions dictionary, but with rules for recognizing common affixes).
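The input side of such a network can be sketched as follows. The window width, symbol count, and hidden-unit count come from the description above; the particular 29-symbol alphabet (26 letters plus three placeholder symbols), the padding scheme, and the function names are assumptions made for illustration, and no training procedure is shown.

```python
WINDOW = 7    # letters per input window (from the text)
SYMBOLS = 29  # possible symbols at each position (from the text)
HIDDEN = 120  # hidden units (from the text)

# Hypothetical alphabet: 26 letters plus three placeholder symbols.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."
assert len(ALPHABET) == SYMBOLS


def windows(word):
    """Seven-letter windows centred on each letter, padded with spaces."""
    half = WINDOW // 2
    padded = " " * half + word + " " * half
    return [padded[i:i + WINDOW] for i in range(len(word))]


def encode_window(window):
    """One-hot encode a window: one active input unit per position."""
    if len(window) != WINDOW:
        raise ValueError("window must contain exactly seven symbols")
    vec = [0] * (WINDOW * SYMBOLS)
    for pos, ch in enumerate(window):
        vec[pos * SYMBOLS + ALPHABET.index(ch)] = 1
    return vec


# Weights between the input layer and the hidden layer only.
input_hidden_weights = WINDOW * SYMBOLS * HIDDEN

print(windows("cat"))
print(input_hidden_weights)  # 7 * 29 * 120 = 24360
```

The input-to-hidden connections alone amount to 7 x 29 x 120 = 24 360 weights, which is of the same order as the total of about 25 000 cited above; the hidden-to-output connections add to this figure according to the number of output units.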
A typical knowledge-based rule system (Bernstein and Pisoni, 1980)
is claimed to perform at about