phone, while FA is
the target value of this formant stored for the
adjacent phone. Hence, the character of the transition depends mainly
on variables stored for the ranking phone. Thus, each phone has within
its boundaries an initial transition, influenced by the previous phone,
and a final transition, influenced by the following phone. A duration
is stored for each phone; if it is greater than the sum of the durations
of the initial and final transitions calculated for the phone, the
target values are used for the steady-state portion. If the duration
is less than the sum, and the paths of the calculated transitions fail
to intersect, they are replaced by a linear interpolation between the
initial and final boundary values. But if the paths do intersect, the
values for each transition between the boundary value and the
intersection are used, and the others discarded. Thus the formants of
shorter vowels do not attain their targets; their frequencies are
context-dependent, as in natural speech (Shearme and Holmes 1962;
Lindblom 1963).
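The transition logic just described can be sketched for a single formant within a single phone. This is a minimal illustration, not the published Holmes et al. program. It assumes that the boundary values (b_in, b_out) have already been computed, that both transition durations are positive, and that each transition is a straight line. The function name formant_track and its parameters are hypothetical.

```python
def formant_track(b_in, target, b_out, t_init, t_final, duration, step=10.0):
    """Sample one formant (Hz) across one phone every `step` ms.

    Hypothetical sketch: b_in/b_out are precomputed boundary values,
    t_init/t_final are transition durations in ms (both > 0), and each
    transition is assumed to be a straight line.
    """
    # Initial transition: from b_in at t = 0 towards the target at t = t_init.
    s1 = (target - b_in) / t_init
    # Final transition: leaves the target at t = duration - t_final
    # and reaches b_out at t = duration.
    start_final = duration - t_final
    s2 = (b_out - target) / t_final

    def initial(t):
        return b_in + s1 * t

    def final(t):
        return target + s2 * (t - start_final)

    if duration >= t_init + t_final:
        # Long phone: initial transition, steady state at target, final transition.
        def value(t):
            if t < t_init:
                return initial(t)
            if t > start_final:
                return final(t)
            return target
    else:
        # Short phone: look for an intersection of the two transition paths
        # inside the interval where both paths are defined.
        lo, hi = max(0.0, start_final), min(t_init, duration)
        t_x = None
        if s1 != s2:
            cand = (target - s2 * start_final - b_in) / (s1 - s2)
            if lo <= cand <= hi:
                t_x = cand
        if t_x is not None:
            # Keep each transition up to the intersection; discard the rest.
            def value(t):
                return initial(t) if t <= t_x else final(t)
        else:
            # No intersection: linear interpolation between the boundary values.
            def value(t):
                return b_in + (b_out - b_in) * t / duration

    n = int(duration // step)
    return [(i * step, value(i * step)) for i in range(n + 1)]
```

For example, formant_track(500, 700, 500, 40, 40, 60) describes a 60 ms phone with two 40 ms transitions; the paths cross at 30 ms and the formant peaks at 650 Hz, short of its 700 Hz target, which is the undershoot of short vowels noted above.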
Denes (1970) uses a similar scheme, the boundary values being dependent
on the target values and on a weight assigned to each phone. Our own
system (Mattingly 1968a, b) also uses a scheme like that of Holmes
et al., except that interpolation is done according to a simple
non-linear equation which ensures that formants curve sharply near
boundaries. The formant transitions in Rabiner's (1967) system, the
most serious attempt to simulate natural formant motion, are calculated
according to a critically damped second-order differential equation.
The manner in which a formant moves from its initial position towards
the next target depends on a time constant of the equation, which is
specified for each formant and each possible pair of adjacent phones.
When all formants have arrived within a certain distance of the current
target, they start to move toward the following target, unless a delay
(permitting closer approximation or attainment of the target) is
specified. It is not obvious that schemes for non-linear motion offer
any great advantage over linear schemes. Although a non-linear rule yields
more naturalistic formant movements, these do not seem to be
necessarily perceptually superior to, or even distinguishable from,
linear movements. If the formant moves between appropriate frequencies
over an appropriate time-period, the manner of its motion does not
seem to be too important.
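Rabiner's approach, described above, can be illustrated with the exact solution of a critically damped second-order equation, tau^2 x'' + 2 tau x' + x = target, whose general solution is x(t) = target + (A + Bt) e^(-t/tau). The sketch below is an assumption-laden simplification, not Rabiner's program: it uses one time constant per formant rather than one per formant and per pair of adjacent phones, and the names damped_step and damped_track, the 20 Hz arrival threshold, and the 1 ms step are all hypothetical.

```python
import math


def damped_step(x0, v0, target, tau, dt):
    """Advance one formant by dt ms along the exact solution of
    tau**2 * x'' + 2 * tau * x' + x = target (critically damped).
    Returns the new (position, velocity)."""
    a = x0 - target
    b = v0 + a / tau            # chosen so that the velocity at t = 0 equals v0
    decay = math.exp(-dt / tau)
    x = target + (a + b * dt) * decay
    v = (b - (a + b * dt) / tau) * decay
    return x, v


def damped_track(targets, taus, threshold=20.0, delay=0.0, dt=1.0, max_steps=5000):
    """Move a set of formants through a sequence of per-phone targets.

    targets: target tuples in Hz, e.g. (F1, F2, F3), one per phone.
    taus: one time constant (ms) per formant (a simplification).
    threshold: distance (Hz) within which a formant counts as arrived.
    delay: extra time (ms) spent near a target before moving on.
    """
    x = [float(f) for f in targets[0]]
    v = [0.0] * len(x)
    track = [tuple(x)]
    idx = 1          # index of the target currently being approached
    waited = 0.0     # time spent so far with all formants near the target
    for _ in range(max_steps):
        if idx >= len(targets):
            break
        goal = targets[idx]
        for k in range(len(x)):
            x[k], v[k] = damped_step(x[k], v[k], goal[k], taus[k], dt)
        track.append(tuple(x))
        if all(abs(x[k] - goal[k]) <= threshold for k in range(len(x))):
            if waited >= delay:
                idx += 1         # all formants close enough: head for the next target
                waited = 0.0
            else:
                waited += dt     # hold near the target for the specified delay
    return track
```

Raising delay above zero lets the formants settle closer to each target before the next movement begins, corresponding to the optional delay in the scheme; with delay=0 they set off as soon as every formant is within the threshold.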
In Rao and Thosar's (1967) system, each phone is characterized by a
set of 'attributes', i.e. features of a sort. A phone is either a vowel
or a consonant; vowels are front or back; consonants are stops or
fricatives, voiced or unvoiced, and labial, dental or palatal. Transition
patterns depend on these attributes and on the duration and
steady-state spectral values stored for each phone. Vowel-vowel
transitions are linear from steady state to steady state, and the
two temporal variables -- total transition time and the fraction of
the total within the duration of the earlier vowel -- are the same for
all pairs of vowels. For consonant-vowel transitions, the boundary value
for each formant is equal to $F F_L + (1 - F) F_V$, where $F_V$ is the target
frequency of the vowel, $F_L$ is the consonant locus frequency and $F$ is
a weighting factor. Transition time and F1 locus depend on the value