are matched. If these elements are naturally spoken words, the output
rate cannot exceed ordinary speaking rates without distortion, and
(on account of the difficulty of abutting the words closely without
losing intelligibility) will actually be considerably slower. Yet
the blind user would be happy with an output several times faster
than natural speech. A potential solution is suggested by the fact
that speech can be synthesized at two or three times normal speed
without much loss of intelligibility (Cooper et al. 1969).

In addition to these practical applications, synthetic speech is used
for psychological research into the nature of speech perception
itself. Synthetic stimuli can be produced which are simple and
closely controlled, and which in some cases could not have been
produced by a human speaker at all. Such stimuli have been used to
study categorical and continuous perception of speech sounds
(Liberman et al. 1957), differences between perception of
speech and non-speech signals (Liberman et al. 1961) and
hemispheric localization of speech processing (Shankweiler and
Studdert-Kennedy 1967). These investigations, apart from their
intrinsic interest, are an essential preliminary to synthesis by rule.

Another research application of synthesis is the close imitation
of natural speech utterances. It is of some interest to know just
how faithfully a particular synthesizer can simulate natural
speech, without any assumptions being made about the structure of
speech or language beyond those built into the synthesizer. Such
investigations explore the limitations of the synthesizer. If the
best imitations which could be achieved in this way were indeed
quite poor, this fact would discourage any endeavor making use of
synthetic speech. Fortunately, at least one investigator, Holmes
(1961; see also Holmes et al. 1964), using a formant synthesizer,
has been able to synthesize sentences which are extremely natural
and virtually impossible to distinguish from their originals. This
is good evidence that progress in other applications of speech
synthesis is at any rate not limited by the quality of the synthesizer.

2. HISTORICAL DEVELOPMENT OF SYNTHESIS BY RULE TECHNIQUES

The applications we have just summarized are quite recent. The
traditional motivation for research in speech synthesis has been
simply to explain how man uses his vocal tract to produce connected
speech. In a broad sense, such research is synthesis by rule, though
it was a long time before the notion of a rule became obvious, and
the importance of an explicit formulation of the rules was recognized.

The idea of an artificial speaker is very old, an aspect of man's
long-standing fascination with humanoid automata. Gerbert (d. 1003),
Albertus Magnus (1198-1280) and Roger Bacon (1214-1294) are all said
to have built speaking heads (Wheatstone 1837). However, historically
attested speech synthesis begins with Wolfgang von Kempelen
(1734-1804), who published an account of his twenty years of research
in 1791 (see also Dudley and Tarnoczy 1950). Von Kempelen's