JOINT SPEECH RESEARCH UNIT (JSRU)
British Post Office
"The Joint Speech Research Unit was formed in 1956 by
amalgamation of the speech research interests of several U.K.
government departments, with the main emphasis originally on
speech telecommunications for both civil and defense
applications. During its 30-year separate existence it was
administratively affiliated to a number of different departments,
starting with the Post Office (at that time responsible for the
national telephone system). During the 1960s the Unit expanded
to a total staff of about 40, but extended its responsibilities
to include development of secure speech communication equipment
for military and other government applications. However, in 1967
the Unit was split into two roughly equal parts, separating this
extra responsibility from the research functions. JSRU then once
more became entirely a research organization, with about 20
staff. This situation continued until the beginning of 1986,
when JSRU was amalgamated with a speech research group at the
Royal Signals and Radar Establishment (RSRE, in Ministry of
Defence) to form a new Speech Research Unit there. JSRU
therefore no longer exists as such. The last Head of JSRU is one
of the joint Heads of the new Unit."
SPEECH ANALYSIS RELEVANT TO SYNTHESIS (1960 - 1984)
"Speech analysis was an on-going activity in JSRU for the whole
period of its existence. Between 1960 and 1984 speech synthesis
was a main application of this analysis work for copy synthesis,
synthesis by rule, and for use in vocoders of various types. Some
of the highlights of this work are described in the publications
listed below, and consist mainly of inverse filtering, formant
analysis, and the development of a new all-digital hardware
spectrograph. From 1973 onwards, the formant analysis was based
on analysis-by-synthesis, and used a model of a parallel formant
synthesizer within the analysis program." (JNH 1989)
Shearme, J.N., and J.N. Holmes, "An experimental study of the classification of speech sounds according to their distribution in the formant 1-formant 2 plane," Proc. 4th Int. Cong. Phonetic Sciences, Helsinki 1961, Mouton and Co., 234-240. (1962) (I)
Holmes, J.N., "An investigation of the volume velocity waveform at the larynx during speech by means of an inverse filter," Proc. 4th Int. Cong. Acoust., Copenhagen, paper
Holmes, J.N., and E.M. Thornber, "Formant frequency measurement by waveform matching during closed-glottis periods," Proc. British Acoust. Soc., 2, paper 73SHB5. (1973)
Holmes, J.N., "Formant excitation before and after glottal closure," Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, Philadelphia, PA, 39-42. (1976)
Seeviour, P.M., J.N. Holmes and M.W. Judd, "Automatic generation of control signals for a parallel-formant speech synthesizer," Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, Philadelphia, PA, 690-693. (1976)
Holmes, J.N., M.W. Judd and D.H. Walesby, "A high-quality all-digital sound spectrograph developed for speech signal analysis," Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, Tulsa, OK, 43-46. (1978)
Hunt, M.J., J.S. Bridle and J.N. Holmes, "Interactive digital inverse filtering and its relation to linear prediction methods," Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, Tulsa, OK, 15-18. (1978)
Dupree, B.C., "The use of tracking rules in automatic formant analysis of speech," J. Acoust. Soc. Am., 68, S71. (1980)
COPY-SYNTHESIS OF NATURAL UTTERANCES USING THE OVE-II CASCADE FORMANT SYNTHESIZER (1960 - 1961)
"This project was undertaken by John Holmes, when visiting the
Royal Institute of Technology in Stockholm. The aim was to see
whether formant synthesis was capable of achieving really
natural-sounding speech. One male sentence and one female
sentence were attempted, but the male one was much more
successful. The results for the male speech were judged at the
time by many people to be almost identical to the natural, but on
careful listening there were still many faults. Some of these
faults could be attributed to difficulties in generating the
control signals accurately enough using a conducting-ink analog
function generator, but others were almost certainly a result of
the synthesizer design." (JNH 1989)
Holmes, J.N., "Research on speech synthesis carried out during a visit to the Royal Institute of Technology, Stockholm, from November, 1960 to March, 1961," Report JU 11-4, Joint Speech Research Unit, British Post Office, Eastcote, Middlesex, England. (1961) (I,K)
Fant, C.G.M., J. Martony, U. Rengman, A. Risberg and J.N. Holmes, "Recent progress in formant synthesis of connected speech," J. Acoust. Soc. Am., 33, 384. (1961)
"The significance of the results of this project, compared with later work using a parallel synthesizer, is discussed in:"
Holmes, J.N., "Synthesis of natural-sounding speech using a formant
synthesizer," FRONTIERS OF SPEECH COMMUNICATION RESEARCH, B. Lindblom
and S. Ohman (Eds.), Academic Press, London, 275-285. (1979)
COPY-SYNTHESIS OF NATURAL UTTERANCES USING A PARALLEL FORMANT SYNTHESIZER (1962 - 1983)
"As a result of the encouraging results from the work using OVE II, and also because of the limitations that were revealed during that project, work continued as a background task over many years on the use of a parallel-formant synthesizer to improve on the copy synthesis previously obtained with the cascade-connected OVE-II. This work led to a gradual evolution of the design of the synthesizer that is now widely known as the 'JSRU formant synthesizer', which has since been used also for experiments with speech synthesis by rule and formant vocoders. The synthesizer includes facilities for copying the main features of human voiced excitation pulses. When the control signals were iteratively optimized to copy natural spectral features of an utterance as closely as possible, most listeners found it almost impossible to detect any subjective differences between natural and synthetic speech when listening in very favourable conditions. Even when the control signals were derived automatically using the analysis method of Seeviour et al. (1976), the synthetic sentences were mostly acceptable as natural when heard in isolation, but were not as good as the hand-crafted copies.
This work led to the conclusion that for practical synthesis the parallel synthesizer was in fact more versatile than the cascade design, and caused no sacrifice of the naturalness achievable, in spite of the traditional theoretical advantages widely associated with a cascade connection. Full details of a software version of this synthesizer are given in an internal JSRU report, which includes a FORTRAN listing of the code. A DSP chip hardware version has been developed by Loughborough University of Technology with assistance from JSRU (Quarmby and Holmes, 1984).
No significant work on female speech was done during this project. Since 1986, however, groups elsewhere in the U.K. have achieved high-quality female synthesis using the same synthesizer, and work has also been done to extend the synthesizer bandwidth beyond the original upper limit of 4 kHz." (JNH 1989)
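The parallel connection described above can be illustrated with a minimal sketch. This is a generic textbook arrangement, not the JSRU design: each formant is a second-order digital resonator fed by the same excitation, and the branch outputs are scaled by their own amplitude controls and summed. The function names, the 8 kHz sampling rate (chosen to match the 4 kHz upper limit mentioned in the text), the formant values and the plain impulse-train excitation are all assumptions made for illustration.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of a two-pole resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2].

    The gain term a is chosen so that the response at 0 Hz is exactly 1.
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(x, freq_hz, bw_hz, fs_hz):
    """Filter the sample list x through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def parallel_formant(excitation, formants, fs_hz):
    """Sum the outputs of independent resonators, one per formant.

    formants is a list of (frequency_hz, bandwidth_hz, amplitude) tuples.
    Each branch has its own amplitude, which is the defining feature of
    the parallel connection (a cascade would chain the filters instead).
    """
    out = [0.0] * len(excitation)
    for freq_hz, bw_hz, amp in formants:
        branch = resonate(excitation, freq_hz, bw_hz, fs_hz)
        for i, value in enumerate(branch):
            out[i] += amp * value
    return out


def impulse_train(n_samples, period):
    """Crude voiced excitation: one unit impulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


# Hypothetical values for a neutral vowel; not taken from any JSRU report.
FS_HZ = 8000.0
FORMANTS = [(500.0, 60.0, 1.0), (1500.0, 90.0, 0.5), (2500.0, 120.0, 0.25)]
vowel = parallel_formant(impulse_train(800, 80), FORMANTS, FS_HZ)
```

Because every branch carries its own amplitude control, formant levels can be set independently of formant frequencies, which is one way to read the remark about the parallel design being more versatile.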
The USA National Security Agency (NSA) made an enhanced version of
the "JSRU formant synthesizer" available to researchers. See SSSHP
USA NSA file.
Holmes, J.N., SPEECH SYNTHESIS, M&B Monograph EE/7, Mills and Boon Ltd, London, 68 pages (1972). Spectrograms and time waveforms of close copy of human speech, pp 44-47.
Holmes, J.N., "Influence of glottal waveforms on the naturalness of synthetic speech from a parallel-formant synthesizer," IEEE Trans. Audio and Electroacoustics, AU-21, 298-305 (1973). Additional details on
Holmes, J.N., "Avoiding unwanted low-frequency level variations on the output of a parallel-formant synthesizer," J. Acoust. Soc. Am., 68, S18 (1980).
Holmes, J.N., "Requirements for speech synthesis in the frequency range 3 - 4 kHz," Proc. F.A.S.E. Symposium on Acoustics and Speech, Venice, 1,
Rye, J.M., and J.N. Holmes, "A versatile software parallel-formant speech synthesizer," JSRU Research Report No. 1016, Nov 1982.
Holmes, J.N., "Formant synthesizers: cascade or parallel?," Speech Communication, 2, 251-273. (1983) (K)
Quarmby, D.J., and J.N. Holmes, "Implementation of a parallel-formant speech synthesiser using a single-chip programmable signal processor," Proc. IEE, 131, Part F, 563-569. (1984) (K)
Holmes, J.N., "A parallel-formant synthesizer for machine voice output," COMPUTER SPEECH PROCESSING, F. Fallside and W.A. Woods (Eds.), Prentice-Hall, London, 163-187. (1985)
FORMANT SYNTHESIS BY RULE (1962 - 1966)
"Publication of the Haskins Laboratories' work using the Pattern Playback to determine the importance of various perceptual cues for consonants in the late 1950s led JSRU to appreciate the value of rule synthesis as a research technique. This work was started by John Shearme using an analog hardware parallel formant synthesizer, and a conducting-ink function generator. Using these tools he succeeded in formulating useful rules for British English vowels, stops and fricatives, but the labour of converting the rules by hand onto conducting-ink traces and the difficulty of maintaining the apparatus in calibration caused progress to be very slow. Early in 1963 John Holmes devised a computer program structure that used a set of tables which compactly described all the Shearme rules, and he then extended the tables to include all the phonemes of the British RP dialect. The output of the rule program was a 5-channel punched paper tape, which was then mounted in an endless loop on a specially-developed high-speed tape reader. Each of the nine control signals used at that time was converted to a 32-level analog signal for feeding the synthesizer. This change avoided all the difficulties previously caused by the conducting-ink machine, as well as greatly speeding up rule development. This phase of the work only attempted rules for the phonetic segments, and duration and fundamental frequency data were derived from measurements of natural utterances.
From mid 1963 to mid 1964 JSRU received the benefit of having
Ignatius Mattingly as a guest worker. He assisted with refinement
of the phonetic rule tables, and used the same facilities to
create a prosodic rule system, whose input was phonetic text with
additional markings for accented syllables and punctuation." (JNH 1989)
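The 5-channel tape and the 32-level control signals mentioned above fit together arithmetically, since five binary hole positions give 2^5 = 32 codes. The sketch below illustrates that relationship only. It assumes uniform quantization and one control value per tape row; the source does not say how the nine signals were laid out on the tape, and the function names and the example frequency range are hypothetical.

```python
LEVELS = 32  # 2**5: the number of codes one row of 5-channel tape can hold


def quantize(value, lo, hi, levels=LEVELS):
    """Map a control value in [lo, hi] to an integer code 0..levels-1.

    Uniform steps are an assumption; values outside the range are clipped.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    frac = (value - lo) / (hi - lo)
    code = int(round(frac * (levels - 1)))
    return max(0, min(levels - 1, code))


def dequantize(code, lo, hi, levels=LEVELS):
    """Recover the analog level that a code stands for."""
    return lo + (hi - lo) * code / (levels - 1)


def to_tape_row(code):
    """Show a 5-bit code as a hole pattern, most significant bit first.

    'o' marks a punched hole and '.' an unpunched position.
    """
    if not 0 <= code < LEVELS:
        raise ValueError("code does not fit in 5 bits")
    return "".join("o" if (code >> bit) & 1 else "." for bit in range(4, -1, -1))


# Hypothetical example: a first-formant control spanning 200 to 1000 Hz.
code = quantize(600.0, 200.0, 1000.0)
row = to_tape_row(code)
level_hz = dequantize(code, 200.0, 1000.0)
```

With 32 levels over a range of several hundred hertz the step size is a few tens of hertz, which gives a feel for how coarse the control resolution was under these assumptions.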
Holmes, J.N., and J.N. Shearme, "Speech synthesis by rule controlled by a small low-speed digital computer," J. Acoust. Soc. Am., 35, 1911. (1963)
Holmes, J.N., I.G. Mattingly and J.N. Shearme, "Speech synthesis by rule," Language and Speech, 7, 127-143. (1964) (B,I,K) Demo sentences are of three types: (i) manual copy from spectrogram, (ii) phonetic input with copied timing, (iii) by rule from phonetic input with stress marks. Notes on phonetic input for these sentences are in SSSHP UK JSRU file.
Mattingly, I.G., "Synthesis by rule of prosodic features," Language and Speech, 9, 1-13 (1966). S-shaped formant transitions. Allophone rules. (I,K)
VOCODER RESEARCH (1958 - 1984)
"One of JSRU's major responsibilities throughout its existence was research into vocoder designs which were suitable for the various government needs for secure communication. The first Head of JSRU, John Swaffield, had been involved with channel vocoders during the 1940s, and during the 1960s most vocoder work was directed at improving the speech quality of channel vocoders while maintaining a moderate digit rate. The main innovation during this period was the use of single-resonant circuits for the channel synthesis filters. These filters had a 3 dB bandwidth much less than the channel spacing (typically 60 Hz and 200 Hz respectively), and so the synthetic speech output had a very jagged spectral envelope. It was discovered that, although this property caused the speech to sound somewhat reverberant, it was subjectively much more acceptable than speech from wider-band steeper-sided filters. Although it had 19 channels, this vocoder kept its transmission rate down to 2400 bit/s by delta coding across the spectrum using only 2 bits per channel. A prototype design incorporating all the principles was completed in 1966, and since then several different hardware developments based on those principles have been adopted for various UK military and other government uses. A full description of the principles was not published until 1980, 14 years after completion of this research.
For possible future applications requiring lower digit rates, formant vocoders were investigated. Initial experiments concentrated on a hybrid, with fixed channels for F1 and formant tracking for the higher formants (Shearme, 1962). Later work used the complete parallel formant synthesizer and the analysis-by-synthesis formant measurement techniques described by Seeviour et al. and by Dupree.
An attempt was made to build the very computationally-intensive analysis technique of Seeviour et al. into real-time hardware but, before the advent of DSP chips, the complexity was such that reliability could not be maintained. This hardware project was therefore abandoned about 1980.
A very successful but simple variable-frame-rate coding technique
was developed to reduce the data rate for the formant vocoder to
around 1000 bit/s with no significant loss of the formant
synthesis quality, and Barry Dupree succeeded in incorporating
this scheme into the analysis, using dynamic programming to
ensure that the frames chosen were optimum for satisfying the
coding accuracy and buffer delay constraints." (JNH 1989)
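The idea of choosing transmitted frames by dynamic programming can be sketched as follows. This is a generic illustration under stated assumptions, not Dupree's published algorithm: it handles a single control track, reconstructs skipped frames by linear interpolation, treats "coding accuracy" as a maximum interpolation error `tol`, and stands in for the "buffer delay" constraint with a maximum gap `max_gap` between transmitted frames. All of these names and choices are hypothetical.

```python
def max_interp_error(track, i, j):
    """Worst absolute error if frames between i and j are rebuilt by a
    straight line drawn from track[i] to track[j]."""
    worst = 0.0
    for k in range(i + 1, j):
        frac = (k - i) / (j - i)
        approx = track[i] + frac * (track[j] - track[i])
        worst = max(worst, abs(track[k] - approx))
    return worst


def choose_frames(track, tol, max_gap):
    """Pick the fewest frame indices to transmit.

    Constraints: the first and last frames are always sent, consecutive
    sent frames are at most max_gap apart, and linear interpolation
    between them never deviates from the original by more than tol.
    """
    if max_gap < 1:
        raise ValueError("max_gap must be at least 1")
    n = len(track)
    if n == 0:
        return []
    inf = float("inf")
    cost = [inf] * n  # cost[j]: fewest frames sent so far if frame j is sent
    prev = [-1] * n   # prev[j]: the sent frame that precedes j on the best path
    cost[0] = 1
    for j in range(1, n):
        for i in range(max(0, j - max_gap), j):
            if cost[i] + 1 < cost[j] and max_interp_error(track, i, j) <= tol:
                cost[j] = cost[i] + 1
                prev[j] = i
    # Trace the best path back from the final frame.
    chosen = []
    j = n - 1
    while j != -1:
        chosen.append(j)
        j = prev[j]
    return chosen[::-1]


# A hypothetical control track that rises and then falls linearly.
track = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
sent = choose_frames(track, tol=0.01, max_gap=20)
```

A smaller `max_gap` forces more frames to be sent even through a steady stretch, which reflects the trade-off between data rate and delay that the passage describes.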
Shearme, J.N., "Analysis of the performance of an automatic formant measuring system," Proc. Speech Communication Seminar, Stockholm, paper C9. (1962)
Holmes, J.N., E. McLarnon and M.W. Judd, "Experiments with a variable-frame-rate coding scheme applied to formant synthesis control signals," SPEECH COMMUNICATION, G. Fant (Ed.), Almqvist and Wiksell, Stockholm, 1, (1975)
Holmes, J.N., "Parallel formant vocoders," Proc. IEEE EASCON Conference, Washington, (1978)
Holmes, J.N., "The JSRU channel vocoder," Proc. IEE, 127, Pt. F, 53-60. (1980)
Dupree, B.C., "Formant coding of speech using dynamic programming," Electronics Letters, 20, 279-280. (1984)
MACHINE VOICE OUTPUT (1976 - 1985)
"Although the original aim of the formant synthesis by rule had
been as a research tool into the cues for phonetic perception,
interest in practical application for machine voice output
developed in the mid 1970s. The old work was therefore revived
and extended for this purpose. A new prosodic system was
developed taking account of work that had been done elsewhere
since Mattingly's 1964 study, and a complete experimental system
was developed for going from text to speech in real time.
However, time and staff resources during the period up to the
amalgamation with RSRE were not sufficient to make full use of
this system to achieve the potential performance that should have
been possible. The system as it was during 1985 has been made
available to various academic and commercial groups in the U.K.
to take the development further, but other priorities have meant
that very little work on this subject can be continued at RSRE.
Some of the latest thoughts on the way the table-driven phonetic
rule system of Holmes, Mattingly and Shearme could be enhanced to
deal with the rich range of allophonic variation that occurs in
natural speech are given in Holmes (1988)." (JNH 1989)
Holmes, J.N., R.D. Wright, J.W. Yates and M.W. Judd, "Extension of the JSRU synthesis by rule system," Proc. 9th Int. Cong. Acoustics, Madrid, paper I108. (1977)
Holmes, J.N., and A.P. Stephens, "Acoustic correlates of intonation in whispered speech," J. Acoust. Soc. Am., 73, S87. (1983)
Stephens, A.P., and J.N. Holmes, "Use of flexible voice output techniques for machine-man communication," Behaviour and Information Technology, 3, (1984)
Holmes, J.N., SPEECH SYNTHESIS AND RECOGNITION, Van Nostrand Reinhold, Wokingham. (1988)
J.S. BRIDLE
BARRY C. DUPREE
JOHN N. HOLMES
1950 B.Sc. in mathematics, Imperial College of Science and Technology, London
1953 M.Sc. in Electrical Engineering, Imperial College of Science and Technology, London. United Kingdom Scientific Civil Service.
1956 Joint Speech Research Unit
1960/61 Guest Research Worker, Royal Institute of Technology, Stockholm, Sweden
1970 Head of JSRU
1981 D.Sc. in Engineering, London University, based on published speech research work
1985 Left JSRU, private speech technology consultant
1999 Deceased
M.J. HUNT
M.W. JUDD
I.G. MATTINGLY (see SSSHP USA Haskins Laboratories page)
E. McLARNON
DAVID J. QUARMBY
1984 Dept. of Electronic & Electrical Engineering, Loughborough Univ. of Technology, Ashby Rd, Loughborough, Leics. LE11 3TU, England
19 Loughborough Sound Images Ltd., The Technology Centre, Epinal Way, Loughborough, Leicestershire, LE11 0QE (Quarmby company making speech synthesizers, DSP equip.)
J.M. RYE
P.M. SEEVIOUR
JOHN N. SHEARME
1938 Post Office Research Station, London
1939 British Army
1946 Returned to Post Office Research Station
1951 B.Sc. in Electrical Engineering, London University
1953 Government Communications Headquarters
1956 Joint Speech Research Unit
1967 Left JSRU to head a speech equipment development group
1978 Retired from U.K. Civil Service
A.P. STEPHENS
E.M. THORNBER
D.H. WALESBY
R.D. WRIGHT
J.W. YATES
CONTRIBUTIONS AND REVIEW BY:
Dr. J.N. Holmes (former Head of JSRU)