NMAH | Smithsonian Speech Synthesis History Project (ss

HISTORY

"The Joint Speech Research Unit was formed in 1956 by amalgamation of the speech research interests of several U.K. government departments, with the main emphasis originally on speech telecommunications for both civil and defense applications. During its 30-year separate existence it was administratively affiliated to a number of different departments, starting with the Post Office (at that time responsible for the national telephone system). During the 1960s the Unit expanded to a total staff of about 40, but extended its responsibilities to include development of secure speech communication equipment for military and other government applications. However, in 1967 the Unit was split into two roughly equal parts, separating this extra responsibility from the research functions. JSRU then once more became entirely a research organization, with about 20 staff. This situation continued until the beginning of 1986, when JSRU was amalgamated with a speech research group at the Royal Signals and Radar Establishment (RSRE, in Ministry of Defence) to form a new Speech Research Unit there. JSRU therefore no longer exists as such. The last Head of JSRU is one of the joint Heads of the new Unit."
(JNH 1989)

[The quoted material, "JNH 1989", was extracted from a communication from J. N. Holmes to H. D. Maxey, May 5, 1989, for this history. See SSSHP UK JSRU file.]

SPEECH ANALYSIS RELEVANT TO SYNTHESIS (1960 - 1984)

1962	Shearme, J.N., and J.N. Holmes, "An experimental study of the classification of speech sounds according to their distribution in the formant 1-formant 2 plane", Proc. 4th Int. Cong. Phonetic Sciences, Helsinki 1961, Mouton and Co., 234-240. (1962) (I)
	Holmes, J.N., "An investigation of the volume velocity waveform at the larynx during speech by means of an inverse filter," Proc. 4th Int. Cong. Acoust., Copenhagen, paper G13. (1962)
1973	Holmes, J.N., and E.M. Thornber, "Formant frequency measurement by waveform matching during closed-glottis periods," Proc. British Acoust. Soc., 2, paper 73SHB5. (1973)
1976	Holmes, J.N., "Formant excitation before and after glottal closure," Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, Philadelphia, PA, 39-42. (1976)
	Seeviour, P.M., J.N. Holmes and M.W. Judd, "Automatic generation of control signals for a parallel-formant speech synthesizer," Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, Philadelphia, PA, 690-693. (1976)
1978	Holmes, J.N., M.W. Judd and D.H. Walesby, "A high-quality all-digital sound spectrograph developed for speech signal analysis," Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, Tulsa, OK, 43-46. (1978)
	Hunt, M.J., J.S. Bridle and J.N. Holmes, "Interactive digital inverse filtering and its relation to linear prediction methods," Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, Tulsa, OK, 15-18. (1978)
1980	Dupree, B.C., "The use of tracking rules in automatic formant analysis of speech," J. Acoust. Soc. Am., 68, S71. (1980)

COPY-SYNTHESIS OF NATURAL UTTERANCES USING THE OVE-II CASCADE FORMANT SYNTHESIZER (1960 - 1961)

"This project was undertaken by John Holmes, when visiting the Royal Institute of Technology in Stockholm. The aim was to see whether formant synthesis was capable of achieving really natural-sounding speech. One male sentence and one female sentence were attempted, but the male one was much more successful. The results for the male speech were judged at the time by many people to be almost identical to the natural, but on careful listening there were still many faults. Some of these faults could be attributed to difficulties in generating the control signals accurately enough using a conducting-ink analog function generator, but others were almost certainly a result of the synthesizer design." (JNH 1989)
1961	Holmes, J.N., "Research on speech synthesis carried out during a visit to the Royal Institute of Technology, Stockholm, from November, 1960 to March, 1961", Report JU 11-4, Joint Speech Research Unit, British Post Office, Eastcote, Middlesex, England. (I,K)
	Fant, C.G.M., J. Martony, U. Rengman, A. Risberg and J.N. Holmes, "Recent progress in formant synthesis of connected speech," J. Acoust. Soc. Am., 33, 384. (1961)
	SSSHP 77.3 Tape: "JSRU DEMO SYNTHESIS 1965" (male human: "I enjoy the simple life.") (male syn: "I enjoy the simple life.") (male human: "I enjoy the simple life.") (fem. human: "He knows just what he wants.") (fem. syn: "He knows just what he wants.") 5" reel, good quality, copy of Klatt MIT copy ** use for master **
	SSSHP 32.7 Tape: Demo to accompany "Review of Text-to-speech conversion for English," D.H. Klatt, JASA 82.3, Sept. 1987. (male syn and human: "I enjoy the simple life.") (fem syn and human: "He knows just what he wants.") Cassette, Klatt MIT A/D and D/A from SSSHP 77.3
	SSSHP 83.12 Tape: "Some Reminiscences on Speech Research", F.S. Cooper, demo with IEEE Trans. A&E, AU-21.3, 6/73. (syn and human: "I enjoy the simple life.") Cassette from plastic record, stylus noise
"The significance of the results of this project, compared with later work using a parallel synthesizer, is discussed in:"
1979	Holmes, J.N., "Synthesis of natural-sounding speech using a formant synthesizer," FRONTIERS OF SPEECH COMMUNICATION RESEARCH, B. Lindblom and S. Ohman (Eds.), Academic Press, London, 275-285. (1979)

COPY-SYNTHESIS OF NATURAL UTTERANCES USING A PARALLEL FORMANT SYNTHESIZER (1962 - 1983)

"As a result of the encouraging results from the work using OVE-II, and also because of the limitations that were revealed during that project, work continued as a background task over many years on the use of a parallel-formant synthesizer to improve on the copy synthesis previously obtained with the cascade-connected OVE-II. This work led to a gradual evolution of the design of the synthesizer that is now widely known as the 'JSRU formant synthesizer', which has since been used also for experiments with speech synthesis by rule and formant vocoders. The synthesizer includes facilities for copying the main features of human voiced excitation pulses. When the control signals were iteratively optimized to copy natural spectral features of an utterance as closely as possible, most listeners found it almost impossible to detect any subjective differences between natural and synthetic speech when listening in very favourable conditions. Even when the control signals were derived automatically using the analysis method of Seeviour et al. (1976), the synthetic sentences were mostly acceptable as natural when heard in isolation, but were not as good as the hand-crafted copies. This work led to the conclusion that for practical synthesis the parallel synthesizer was in fact more versatile than the cascade design, and caused no sacrifice of naturalness achievable, in spite of the traditional theoretical advantages widely associated with a cascade connection. Full detail of a software version of this synthesizer is described in an internal JSRU report, which includes a FORTRAN listing of the code. A DSP chip hardware version has been developed by Loughborough University of Technology with assistance from JSRU (Quarmby and Holmes, 1984). No significant work on female speech was done during this project. Since 1986, however, groups elsewhere in the U.K. have achieved high-quality female synthesis using the same synthesizer, and work has also been done to extend the synthesizer bandwidth beyond the original upper limit of 4 kHz." (JNH 1989)
The USA National Security Agency (NSA) made an enhanced version of the "JSRU formant synthesizer" available to researchers. See SSSHP USA NSA file.
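The difference between the cascade and parallel connections discussed above can be illustrated with a minimal Python sketch. This is not the JSRU design, whose details are in the internal report and its FORTRAN listing and are not reproduced here. It uses a generic two-pole digital resonator, and the function names, the 8 kHz sampling rate and the example formant values are all assumptions chosen for illustration only.

```python
import math

SAMPLE_RATE_HZ = 8000.0  # assumed; the text only mentions a 4 kHz upper limit


def resonator_coeffs(freq_hz, bandwidth_hz, fs=SAMPLE_RATE_HZ):
    """Coefficients of a generic two-pole resonator with unity gain at DC."""
    r = math.exp(-math.pi * bandwidth_hz / fs)          # pole radius
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    a = 1.0 - b - c                                      # normalizes DC gain to 1
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz):
    """Filter a list of samples: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants):
    """Pass the excitation through each formant resonator in series."""
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz)
    return signal


def parallel(signal, formants, amplitudes):
    """Filter the excitation with each resonator separately, then sum the
    branch outputs, each scaled by its own amplitude control."""
    out = [0.0] * len(signal)
    for (freq_hz, bandwidth_hz), amp in zip(formants, amplitudes):
        branch = resonate(signal, freq_hz, bandwidth_hz)
        out = [acc + amp * s for acc, s in zip(out, branch)]
    return out
```

In the cascade form the relative formant amplitudes follow from the resonator frequencies and bandwidths alone, whereas the parallel form needs an explicit amplitude control for each formant. That extra set of controls is one plausible reading of the "more versatile" remark in the passage, though the sketch makes no claim about how the JSRU synthesizer actually realized it.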
1972	Holmes, J.N., SPEECH SYNTHESIS, M&B Monograph EE/7, Mills and Boon Ltd, London, 68 pages (1972). Spectrograms and time waveforms of close copy of human speech, pp 44-47.
	SSSHP 77.1 Tape: "JSRU DEMO SYNTHESIS 1965". Human speech is low-pass filtered to 4 kHz. (syn?/human?/syn?: "I enjoy the simple life, as long as there is plenty of comfort.") 5" reel, good quality, copy of Klatt MIT copy ** use for master **
	SSSHP 32.8 Tape: Demo to accompany "Review of Text-to-speech conversion for English," D.H. Klatt, JASA 82.3, Sept. 1987. (syn and natural: "I enjoy the simple life, as long as there is plenty of comfort.") Cassette, Klatt MIT A/D and D/A copy of SSSHP 77.1
1973	Holmes, J.N., "Influence of glottal waveforms on the naturalness of synthetic speech from a parallel-formant synthesizer", IEEE Trans. Audio and Electro-acoustics, AU-21, 298-305 (1973). Additional details on synthesis.
1980	Holmes, J.N., "Avoiding unwanted low-frequency level variations on the output of a parallel-formant synthesizer," J. Acoust. Soc. Am., 68, S18 (1980).
1981	Holmes, J.N., "Requirements for speech synthesis in the frequency range 3 - 4 kHz," Proc. F.A.S.E. Symposium on Acoustics and Speech, Venice, 1, 169-172 (1981).
1982	Rye, J.M., and J.N. Holmes, "A versatile software parallel-formant speech synthesizer," JSRU Research Report No. 1016, Nov 1982. Tape ?
1983	Holmes, J.N., "Formant synthesizers: cascade or parallel?," Speech Communication, 2, 251-273. (1983) (K)
1984	Quarmby, D.J., and J.N. Holmes, "Implementation of a parallel-formant speech synthesiser using a single-chip programmable signal processor," Proc. IEE, 131, Part F, 563-569. (1984) (K)
1985	Holmes, J.N., "A parallel-formant synthesizer for machine voice output," COMPUTER SPEECH PROCESSING, F. Fallside and W.A. Woods (Eds.), Prentice-Hall, London, 163-187. (1985)

FORMANT SYNTHESIS BY RULE (1962 - 1966)

"Publication of the Haskins Laboratories' work using the Pattern Playback to determine the importance of various perceptual cues for consonants in the late 1950's led JSRU to appreciate the value of rule synthesis as a research technique. This work was started by John Shearme using an analog hardware parallel formant synthesizer, and a conducting-ink function generator. Using these tools he succeeded in formulating useful rules for British English vowels, stops and fricatives, but the labour of converting the rules by hand onto conducting-ink traces and the difficulty of maintaining the apparatus in calibration caused progress to be very slow. Early in 1963 John Holmes devised a computer program structure that used a set of tables which compactly described all the Shearme rules, and he then extended the tables to include all the phonemes of the British RP dialect. The output of the rule program was a 5-channel punched paper tape, which was then mounted in an endless loop on a specially-developed high-speed tape reader. Each of the nine control signals used at that time was converted to a 32-level analog signal for feeding the synthesizer. This change avoided all the difficulties previously caused by the conducting-ink machine, as well as greatly speeding up rule development. This phase of the work only attempted rules for the phonetic segments, and duration and fundamental frequency data were derived from measurements of natural utterances. From mid 1963 to mid 1964 JSRU received the benefit of having Ignatius Mattingly as a guest worker. He assisted with refinement of the phonetic rule tables, and used the same facilities to create a prosodic rule system, whose input was phonetic text with additional markings for accented syllables and punctuation." (JNH 1989)
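The figures in this account fit together arithmetically: five tape channels give 2^5 = 32 possible values per row, matching the 32-level control signals. The Python sketch below shows one way such a tape could be decoded. The bit order, the one-row-per-control-signal layout and all function names are assumptions made for illustration; the source does not describe the actual tape format.

```python
CHANNELS = 5                # from the text: 5-channel punched paper tape
LEVELS = 2 ** CHANNELS      # 32 distinct values per tape row
CONTROLS_PER_FRAME = 9      # from the text: nine control signals


def row_to_level(row):
    """Convert one tape row (5 hole/no-hole flags) to an integer 0..31.

    Bit order (first flag most significant) is an assumption.
    """
    if len(row) != CHANNELS:
        raise ValueError("expected %d channels, got %d" % (CHANNELS, len(row)))
    level = 0
    for hole in row:
        level = (level << 1) | (1 if hole else 0)
    return level


def rows_to_frames(rows):
    """Group consecutive rows into frames of nine control levels.

    One row per control signal is an assumed layout; a trailing
    incomplete frame is dropped.
    """
    levels = [row_to_level(row) for row in rows]
    usable = len(levels) - len(levels) % CONTROLS_PER_FRAME
    return [levels[i:i + CONTROLS_PER_FRAME]
            for i in range(0, usable, CONTROLS_PER_FRAME)]
```

Under these assumptions, each frame returned by `rows_to_frames` would hold one quantized value per control signal, ready for conversion to an analog level.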
1963	Holmes, J.N., and J.N. Shearme, "Speech synthesis by rule controlled by a small low-speed digital computer," J. Acoust. Soc. Am., 35, 1911. (1963)
1964	Holmes, J.N., I.G. Mattingly and J.N. Shearme, "Speech synthesis by rule," Language and Speech, 7, 127-143. (1964) (B,I,K)
	Demo sentences are of three types: (i) manual copy from spectrogram, (ii) phonetic input with copied timing, (iii) by rule from phonetic input with stress marks. Notes on phonetic input for these sentences are in SSSHP UK JSRU file.
	SSSHP 77.2 Tape: "JSRU DEMO SYNTHESIS 1965". (syn, i/ii/iii: "A bird in the hand is worth two in the bush.") (syn, ii, sen: "The process of amplitude modulation of a high frequency carrier wave ... telephony channels.") (syn, iii, 7 sen: "It was the last thing I expected to find there. Did you come by motorcar? I'm going home now. Someone, somewhere wants a letter from you. I've called several times and never found you there. Like most old people, he ... listen to his reminiscences.") 5" reel, 7.5 ips, good quality, copy of Klatt MIT copy ** use for master **
	SSSHP 83.15 Tape: "Some Reminiscences on Speech Research", F.S. Cooper, demo with IEEE Trans. A&E, AU-21.3, 6/73. ("Someone, somewhere, wants a letter from you.") Cassette from plastic record, stylus noise
	SSSHP 32.17 Tape: Demo to accompany "Review of Text-to-speech conversion for English," D.H. Klatt, JASA 82.3, Sept. 1987. (5 sen: "A bird in the hand ... letter from you.") Cassette, Klatt MIT A/D and D/A of SSSHP 77.2
1966	Mattingly, I.G., "Synthesis by rule of prosodic features", Language and Speech, 9, 1-13 (1966). S-shaped formant transitions. Allophone rules. (I,K) Tape ? ("A bird in the hand is worth two in the bush") Tape ? (15 utterances, "It isn't exactly what I want")

VOCODER RESEARCH (1958 - 1984)

"One of JSRU's major responsibilities throughout its existence was research into vocoder designs which were suitable for the various government needs for secure communication. The first Head of JSRU, John Swaffield, had been involved with channel vocoders during the 1940's, and during the 1960's most vocoder work was directed at improving the speech quality of channel vocoders while maintaining a moderate digit rate. The main innovation during this period was the use of single-resonant circuits for the channel synthesis filters. These filters had a 3 dB bandwidth much less than the channel spacing (typically 60 Hz and 200 Hz respectively), and so the synthetic speech output had a very jagged spectral envelope. It was discovered that, although this property caused the speech to sound somewhat reverberant, it was subjectively much more acceptable than speech from wider-band steeper-sided filters. Although it had 19 channels, this vocoder kept its transmission rate down to 2400 bit/s by delta coding across the spectrum using only 2 bits per channel. A prototype design incorporating all the principles was completed in 1966, and since then several different hardware developments based on those principles have been adopted for various UK military and other government uses. A full description of the principles was not published until 1980, 14 years after completion of this research. For possible future applications requiring lower digit rates, formant vocoders were investigated. Initial experiments concentrated on a hybrid, with fixed channels for F1 and formant tracking for the higher formants (Shearme, 1962). Later work used the complete parallel formant synthesizer and the analysis-by-synthesis formant measurement techniques described by Seeviour et al. and by Dupree. An attempt was made to build the very computationally-intensive analysis technique of Seeviour et al. into real-time hardware but, before the advent of DSP chips, the complexity was such that reliability could not be maintained. This hardware project was therefore abandoned about 1980. A very successful but simple variable-frame-rate coding technique was developed to reduce the data rate for the formant vocoder to around 1000 bit/s with no significant loss of the formant synthesis quality, and Barry Dupree succeeded in incorporating this scheme into the analysis, using dynamic programming to ensure that the frames chosen were optimum for satisfying the coding accuracy and buffer delay constraints." (JNH 1989)
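The channel-vocoder figures quoted above (19 channels, 2 bits per channel, 2400 bit/s) can be worked through in a short Python sketch of delta coding across the spectrum. Only those three numbers come from the text. The step table, the zero reference level for the first channel and the function names are hypothetical, since the source does not give the actual quantizer; the published description is Holmes (1980).

```python
N_CHANNELS = 19        # from the text
BITS_PER_CHANNEL = 2   # from the text
BIT_RATE = 2400        # bit/s, from the text

# Hypothetical 2-bit step table in dB; the real quantizer is not given.
STEPS_DB = (-6.0, -2.0, 2.0, 6.0)


def spectrum_bits_per_frame():
    """Bits needed for the channel levels of one frame."""
    return N_CHANNELS * BITS_PER_CHANNEL


def encode(levels_db, start_db=0.0):
    """Code each channel level as a 2-bit step from the previous
    reconstructed channel, moving upward through the spectrum."""
    codes = []
    prev = start_db  # assumed reference for the lowest channel
    for level in levels_db:
        # Choose the step whose reconstruction lands closest to the target.
        code = min(range(len(STEPS_DB)),
                   key=lambda i: abs(prev + STEPS_DB[i] - level))
        codes.append(code)
        prev += STEPS_DB[code]  # track what the decoder will rebuild
    return codes


def decode(codes, start_db=0.0):
    """Rebuild channel levels by accumulating the coded steps."""
    levels = []
    prev = start_db
    for code in codes:
        prev += STEPS_DB[code]
        levels.append(prev)
    return levels
```

With these numbers the channel levels take 19 × 2 = 38 bits per frame, so 2400 bit/s could carry at most about 63 frames per second even before any excitation information is added. The actual frame rate and the remaining bit allocation are not stated in this account.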
1962	Shearme, J.N., "Analysis of the performance of an automatic formant measuring system," Proc. Speech Communication Seminar, Stockholm, paper C9. (1962)
1975	Holmes, J.N., E. McLarnon and M.W. Judd, "Experiments with a variable-frame-rate coding scheme applied to formant synthesis control signals," SPEECH COMMUNICATION, G. Fant (Ed.), Almqvist and Wiksell, Stockholm, 1, 71-79. (1975)
1978	Holmes, J.N., "Parallel formant vocoders," Proc. IEEE EASCON Conference, Washington, 713-718. (1978)
1980	Holmes, J.N., "The JSRU channel vocoder," Proc. IEE, 127, Pt. F, 53-60. (1980)
1984	Dupree, B.C., "Formant coding of speech using dynamic programming," Electronics Letters, 20, 279-280. (1984)

MACHINE VOICE OUTPUT (1976 - 1985)

"Although the original aim of the formant synthesis by rule had been as a research tool into the cues for phonetic perception, interest in practical application for machine voice output developed in the mid 1970s. The old work was therefore revived and extended for this purpose. A new prosodic system was developed taking account of work that had been done elsewhere since Mattingly's 1964 study, and a complete experimental system was developed for going from text to speech in real time. However, time and staff resources during the period up to the amalgamation with RSRE were not sufficient to make full use of this system to achieve the potential performance that should have been possible. The system as it was during 1985 has been made available to various academic and commercial groups in the U.K. to take the development further, but other priorities have meant that very little work on this subject can be continued at RSRE. Some of the latest thoughts on the way the table-driven phonetic rule system of Holmes, Mattingly and Shearme could be enhanced to deal with the rich range of allophonic variation that occurs in natural speech are given in Holmes (1988)." (JNH 1989)
1977	Holmes, J.N., R.D. Wright, J.W. Yates and M.W. Judd, "Extension of the JSRU synthesis by rule system," Proc. 9th Int. Cong. Acoustics, Madrid, paper I108. (1977)
1983	Holmes, J.N., and A.P. Stephens, "Acoustic correlates of intonation in whispered speech," J. Acoust. Soc. Am., 73, S87. (1983)
1984	Stephens, A.P., and J.N. Holmes, "Use of flexible voice output techniques for machine-man communication," Behaviour and Information Technology, 3, 153-161. (1984)
1988	Holmes, J.N., SPEECH SYNTHESIS AND RECOGNITION, Van Nostrand Reinhold, Wokingham. (1988)


BIOGRAPHIES


J.S. BRIDLE


BARRY C. DUPREE


JOHN N. HOLMES

1950 B.Sc. in mathematics, Imperial College of Science and
     Technology, London
1953 M.Sc. in Electrical Engineering, Imperial College of Science
     and Technology, London. United Kingdom Scientific Civil
     Service.
1956 Joint Speech Research Unit
1960/61 Guest Research Worker, Royal Institute of Technology,
     Stockholm, Sweden
1970 Head of JSRU
1981 D.Sc. in Engineering, London University, based on published
     speech research work
1985 Left JSRU, private speech technology consultant
1999 Deceased
     

M.J. HUNT


M.W. JUDD


I.G. MATTINGLY  (see SSSHP USA Haskins Laboratories page)


E. McLARNON


DAVID J. QUARMBY

1984  Dept. of Electronic & Electrical Engineering
      Loughborough Univ. of Technology, Ashby Rd,
      Loughborough, Leics. LE11 3TU, England
19    Loughborough Sound Images Ltd.
      The Technology Centre, Epinal Way, Loughborough,
      Leicestershire, LE11 0QE
      (Quarmby company making speech synthesizers, DSP equip.)


J.M. RYE


P.M. SEEVIOUR


JOHN N. SHEARME

1938 Post Office Research Station, London
1939 British Army
1946 Returned to Post Office Research Station
1951 B.Sc. in Electrical Engineering, London University
1953 Government Communications Headquarters
1956 Joint Speech Research Unit
1967 Left JSRU to head a speech equipment development group
1978 Retired from U.K. Civil Service


A.P. STEPHENS


E.M. THORNBER


D.H. WALESBY


R.D. WRIGHT


J.W. YATES


Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution

JOINT SPEECH RESEARCH UNIT (JSRU)

CONTENTS:

HISTORY

SPEECH ANALYSIS RELEVANT TO SYNTHESIS (1960 - 1984)

COPY-SYNTHESIS OF NATURAL UTTERANCES USING THE OVE-II CASCADE FORMANT SYNTHESIZER (1960 - 1961)

COPY-SYNTHESIS OF NATURAL UTTERANCES USING A PARALLEL FORMANT SYNTHESIZER (1962 - 1983)

FORMANT SYNTHESIS BY RULE (1962 - 1966)

VOCODER RESEARCH (1958 - 1984)

MACHINE VOICE OUTPUT (1976 - 1985)

BIOGRAPHIES

CONTRIBUTIONS AND REVIEW BY: