ADRIAN FOURCIN INTERVIEW
This is an online version of "An interview with Adrian Fourcin, Professor of Experimental Phonetics at University College London." The interview has been made available for educational, non-commercial purposes by the Board of Trustees of the Science Museum, London, and Professor Fourcin. The interview was recorded by Roger Bridgman at Blythe House, the West London annexe of the Science Museum, on December 10, 1991, and transcribed by him on January 20, 1992. (See Transcription, SSSHP 105, in University College London file.)
In this interview, Professor Fourcin reviews his education and lifelong study of speech communication. Included are his early work on electronic speech transmission with Dennis Gabor and Colin Cherry at Imperial College, and speech synthesis with Walter Lawrence at Signals Research and Development Establishment (SRDE). While at SRDE, Fourcin devised a multiple function generator that used conducting ink on a plastic sheet to provide control voltages for their speech synthesizer. One of the models went to the Phonetics Department, University of Edinburgh, where Lawrence later moved his work. The function generator technology was later adapted by other laboratories.
After a six-month visit with Frank Cooper at Haskins Laboratories, working on pitch perception, Fourcin finished his PhD thesis and moved to University College London, under Dennis Fry, where he developed techniques for treating people with speech and hearing disorders and was instrumental in introducing the study of acoustics of speech production and perception to the deafness and speech pathology curricula.
Bridgman: Well, Professor Fourcin, could you take us back first of all to your graduation and, as a young man, what were your plans on graduation?
Fourcin: Well, I suppose graduation is the culmination of a lot of previous thought as an adolescent, and when I was 13 or so, like many other young boys, I was very much interested in the scientific basis of the world around me. I came to the conclusion from my own reading and study, unaided, that probably the most important phenomenon was that associated with electrical events, and I was reinforced in this view when I discovered that the phenomena, not only basic to the then theories of the structure of matter, but also neural conduction in the human body, were all based on electrical effects.
So I thought, well, electricity is something that I should really work on, and I went straight from school to the Northampton Polytechnic in London, which enabled me to work for a degree in electrical engineering with a minimum of delay. I took inter BSc in one year instead of staying on at school for two years, and I ended up with a degree in 1947. I was still taking the exams when I was 19. So I finished my degree work quite early, and the thing I discovered then was that electrical engineering and electronics were not exactly the answer to an understanding of the structure of the world, but certainly gave very powerful tools, and it seemed important to me to apply these tools to the human condition if possible. I decided that I would spend a year or so working and then go back and work for a PhD.
One's plans never do work out quite exactly, and I spent a year working at the National Physical Laboratory in the Engineering Division, working on the mechanical structures that they had, aircraft wings, bridges and so on, fatigue machines, devising methods of measuring strain in engineering structures, as I'd taken strength of materials in my final degree as one of the subjects. This association was a first step in applying electronics to something at least more broad than just electronics itself. I found the National Physical Laboratory a marvellous place to be, in Teddington, so many different things going on, and it was with great reluctance I decided that I would leave to go and work in industry. I went to the GEC in Coventry.
And then after a year, or during my first year, I was working towards going back to university to work for a PhD. I found there was a course at Imperial College that Willis Jackson had set up which was intended to bridge the divide between what one could do at the level of an undergraduate and what was needed for postgraduate research. I applied to go on this course and I was accepted, but Willis Jackson wrote to me and asked me if I would like to consider not just being a student but a research assistant, because I'd had this experience in the National Physical Laboratory and industry, to work with someone called Dennis Gabor on speech. I was immediately interested, because this seemed to me to be much nearer the sorts of thing that I'd been thinking might be of real interest, combining electronics and work more narrowly concerned with helping things and understanding more about the human condition, as I said before. So I went and worked with Gabor.
Bridgman: Can you say what it was, is it possible to say what it was gave you this interest in the human condition? Was that something in your family background, medicine or anything like that?
Fourcin: No, it wasn't. I think my own world view has rather been coloured because I was different from many of the other boys at school, in the sense that I had a non-English surname. The name Fourcin is very un-English. I had an English mother, so I spoke English and was brought up speaking English and had many English points of view, but my father was French, and going backwards and forwards between England and France every summer to see French relatives, and not knowing it at the time but eating French food and so on, I think took me out of what might have been a more narrowly-directed set of interests. And I think the war, with its moving of people away from their homes, introduced me to a lot of things that I would never otherwise have thought very much about. So there's not an easy answer to your question, and why I was interested so much in nervous conduction, for example, when I was 13, I really can't tell you.
Bridgman: Yes, always hard to say, isn't it. Anyway, getting back to Gabor, you started work with him.
Fourcin: Well, Gabor was really a marvellously prolific inventor, with an extraordinarily fertile mind, and he wanted - he was trained as an engineer in the first instance, in Hungary - he wanted to do things which combined fundamental advance intellectually with practical application. At that time the new breed of transatlantic cables was being laid to link North America with Europe. The British Post Office was extremely active in this area, working on new methods of constructing and utilising cables. We had the problem, or they had the problem, of being able to have a sufficiently good commercial return from the fairly limited bandwidth provided by the cable, and so there was a lot of interest in the possibility of reducing the bandwidth requirements for speech transmission.
Gabor had been working on a scheme which sought to make use of intrinsic redundancy in the communication between one speaker and another which arises from what he felt were imperfections in respect of the ear's ability to process speech. In effect really it was a scheme which depended more on regularities in speech than on imperfections in hearing, and involved a sort of Doppler scanning of recorded speech material so that you would coalesce adjoining segments in a uniform way, and in smoothing and coalescing reduce the requirement for a total transmission of all of the complexities of the signal. The fundamental problem was one of being able to do this in real time in an adequate fashion, one which can very easily be solved now using computational techniques, but at that time it wasn't easy to see how to do it.
Gabor had devised in the first instance a demonstration involving the scanning of sound tracks on film, and then he wanted to be able to do this using magnetic recording techniques, which were also fairly new. The scanning was to be dynamic, with the scanning windows continually varying in their physical separation as a function of the fundamental frequency of the speech. I wanted to persuade him to use a sort of photo-mosaic method of doing it, but in the end I wasn't able to convince him and I ended up by inventing a new magnetic material which had permeability variations recorded rather than intensity. That did work, but it was not at all a very efficient way of doing things. We were working with contract money coming from the Ministry of Supply via SRDE, and Walter Lawrence was the contract supervisor, because the army were also interested in methods of bandwidth compression, not for reasons of economy so much as for reasons of encryption security of message transmission.
Bridgman: So although some of this work was destined for the Post Office, all of it was funded by SRDE, was it?
Fourcin: All of it was funded by SRDE. Gabor got the original idea, as it were, because of the practical need in respect of the transatlantic cable, but the actual work there was funded in regard to defence requirements. So I was research assistant funded totally with money from SRDE, and Walter Lawrence came along, then in his lateish 40s I suppose - because I started in 1949 on this, I graduated in '47 and spent two years - came along to the contract meetings. Now we did make reasonable progress, but I think the technical problems were far too great to be solved in a reasonable fashion.
I then transferred to working for a PhD full time. I had a scholarship from a Major County Award that I'd never really taken up completely, because I'd finished my degree so young, and this was transferred to postgraduate work. I worked with Colin Cherry as my supervisor on, again, methods of - what had happened was that people now were much more interested in speech and speech processing, and the phenomenon that infinitely-clipped speech was intelligible was exciting a lot of scientific interest, how was it, why was it, and so on. And of course it was also a simple signal that in principle lent itself to being encoded perhaps in a more efficient way. So I worked on trying to find out why, how it was that clipped speech was so intelligible, and worked on spectrographic analysis of that, and discovered that the formant structure was just the same, and that you could in fact - as part of my work I did simple syntheses of speech, I went down now to SRDE and made syntheses of different vowels and found out within -
Bridgman: So they had synthesizers in operation at that time, did they?
Fourcin: Well, what had happened was that Walter Lawrence, being in a place where they were concerned with efficient methods of transmitting speech, had been wondering about this, and he had come across the book by Potter, Kopp and Green on the spectrographic analysis of speech which appeared in 1947, and noticed that the formants in those methods of presenting the spectrographic analyses were very clearly defined, there were few of them and they moved slowly. And so it immediately occurred to him that it might be possible to encode speech utilising formant information which would secure a very substantial reduction in necessary channel capacity. I was in contact with this work and felt that it was really fundamental to further advance.
Bridgman: What did you actually have available to do your spectrographic analysis at that time at Imperial College?
Fourcin: Well, I ended up in fact making my own spectrograph. Everything is rather a tangled tale, but the Americans - Steinberg had, I think it was in 1937, in the Bell System Technical Journal, suggested the concept of a spectrograph, and in a way this was a logical extension of the work that Paget had been doing, which of course in the first instance stemmed from the work of Helmholtz and Ohm in the last century. Steinberg proposed that there should be a frequency-time-intensity presentation, and this was done just before the war. It was used for purposes of decoding scrambled speech, as a very valuable tool in that respect, and the Americans in consequence had produced a spectrograph, a whole range of them, which were very effective tools.
The people in SRDE wanted to, of course, make use of spectrographs for their work, in following these what were quite new ideas. Lawrence was a pioneer in this respect, no other work in the world was directed along the lines that he had initiated, and they needed to be able to get to grips with real spectrographic presentations and use them for synthetic work in the analysis of the speech that they were trying to experiment with. But the Treasury said, well, we're very short of dollars, so why can't you make your own? And they said, well, the Americans have already done it, we just can buy one and it will - we can get on with our work. And the Treasury said, well, could you make one? Of course they had at Christchurch one of the best workshops in the whole country, and a whole drawing office, and so they said well, we could make one, and the Treasury said, well, make one, we won't give you the dollars. So that in fact involved a lot of work, and they decided they'd put it out to contract. The contract was not very successful, and I inherited, because I was on good terms with the people there, one of the unsuccessful products of a firm called Winston Electronics out at Richmond. I looked at it and found out what was wrong with it, and I completely re-did the mechanics and I re-did the magnetic recording, because I'd been working on magnetic recording, and produced my own spectrograph. But that was just an aside.
So I used that in fact for the studies in my thesis, where I was looking at the importance of relative levels of formants, which I synthesised using the early synthesizers they were working on down at SRDE, and found out what were the, as it were, capture effects in regard to whether or not clipped speech or clipped synthetic speech, now properly controlled, would be intelligible, in effect. I actually worked on the application of Shannon information theory to clipped speech to determine what was the ideal probabilistic structure for such a signal, and how clipped speech differed from it, and how you could encode it with economy. It turned out in fact that simple clipped speech has got a first-order probability distribution which is almost perfect, which was rather disappointing to me. I found the second-order distribution was more open to redundancy encoding, but not awfully profitably. It took me about seven hours to analyze a 20-second chunk of speech using my electronic equipment, a thing that could be done now in less time than it takes to say it.
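The first-order analysis described here, which took seven hours per 20-second chunk on the original electronic equipment, now takes a fraction of a second. The sketch below is a hypothetical Python reconstruction, using an invented two-formant test tone rather than real speech: infinite clipping reduces the signal to its sign, and for a zero-mean signal the entropy of the resulting two-level first-order distribution comes out close to the 1 bit-per-sample maximum, the "almost perfect" distribution that left so little room for first-order redundancy coding.

```python
import numpy as np

# Hedged sketch: infinite clipping reduces a signal to its sign, a
# two-level waveform that nonetheless remains largely intelligible
# when the input is speech.
def infinite_clip(x):
    """Return the infinitely-clipped (+1/-1) version of x."""
    return np.where(x >= 0.0, 1.0, -1.0)

def first_order_entropy(symbols):
    """Shannon entropy in bits/symbol of the first-order distribution."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# An invented "speech-like" stand-in: two sinusoids at formant-like
# frequencies, sampled at 8 kHz.
t = np.arange(0, 1.0, 1 / 8000.0)
x = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1220 * t)
clipped = infinite_clip(x)

# For a zero-mean signal the two levels are near-equiprobable, so the
# first-order entropy is close to the 1 bit/sample maximum.
print(round(first_order_entropy(clipped), 3))
```

Higher-order statistics (pairs, triples of successive samples) are where the residual redundancy sits, which matches Fourcin's finding that the second-order distribution was more open to redundancy encoding, though not very profitably.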
Bridgman: What was Cherry like as a supervisor?
Fourcin: Well, Cherry was a very volatile man, and was not at all of the same stamp as Gabor. Both of them were very difficult individuals. Gabor was so intellectually powerful that he held everybody else in low esteem, by and large, and I found that I had to spend half an hour thinking about the simplest question before I went to see him, because he'd be on to it immediately. He had some - I got on very well with Gabor only because, I think, I'd prepared so carefully. Cherry didn't have that powerful intellect but he had an extremely inventive mind and he would dance in conversation from one topic to another with great volatility and ingenuity on occasion. He was exasperating to argue with, because he would - rather like boxing with a shadow - he would suddenly shoot off at a tangent somewhere. But Cherry was extremely interested, not in the application of the physical sciences, as Gabor was, to the problems of communication, but in the greater understanding and exploration of perceptual phenomena. He discovered, as it were, psychology, and tried to apply it to communication engineering. That was one of the reasons he was interested in clipped speech. So it was not easy, I think, for any of us as students to get on either with Gabor, or with Cherry particularly, for these different reasons.
Bridgman: So anyway, you went through the PhD and did that, and what happened after that?
Fourcin: Well, because of my - I had the idea that when I finished my PhD work I'd, if I went into industry, settle down and I might get married, and I might have a family and then I would be unable to, as it were, see anything else of the world, one would get tied down by one's responsibilities, so I looked around for some post-PhD scholarship which would enable me to go to America or somewhere in Europe. I found that there was a University of London post-doctoral exchange scheme with the CNRS, the French national scientific organisation, which reciprocally supported French and British students at post-doctoral level. So I competed for that, and this I - well, you see, I was an engineer so far as the people interviewing me were concerned, and when they started asking me questions in French, because I had a French background, I was able to speak much better French than the British graduates in French, which astonished them. And I think, because I was working on speech and had got all sorts of ideas - when one's young, especially, there is a ferment of activity in one's brain - and they were asking me about the things I was doing, and then they spoke to me in French, so I think, fortuitously, I made a very good impression, and I got this rather difficult-to-obtain scholarship. And so I went to France, and discovered then that -
Bridgman: What year was that, that you went to France?
Fourcin: I went to France in 1953, and went to the French telecommunications research laboratories at Issy-les-Moulineaux near to Paris. I discovered there they didn't quite know what to do with me, but then I discovered that there were two people called Marcou and Daguet who were also working on a new method of compressing speech, and so I soon was eating with them every day at their table in this marvellous French canteen, the like of which I'd never seen before, being brought up with British standards. They had the idea - they in fact had been following Gabor's theory that he had applied and developed in regard to his method of compressing speech. Gabor had represented any signal in terms of what he called an analytic signal approach, in which you could talk about, as it were, the intensity and phase spectra separately.
Marcou, who was a colonel in the French army - but both Marcou and Daguet were products of the École Polytechnique in France, so they were mathematically relatively sophisticated - Marcou had had the idea that if you operated on the phase component of the analytic signal and treated it as a frequency modulation signal, you could in fact reduce the modulation index, and in that way reduce the bandwidth required for the adequate transmission of speech signals. They demonstrated a system which involved just that sort of process, and of course because I'd been working on speech I told them about formants and so on, which they didn't know about, being mathematicians and engineers rather. I suggested a way of mathematically operating on their signal, in fact simply taking out the phase signal itself and operating directly on that, extracting it as a signal in its own right, the phase as a function of time signal, instead of doing their sort of frequency modulation approach in which they tried to reduce the index.
It proved to be the case that in fact their technique was capable of totally transforming the nature of the spectrum but it did not, in any way that was of practical value, reduce the essential bandwidth, and that when you took out this phase signal and looked at its spectrum it had an enormous peak in the low frequency, but if you filtered the phase signal, that I suggested they work on, then you immediately substantially destroyed the intelligibility of any subsequently reconstituted signal.
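The analytic-signal machinery behind this work can be illustrated with a short sketch. What follows is a hypothetical modern reconstruction in Python, not the original apparatus: the signal is split into an envelope ("intensity") function and a phase function, and the phase can then be extracted and manipulated as a signal in its own right, as Fourcin suggested to Marcou and Daguet. The test signal and parameter values are invented for illustration.

```python
import numpy as np

# Hedged sketch of Gabor's analytic-signal decomposition: a real
# signal x(t) is written as the real part of a(t)*exp(j*phi(t)),
# separating an envelope component from a phase component.
def analytic_signal(x):
    """Return x + j*Hilbert{x} via the standard FFT construction."""
    n = len(x)                      # assumed even here
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = h[n // 2] = 1.0          # keep DC and Nyquist as they are
    h[1 : n // 2] = 2.0             # double the positive frequencies
    return np.fft.ifft(spectrum * h)

# An invented amplitude-modulated test tone standing in for speech.
fs = 8000
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 500 * t) * (1.0 + 0.3 * np.sin(2 * np.pi * 30 * t))

a = analytic_signal(x)
envelope = np.abs(a)                # a(t): instantaneous amplitude
phase = np.unwrap(np.angle(a))      # phi(t): the "phase signal"

# Reconstruction is exact: x(t) = a(t)*cos(phi(t)). Band-limiting the
# phase signal before reconstruction is what destroyed intelligibility
# in the experiments described above.
print(np.allclose(envelope * np.cos(phase), x))
```

The point of the sketch is that the decomposition is lossless only when the phase function is carried through intact; filtering `phase` before the final line breaks the identity, which is the practical result Fourcin reports.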
Whilst this was going on I was trying to finish off my thesis, and I essentially did this, but my period came to an end and I wanted to go back and carry on working in speech. I'd still kept in touch with Walter Lawrence and Ralph Eades who was a close and staunch colleague of his over the years when they were together at SRDE, and Walter Lawrence said that if I wanted to work at SRDE I could, but probably the best way would be to go in for an open competition to the Scientific Civil Service at the Senior Scientific Officer level. So I went in for this SSO competition. Of course I'd been in France and I was working and interested in absolutely everything, and I think there's a certain glamour about speech work. For whatever reason, anyway, I got through the competition, I think actually now, looking back, rather to Walter Lawrence's surprise. He was a lone worker and didn't really want anybody to come and work with him, and so he was a little bit sort of bemused when I got through and then turned up.
Bridgman: And when was that? When did you actually arrive on the doorstep?
Fourcin: 1955. November 16th 1955 I arrived.
Bridgman: Where was SRDE at that time?
Fourcin: Well, SRDE was split between two sites: the main site was just near Christchurch and then they had another subsidiary site, Steamer Point. It all sprang from early work on radar. They had radar stations along the south coast, of course, during the war, and they had supporting laboratories to develop the new experimental techniques to ensure that the latest improvements were being properly operationally assessed and available. Steamer Point was one of these. The main site for SRDE was just a quarter of a mile inland from Steamer Point, which was on the sea. We were all in wooden huts, the scientists. The administrators were all in nice brick buildings with proper central heating and so on, but that seems to be part of the British tradition that the workers who get their hands dirty are a rather inferior race compared with those who deal with the administration of land or whatever - accounts nowadays.
The whole of SRDE was split into different divisions, and this work on speech was in the Lines Division. That was run by a man called Frank Rule, and Frank Rule had been in for a competition in order to become Superintendent at SPSO level of the Lines Division. He had decided that the work that Walter Lawrence was doing on synthetic speech would be something that he would like to push, because he felt that it would solve important problems in regard to security and economy of speech transmission right down to the level finally, perhaps, of the battlefield. So there was quite a concentration of effort which was being put in to developing speech, and so I adventitiously arrived at a time when internally they had decided to put a lot of extra effort. So here was this young man coming who'd been working abroad and had been in contact and so on right from my student days. It was a sort of ideal situation and I fitted in reasonably well.
There was a man, Leslie Stead, who was charged with the setting up of a group to provide a practical implementation of a complete analysis-synthesis system involving what the Americans subsequently called formant vocoder approaches. At that time SRDE was really ahead of the rest of the world in the whole of this technique. The problems of formant analysis using the available techniques were impossible to solve adequately, although a lot of ingenious activity was directed towards their solution, and they did get things going to a degree: their most spectacular piece of work was to analyze speech in real time, transmit it encoded to the moon, and resynthesize from the reflected parameters that came back. That's old Leslie Stead's work.
Leslie Stead - of course there are always movements in human affairs, and the security people at Ruislip working for GCHQ had also got a speech group under Swaffield. Swaffield and Halsey had been working on the development of the vocoder with special reference to its possible application in regard to - that was about 1946 they published their vocoder paper, I think - with special reference to, of course, the Post Office's need for economic transmission across the Channel in the cable. Swaffield then was the head of the group at Ruislip. He'd moved out of Dollis Hill. There was a sort of internal battle going on between the people who thought that the work at SRDE on speech was important and those who thought that the work at Ruislip was important. Well, in brief, Ruislip won, and the group at SRDE was in fact wound down completely in 1961. I had found Walter Lawrence an almost impossible person to work with, as many others had, although I hadn't known that before I went there. I was offered a job by Dennis Fry at University College in 1960 and I went as a lecturer to University College in 1960, leaving SRDE.
But in fact the work on the 'mangle' which I had got going in 1957 - my work at SRDE was of course concerned with speech synthesis, and I worked on the development of the voiced excitation, the voiceless excitation, solving various problems in regard to dynamic range and developing quite new methods of providing those inputs to the synthesizer. The mangle, as I called it, the multiple function generator, was one of the other things that I, after discussion with Walter Lawrence, thought of as a way of solving the problem of controlling the synthesizer. Walter had got the system whereby he had a cathode ray tube method of generating the control parameters, and that was very inflexible. You had to have masks that were optically scanned, so you had to cut out the masks.
We came across in a book by Korn and Korn a method of generating a function, using a strip of wire which was wound around a drum and pressed onto a strip potentiometer. Walter was showing me this one morning - every morning we talked about something, anything from one-handed typewriters to methods of teaching the deaf - we were going over this one morning and it suddenly occurred to me that if I used conducting ink on a plastic sheet, and had in fact the sheet moving under the roller, I could, if I put pickup points on the other side of the sheet, have an indefinitely large sheet with essentially arbitrarily complex functions, providing they were single-valued, and that this would be a marvellous way of doing it. Like a lot of things, you can say it in five minutes, but it took a couple of years to get to the point, in '57 -
Yes, the multiple function generator. 'Multiple function potential dividing generator' the workshop wanted me to call it, because when they asked me what I called it, and I said I called it a mangle, they were very disappointed, because they wanted to have a really high-sounding name for this thing that they were working on. I'd made them do so many special things to produce it that they wanted to have a name to correspond.
Bridgman: But it found its way across the Atlantic, anyway, and there was this demonstration with the - when PAT actually talked on stage -
Fourcin: Actually, when Walter Lawrence did that, he used his cathode ray tube -
Bridgman: Oh, that was before -
Fourcin: Yes, that was before the mangle. The mangle did find its way around the world one way and another: the Swedes came and saw what I was doing, because we always had visitors at SRDE. They copied the basic idea, but they went back and had a flat fixed bed with the plastic sheet with the conducting lines on it, and they moved their roller over the bed. The difficulty with that was that you were constrained in regard to the length of utterance you could synthesise, but the advantage was that you had a good flat bed for the functions to, as it were, be pressed out from.
The work at SRDE then stopped in 1961 so far as the main support was at issue, but Walter Lawrence carried on, just by himself, as long as he could, but by the time he was 62 - that would be in 1965, '64, '65 - he had to retire. He'd already been working - normally he would have lost his SPSO grade at the age of 60 and then he could go on at two steps down, I think it was, for another couple of years before he had to retire definitely - but he had come across Abercrombie, the Professor of Phonetics in Edinburgh, and Abercrombie had been relatively welcoming to Walter Lawrence because he wanted to encourage work in his laboratory. Walter would really have liked to have had the work going on at University College with Dennis Fry but Dennis Fry couldn't stand Walter Lawrence. Walter Lawrence was rather a domineering person and was always sure that his ideas were right, even though they might well have been wrong. I will say that Walter Lawrence was a very prolific inventor and had lots of ideas right outside the field, and he was really responsible for quite a lot of the experimental work that Broadbent and Ladefoged did in regard to the original, as it were, scientific insights. I'm not too sure that he got quite the credit he deserved, but at all events Abercrombie encouraged the placing of a contract by SRDE with the University in Edinburgh and Dennis Fry discouraged any attempts that Walter Lawrence would make to, or made to, have anything started in London.
So in 1957, when I got the first mangle going, the idea was that one of them would go up to Edinburgh. In fact I went - Frank Cooper from the Haskins Laboratories had come over for - I'd first made his acquaintance in 1955 at Christchurch, at a meeting that Walter Lawrence and Stead had organised, and he'd invited me to go and work in Haskins. When I went over, just before I went over, the mangle I'd been working on, Walter Lawrence shipped it up to Edinburgh, and that was the leading image in the 'Eye on Research' programme that you referred to in our previous conversation. The plastic sheet with the electrically conducting lines and so on was going around as the first thing that people saw, with the synthetic voice saying 'Eye on Research', or something that corresponded to that, and unfortunately there was no mention in the whole of that programme of all of the enormous amount of work that had been done at SRDE that made all of this possible. It seemed as though it was totally the creation of Edinburgh University, which caused an awful lot of bad feeling in SRDE. The workshop were very upset, and all the scientists and so on. I was by that time in Haskins and I didn't really get totally the repercussions of all this, but you'll notice that the mangles that you've got here on your shelves and the table there have got SIGNALS RESEARCH AND DEVELOPMENT ESTABLISHMENT engraved right across the top. That's a classic case of shutting the stable door after the horse has bolted, I'm afraid, because the first mangle had a plain aluminium cover, it wasn't stove enamelled or anything like that, and it didn't have anything on it at all to indicate where it came from or who had created it.
Bridgman: Anyway, so you had a period at Haskins Laboratories before UCL.
Fourcin: Yes, that's right. I went there with - now one of the things that of course had been a preoccupation when I was a student was the analysis of voice fundamental frequency from the speech signal, in order to get these magnetic scanning windows appropriately spaced, and this has been a perennial problem in regard to speech analysis. Indeed all the perceptual attributes of speech are not very easy to define analytically with any exactitude. We still can't do it awfully well, and we certainly couldn't do it awfully well at that time. John Holmes was a fellow student with me at Imperial College. I was working for a PhD and he was working for an MSc. The thing that John was working on involved the production of a correlation analysis.
Norbert Wiener had introduced the importance of this and Gabor had got an idea for a magnetic system for producing correlation functions of signals recorded on tape. Typical of many of Gabor's ideas, the fundamental theory was fine but the method of application was hopelessly impractical. I thought of a way of doing this, so I got involved in correlation, and when I was at SRDE, because I'd been working on clipped speech, I thought of a way of producing a cross-correlation pitch detector which had along a shift register transistor delay line the clipped speech signal being propagated and you used modulo-2 addition - which I learnt from the people in Steamer Point from their encryption work - I used that as a multiplier, and so you could multiply an analogue signal just by having the two-state signal switching gates on and off, and that produced a cross-correlation between the infinitely-clipped speech propagated along the shift register chain and the analogue speech signal. I produced a scanning method for doing that too - this was using transistors in the days when they were first coming out. This was very nice and promising, and so I thought I'd take that with me to America, because I didn't know what I'd be working on.
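The principle behind that detector can be sketched in a few lines. This is a hypothetical Python reconstruction of the idea, not the transistor shift-register itself: because the clipped signal has only two states, "multiplication" reduces to switching the sign of the analogue signal (the role the modulo-2 gates played in hardware), and the lag at which the cross-correlation peaks gives the pitch period. The test signal and parameters are invented for illustration.

```python
import numpy as np

# Hedged sketch of a cross-correlation pitch detector: correlate the
# analogue signal against a delayed, infinitely-clipped copy of itself.
# Multiplying by the two-state clipped signal is just sign-gating,
# which is what made the scheme attractive with early transistors.
def clipped_cross_correlation(x, max_lag):
    sign = np.where(x >= 0.0, 1.0, -1.0)   # two-state version of x
    n = len(x)
    corr = np.empty(max_lag)
    for lag in range(max_lag):
        # gating by the delayed sign replaces a true multiplication
        corr[lag] = np.mean(x[lag:n] * sign[0 : n - lag])
    return corr

# An invented periodic test signal: 100 Hz fundamental plus a harmonic,
# sampled at 8 kHz, so the pitch period is 8000/100 = 80 samples.
fs = 8000
t = np.arange(0, 0.2, 1 / fs)
x = np.sin(2 * np.pi * 100 * t) + 0.4 * np.sin(2 * np.pi * 300 * t)

corr = clipped_cross_correlation(x, max_lag=120)
period = 20 + np.argmax(corr[20:])   # skip the trivial peak near lag 0
print(period)                        # the 80-sample pitch period
```

Scanning `lag` over time, as the original shift-register chain did continuously, turns this into a running estimate of voice fundamental frequency.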
So I arrived at Haskins, and Frank Cooper of course was famous for his work on PB2, the second version of his original optical playback, in which you scanned a spectrum on a transparent plastic sheet with an array of harmonically-related, sine-wave-modulated light beams. The Haskins people had done absolutely superb pioneering work. It was interesting that there was this coincidence: on one side of the Atlantic Frank Cooper had been working on the playback, with the idea of synthesising the sound from a plastic strip on which you had simple formant patterns painted in white, or as transparencies on a black background in the first instance, and on the other side there was Walter Lawrence's work on formant synthesis. Of course Walter Lawrence's approach was in a way more fundamental, and it's odd in a sense that Walter Lawrence was working with American inspiration, from the original Potter, Kopp and Green spectrograms, while in effect Frank Cooper was working along the European tradition of harmonic synthesis. Both tools, however, led to considerable developments in our understanding of what was important, and the Haskins work in particular led the world in defining the essential, contrastive, pertinent physical characteristics of speech that made it intelligible. So when I arrived, Frank Cooper was very keen I should do something with PB2, and I was for the first time almost free. Suddenly, for six months I could do, it seemed, almost anything, if I could persuade Frank Cooper to let me do it. So at Haskins I thought that I would work on the neural basis of pitch perception, and I had come across -
Bridgman: Was anybody else working on that at the time?
Fourcin: Yes, I'd come across a paper by Huggins, and Licklider had got a theory - that was the thing that really triggered it off. Licklider was following Jeffress, who'd published a paper in '47 or '48 on the way that the auditory neural system might be able to produce a sensation of pitch on the basis of a cross-correlation of the signals from the two ears and an auto-correlation of the signals in one ear. These were all ideas I'd been exposed to and working with, and I wondered if there was some way of exploring the ear's ability to do this. There was a paper by Huggins that Licklider had cited as being excellent proof of the existence of this type of neural interaction and cross-correlation, in which you had two white noise signals applied to the two ears, one of them with a phase spectrum which was quite different from the other, but otherwise with identical amplitude spectra. It was an identical noise source, simply phase-changed with an all-pass network with a phase discontinuity in the middle of the spectrum, and what Huggins showed was that if you listened with two ears you heard a little pitch in this, and if you listened with one ear you only heard noise, just ordinary white noise.
That could be explained, and I just did a cross-correlation and saw how you got a cross-correlation function from the signals between the two ears which had humps and bumps corresponding in fact to the periodicity of the pitch - that you could hear, and that you could see in the cross-correlogram. Well, I thought, if I were to synthesise this cross-correlation function then I could explore this phenomenon, and then I discovered that in any case if you just added Huggins' two signals together you got a signal which had an enormous great bump in its spectrum, and bone conduction across the head could explain his result. So at Haskins I set up, using the delay lines that I'd been working on to analyse pitch in speech, the basis for experiments in hearing, and to my delight I found that I could induce a sensation of pitch on the basis of these synthesised correlation functions, and that the mere addition of noise from a totally independent source to the two ears - if you had noise delayed in one ear and noise going straight to the other ear - would produce a sensation of pitch. Well, I won't bore you with the ins and outs of this, but that's what I worked on at Haskins. Of course, when I got back to England I had to do what they told me to do at SRDE, and they didn't want me to work on perception. They didn't really want me to work on synthetic speech either, and that rather cheesed me off, so I finished my thesis. Lawrence was, as I said, a lone worker; he really didn't like people working with him. He would always be finding some difficulty, and so I was more or less impelled to go to University College when they asked me if I'd like to go and work there.
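The delayed-noise effect Fourcin mentions - identical noise to one ear and a delayed copy to the other producing a faint pitch - falls out directly of the cross-correlation account. A minimal NumPy sketch of the analysis side only (not of the listening experiment itself; the sampling rate and delay are invented):

```python
import numpy as np

fs = 16000
delay = 80                      # samples; 16000/80 = 200 Hz predicted pitch
noise = np.random.default_rng(0).standard_normal(fs)

left = noise                                                # straight to one ear
right = np.concatenate([np.zeros(delay), noise[:-delay]])   # delayed copy to the other

# Cross-correlating the two ear signals: the peak lag recovers the interaural
# delay, and its reciprocal is the pitch the cross-correlation theory predicts.
lags = np.arange(1, 400)
corr = np.array([np.sum(left[:-lag] * right[lag:]) for lag in lags])
best_lag = int(lags[np.argmax(corr)])
pitch = fs / best_lag
```

Monaurally each signal is just white noise, which is why the effect was taken as evidence for a binaural cross-correlation mechanism.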
Bridgman: So what were the dates at Haskins?
Fourcin: Haskins, I was there for six months in 1958. I went out in about April, came back towards the end of the year and immediately started working writing up my thesis, because I'd had a taste of liberty, so to speak. Then in 1959 I finished my thesis, but then it wasn't until towards the end of 1960 that I actually went to University College. And then I replaced, eventually, Peter Denes who went to work in the Bell Telephone Laboratories - that's why Fry was going to have this vacancy.
Bridgman: And so you've been at UCL ever since -
Fourcin: Yes I have, yes -
Bridgman: Has your - I've got a rough idea of what your work is there now, it didn't start in that area I imagine -
Fourcin: Well -
Bridgman: Hearing, and the analysis of speech into its significant components -
Fourcin: I carried on a little bit working on the pitch stuff and in fact extended those experiments, when I was free again, conceptually quite substantially, although I never really published the way that I should have done, as I now realise. But there I had to give courses and I had to deal with people who were in linguistics and phonetics. I still had this idea that it would be useful to do something which - and that was one reason for going there - which, as it were, tied in more with work with human beings. This idea of transmitting speech and understanding more about speech was something which I found fascinating, and understanding about the hearing mechanism and how it interrelated with speech.
We had, thanks to Dennis Fry's work, which started before the war in this connection, good relationships with people working in deafness and in speech pathology, and over the years the thing that I have, as far as I was able, more and more encouraged was the growth of links in that area. We've definitely oriented our work so that we had a theoretical component which could be tested, insofar as it was capable of being used to train people who have a speech disorder to speak more effectively, and people who have some degree of deafness to hear more effectively. As for pitch extraction, well, of course I soon found that as a tool it would be of immense value in these different areas - in foreign language teaching, in working with the deaf and of course in speech processing.
I came across some French work by a man called Fabre, who'd developed a technique for looking at vocal fold vibration, and so I decided that I would cheat and get the signal basic to the perception of voice pitch straight from the speaker's larynx. That would at least serve as a reference, but it would also immediately resolve a lot of problems experimentally that were otherwise obstacles to work on the perception of voice pitch. So I spent some time working on the development of the Laryngograph, as I called it - because it turned out Fabre had made a mistake: he thought his signal was biggest when the vocal folds were apart, whereas in fact it was biggest when the vocal folds were in contact. It seemed wrong to me to call it an electroglottograph when in fact it didn't really tell you anything about the glottis, which is the open space, but told you a lot about the contact, so I just called it a Laryngograph, because it was giving a graphical indication of larynx activity. And this was tied in at the time with the speculations that I'd been having, as a result of my interest in hearing, on how it was that people could hear, as being phonetically identical, sounds that were physically totally different.
So I worked on normalisation - how people tune in to different speech sounds - using synthetic speech. At that time I was working on two things: the Laryngograph and the experiments in normalisation. I introduced the word 'normalisation' in regard to receptive processing, thinking in terms of what I'd done at school with normal solutions of acids and alkalis. You had a reference solution, and I had the notion that one had in one's head a sort of reference speech sound, so that if someone came along with a different vocal tract or a different accent you would in fact normalise their utterances with reference to your set of internal references. So that's where the word 'normalisation', so far as I was concerned, came from, and that led, for me, to an insight into the whole process of speech skill acquisition. It certainly guided my thoughts on the work that we'd been doing on providing hearing aids which are designed to extract from a noisy and complex acoustic signal only those features, those elements of speech, which are most important for the deaf listener and lip reader to have clearly presented.
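As a caricature of the idea (and only that - Fourcin's experiments used synthetic speech and perceptual judgments, not this arithmetic), normalisation can be pictured as mapping a new speaker's vowel measurements onto an internal reference frame. The formant values below are invented for illustration:

```python
import numpy as np

# Hypothetical internal reference vowel space (F1, F2 in Hz) and a new
# speaker whose shorter vocal tract scales every formant up by 30%.
reference = np.array([[300.0, 2300.0], [500.0, 1500.0], [700.0, 1100.0]])
speaker = 1.3 * reference

def normalise(tokens, ref):
    """Re-express a speaker's vowel tokens in the reference frame by
    matching per-dimension mean and spread -- one crude stand-in for
    'tuning in' to a different vocal tract."""
    z = (tokens - tokens.mean(axis=0)) / tokens.std(axis=0)
    return z * ref.std(axis=0) + ref.mean(axis=0)

normalised = normalise(speaker, reference)  # recovers the reference values
```

The point of the sketch is only that physically different tokens become identical once referred to the internal standard - the "normal solution" in Fourcin's chemistry analogy.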
Bridgman: You had a bit of trouble with this concept, I believe you said, with some of your workers.
Fourcin: Yes, the whole idea of extracting from the larynx some aspect of excitation or dealing with, for example, synthetic speech to assess hearing using speech pattern audiometry, as I called it at that time, was and probably still is rather offensive to people who consider that speech is a God-given unity which is holistic and on no account should be distorted. Of course people working in acoustics know that you just go from one room to another and you're going to distort the speech, and the telephone is giving you a garbled version, and it's the stuff of speech transmission that the signal is often rather mutilated. But of course I'd been working on clipped speech and all these other things. The person who's brought up in the arts has a quite different world view, and the speech therapists and the teachers of the deaf at that time certainly had a very different world view.
So it appeared to me - by that time I'd been working on the Laryngograph with Evelyn Abberton - that finally the way to overcome this difficulty was to introduce these notions in the early stages of an undergraduate course. So we encouraged its introduction at University College - at the crucial time we two shouldered a very big load in making sure that the whole thing was accepted and ran through the college administrative machinery with the minimum of difficulty - so that we were instrumental in ensuring that this first degree course in speech sciences started. We have now had it running for more than 10 years; it's the biggest school of its kind, with an extremely rigorous training for the students in phonetics of course, but also in the acoustics of speech perception and production, with links built in between phonological development and the acoustic representation of speech, so that these two things are embedded in the course.
The students don't know, first, that their course is rather more difficult than many other courses - in fact than any other course in this respect in the country - or that it has got these revolutionary concepts in it; to them it's just the straight physics of speech and the psychoacoustics of speech perception. We hoped, and still hope, that this will eventually produce a new generation of workers. Certainly we're finding now that our students take to this sort of thing like ducks to water and don't regard it as being outlandish at all; in fact they are rather astonished that people don't use these methods rather more. Now that computers are coming in, people everywhere are being driven willy-nilly to use these things. However it's only the prepared mind that can make good use of them, and we hope we've introduced a number of prepared minds into the community that will enable people to make very much better use of these analytic facilities - in diagnosis, in assessment, in training, in the provision of hearing aids for the deaf and in methods of helping the language-retarded child, for example - that would otherwise be quite impossible to get off the ground.
Bridgman: Just the sort of course in fact that you would have leapt at in 1947 if it had been available!
Fourcin: Yes, yes, if there'd been that environment it would have been -
Bridgman: I wonder if, as we've just got a few minutes left, you could now sort of look back over that period and perhaps pull out a few highlights, people and developments, that have occurred in that 40 years.
Fourcin: Yes, I think that you can divide developments roughly into two broad categories: those which have real practical value - and Atal's development of so-called linear predictive coding is a landmark in that respect. It is an approach which doesn't theoretically satisfy the needs of an adequate representation of the speech signal, but in practice it does a marvellous job, and it separates excitation and vocal-tract filter into two components which are of enormous engineering value.
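Atal's idea can be illustrated with a toy reconstruction: model the vocal tract as an all-pole filter, estimate its coefficients from the signal's autocorrelation (Levinson-Durbin recursion), and inverse filtering then leaves the excitation. A hedged sketch only - the coefficients and signal below are synthetic, not real speech:

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns coefficients a with x[n] predicted by sum(a[k] * x[n-1-k])."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a, err = np.zeros(order), r[0]
    for i in range(order):
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        prev = a[:i].copy()
        a[:i] = prev - k * prev[::-1]   # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k              # remaining prediction error
    return a

# Synthetic "speech": white-noise excitation driving a known all-pole filter.
rng = np.random.default_rng(1)
e = rng.standard_normal(4096)           # the excitation (source)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = 0.5 * x[n - 1] - 0.3 * x[n - 2] + e[n]   # the filter ("vocal tract")

a = lpc(x, 2)   # recovers approximately [0.5, -0.3]
```

Recovering the filter coefficients from the output alone is exactly the separation Fourcin credits with such engineering value: subtract the predicted part and the residual is the excitation.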
The other tremendous landmark, I suppose, has been the enhanced understanding - it's still only a beginning - of what the hearing mechanism is doing, and the ideas that Pumphrey and Gold were already introducing when I was a student at Imperial College in respect of the non-linear enhancement of frequency selectivity. Well, they didn't necessarily know it was the efferent system, but they knew something was going on, by a sort of multiplication of the auditory filters. It's a crucial concept, and we have more and more the beginnings of an insight into how, in the future, it's going to be possible to develop technology which can match signal-analytic techniques to what the auditory system does and make use of the constraints of the productive mechanism. Then higher-order constraints - the phonotactics, the linguistic structures - can be associated with more adequate front ends to produce a new generation, which probably won't be effective until the next century, of interactive speaking and hearing devices, getting right back to the aims of von Kempelen and of course of Wheatstone and Willis right at the beginning of the last century, and later on of Ohm and Helmholtz. So we can have what are going to be, of course, short-term practical developments in speech recognition, speech synthesis, interpreting telephony and interactive, apparently intelligent systems really beginning to behave in a natural way which is tuned to the human condition, so that people can converse and interact with computers, making use of these advances in a way which is really not suspected at the present time.
Bridgman: So you think this is a realistic goal - the talking, natural speech understanding computer.
Fourcin: Well, we are ourselves talking, speech-understanding computers, and obviously we have many millions of years' advance on technology in regard to evolutionary development. But the likelihood is that there will be, not the replacement of the human being, but its simulation to a much greater degree of effectiveness than is possible at present, by the beginning of the next century, and already in the next 10 years there will be very large advances in this regard - to such a degree that in the next century we may well be getting back to oral traditions in a way which would seem almost unthinkable at the present time, with our text-bound approaches to the representation of the external world - to such an extent that we have text processing described as 'natural language'!
Bridgman: So it's almost as if the talking book will be the thing of the future, rather than the talking word processor, which is a bit of a contradiction in terms.
Fourcin: Yes, there are of course practical advantages in the ability to scan the printed page visually, and one can absorb information visually, by sampling, far more rapidly than is possible with listening. There is always this process of evolution in which advances from one area of development are incorporated with advances in another, so I think it very unlikely that we'll not have text or writing as a method of communication. It's just that speech will become so much more important as a vehicle for interaction with devices in the external world and as a means of access. For example, when you want to go through the 3000 items you were filing in the past six months, you'll be able to access them with speech instead of through a keyboard, and get interaction and a degree of cognitive exchange in the future - that may be after your time, but your successors will!
Bridgman: What do you think the biggest obstacle is to this becoming a reality? Is it in speech, language, hearing, the technology?
Fourcin: I think one of the biggest obstacles is actually in the minds of men: the minds of administrators. We have the difficulty that funding is contrived in terms of the attainment of short-term practical goals, and so very large amounts of research money are directed - misdirected - towards the achievement of results of short-term consequence, to satisfy political and administrative masters, instead of the best and most cost-effective investment of human resources and money, which would be directed towards a steady attack on fundamental problems. One area which would be of enormous value is a deeper understanding of the way the child develops speaking ability. If you go along to Brussels - I've been spending a lot of time there - I haven't said anything about the SAM project that I've been concerned in managing for the past five years, which has proved to be a European success and is the basis for international discussion now, but we're all the time working on the basis of short-term expedients with short-term funding to achieve long-term results, and on a broader and much more important scale the people in speech technology who are really concerned with the development of complete systems have exactly that disadvantage. There could be much better advance if we had better administrative mechanisms. You didn't expect me - I think you thought that I might say, well, we need to have this insight into the way that the vocal tract interacts with the vocal folds, or -
Bridgman: Well, just to conclude, Professor Fourcin, could you just say in a few words what you think the significance of work in speech is, and perhaps you could sum up your career in a few words, if that's not an impossible thing to ask.
Fourcin: Yes - to take your last question first, as you know I'm due to retire in the next calendar year, and I don't really feel that my career is coming to an end but rather that it's beginning, which may seem rather odd. I don't know if I can contrive this, but what I hope for once more is a breath of freedom, because one very soon gets enmeshed in obligations to other people. Maybe I've taken this too seriously in my life - doing things and making tools and giving problems to students and setting up facilities, always trying to do things for other people. My own research, I think, has suffered considerably, although maybe my research ideas, the fundamental notions, get propagated. So what I hope is that I can in fact be doing more important things experimentally in the next few years than I have been able to do for a long time - almost since I was at Haskins, or perhaps more appropriately since the beginning of my stay at University College.
Now, with regard to highlights: I certainly derive some satisfaction from seeing things that, when I first thought of them, perhaps very few other people considered important, being now accepted and applied. And it's a great pleasure to me to sit at the back of a lecture theatre and hear the people who are taking over from me talking about and using structures and methods of explanation that it took me a long time to work out, and to walk through a laboratory and see students working with things that I've had a small contribution towards fashioning, which enable them to get a better insight into human communication.
But the main thing, I suppose, that I would consider to be of outstanding consequence is in regard to human communication itself. There is a great pleasure to be obtained from getting an insight into the human condition and in working with people and with deaf children, and on those rare occasions where, for example, a woman who has not been able to speak for a year or whatever it is comes and works with the display that I've worked on - with a speech therapist, who is totally responsible for the work - and after half an hour gets back the voice that she had totally lost. All sorts of little things which maybe are not of any great scientific significance - hearing a woman say that she had not, until now, been able to hear the voices of her children, and she didn't realise they spoke so posh - odd little things like that are finally, I suppose, of greater consequence than many others.
The greatest feeling I have, I suppose, looking back, is with regard to the changes in social attitudes to these matters, because in Britain, I believe, we are becoming much more competitive and much less inclined to help each other, and it does appear to me that if one could give young people greater satisfaction in helping each other and in exploring methods of, as it were, understanding what it is to be a human being, it would be much better for people in general. But that's a much broader answer to the question than I suppose you had in mind.
Bridgman: Well, I suppose it's speech that makes us human, but it's also speech that divides one tribe from another and causes friction.
Fourcin: Yes, there's been an obvious evolutionary advantage in the development of language and, within language, dialect and accent differences, and there is this continual need for struggle to survive. We've got the situation in Europe now where there is an increasing central tendency towards uniformity. The sort of triploid Coxes and Bramley's apples are not to be allowed, and you're supposed to have one currency and so on. Now these are all things which are of commercial advantage, but the richness of the human condition is one which has to be fostered. It would be an immense catastrophe if English indeed did dominate the whole of Europe and if the accent, dialect and language forms of all of the different countries were not encouraged.
Now the thing about speech technology is that it would in fact make it possible for the different languages of Europe to be properly serviced in a way which would not be possible if we just have text-bound systems, because as soon as you have computational processing operating effectively you have the means, eventually, for interpreting systems - so that you speak German in at one end and out it comes in French at the other.
Bridgman: So you can visualise a sort of tourist's hearing aid which simply translates on the hoof.
Fourcin: You could do, yes, yes, it would be perfectly feasible to have, at least in the short term, fairly simple things which operated in that way, with a limited number of phrases, and in due course with much more complex interchanges. And finally, which is perhaps most important, these methods would enable people to learn others' languages far more effectively than is presently possible. Now, you spoke before we came in here about the pianist who puts his hands on the piano and lets his fingers play the tune, and there are levels of learning which we don't understand, but part of the process of facilitating those levels of learning involves having adequate methods of cognitive representation.
Once we get deeper insight into the processes of cognitive representation, then we shall get deeper insight into methods of developing and fostering - in this particular instance - language-learning facilities, so that it might well be possible not just for the rare individual to speak half a dozen languages but for the run-of-the-mill individual to do so, and this would be an enormous advance in enhancing human understanding because, as you've said, speech strikes at the human condition. If it's possible for us to make speech forms in different languages more widely available, we also make the world views of different communities that much more accessible, because learning a language is not just learning a method of communication - it's learning how to understand and interact at a much deeper level than just the surface forms of speech. But speech is the way in; it's the first level, so to speak.
Bridgman: Well, thank you very much for confirming my untutored opinion of the importance of speech, and for reviewing the past and for looking into the future, with a bit of important philosophy thrown in as well for good measure, I think. And I think we'll have to leave it there, but this has certainly been a tremendous insight into the history and the future of speech. Thank you very much, Professor Fourcin.