Transcription of Recordings - p. 5

SSSHP 116 "BBN Phonetic Vocoder"

SOURCE: Donated by Richard M. Schwartz, BBN Laboratories Inc., 
10 Moulton St., Cambridge, MA 02238, May 8, 1985. 5" reel, 
7 1/2 ips.

CONTENTS: Examples of diphone synthesis (output of the vocoder 
with phoneme errors corrected by hand) and automatic operation 
as a vocoder. The "Rainbow Passage":

1. Phonetic synthesis

2. Phonetic Vocoder

3. Phonetic Vocoder vs Phonetic Synthesis

4. Phonetic Synthesis vs original

END:  **********



SSSHP 117 "IBM Diphone Speech Synthesis, WALRUS System, circa 1984." 

SOURCE: Donated by Clifford A. Pickover, IBM Thomas J. Watson 
Research Center, Yorktown Heights, NY 10598, April 20, 2001.
Cassette.

CONTENTS: Vocal output of the experimental IBM speech synthesis
system called "WALRUS", early 1980s.

1. Experiment 1, "To be or not to be" - progressively adding
prosody, nasalization, fricatives, etc.

2. Experiment 2, "My name is David" - variations in voice-source.

3. Experiment 3, Hallelujah chorus - variations.

4. "I Left My Heart in San Francisco." Song by unidentified speech 
   synthesizer. (Not IBM work. Do not use without permission.)

END:  **********



SSSHP 118 "Voice Output From Computers, Course 430, Integrated 
           Computer Systems, 1980."

SOURCE: Survey tape for commercial course. Narrated by George
Papcun, Computalker Consultants, Santa Monica, CA. Used with
the permission of Learning Tree International, Los Angeles, CA.
(See letter of permission in SSSHP USA Computalker Consultants
file.)

CONTENTS: 

(Musical Introduction)

1. Computalker: "Hello, I'm Computalker, a speech synthesizer."

2. Texas Instruments' Speak & Spell: "Spell 'money'" ... 
   "M-O-N-E-Y. That is correct, now spell 'was', ...", several 
   phrases.

3. PCM coding, "Should we chase those young outlaw cowboys?",
   various quantization, various rates.

4. BTL Channel Vocoder - 2400 bits/sec
   "The Franklin Institute speaking through the voice of a Bell
   Laboratories vocoder, invites... PA1-9103."

5. Mountain Hardware Supertalker - Delta Modulation examples
   "... speaking at _ kilobaud", ... "This phrase is being re-
   corded at _ bits per second."

6. Building up an utterance from its components (Source not 
   specified):  "Please say what this word is."
      a) All parameters
      b) F0 steady
      c) F1 added
      d) F2 added
      e) F3 added
      f) Frication noises only
      g) All parameters except F0
      h) Total with F0 contour

7. Cecil Coker's Articulatory Synthesis (Bell Telephone Labs)
   "Good morning. I am a computer. I can read stories and speak
   them aloud. ... information by telephone."

8. Computalker: Formant synthesizer parameter patterns for
   vowels, diphthongs, syllables.

9. Haskins Laboratories: Formant synthesis by rule from phonetic
   input, 130-220 words per minute.
      a) "The North Wind and the Sun ... stronger of the two."
      b) Animal Communication, by Mark Twain.

10. Microspeech, Inc.: Formant synthesis from phonetic text.
      a) "Hello, I'm Microspeech, a speech synthesizer ... plug 
         into a standard bus ..."
      b) faster
      c) slower
   
11. (BTL?) Bandlimiting and quantizing formant and F0 parameters:
    "We were away a year ago."

12. LPC-encoded concatenated words with simple prosody rules
    (paragraph on predicative logic)

13. BTL Demisyllable Synthesis - Cathy Browman
    "Hello, I am a language interpreter named Lingua. I was
    developed at Bell Laboratories by Cathy Browman and ...
    more by myself soon."

14. Votrax Type-n-Talk - text to speech:
    "Hello. I am a Votrax speech synthesizer. The field of speech
    communication has shown rapid growth in the last few years.
    ... information."

15. Telesensory Systems, Inc. - formant synthesis
    "The purpose of this recording is to demonstrate the quality
    of synthetic speech obtained using the new set of large scale
    ... (long passage) ... Palo Alto, California."

16. MIT, John Allen: Discussion on synthesis by rule from text.

17. MIT MITalk-79: 
    "Speech is so familiar a feature of daily life that we rarely
    pause to define it. It seems as natural ... (long passage)...
    than any other."

END:  **********



SSSHP 119 "Lawrence - PAT / U of M, 12-15-61" 

SOURCE: Copy dated 12/15/61. Tape from the Univ. of Michigan 
        Communication Science Laboratory. 7" reel, 7.5 ips, good
        quality. (Maxey Tape T61.1) Referenced in SSSHP under each 
        laboratory's name. Almost the same contents as SSSHP 120.

CONTENTS: 

1) Lecture: "The use of synthetic speech as an aid to the study 
of speech apprehension," tape-recorded by Walter Lawrence, SRDE, 
England. Place and date unknown. (Better recording than in SSSHP
120.)

   Lawrence: "When the auditory sensation of speech is experienced,
   a number of different kinds of information may be apprehended..."
   (description of the synthesizer PAT's six controls: A0, F0, Ah, 
   F1, F2, F3. A fixed F4. Six contours of control patterns on 
   glass slide, scanned by CRT flying-spot scanner.)

   PAT: "What did you say before that?" variations: speed, pitch, 
   singing, whispered, child's voice... "Good bye, come again soon.
   Good bye."

2) VODER, 1939, Homer Dudley, BTL.

   Syn: "She likes ..."; "Don't tell the boys."; "The Judge paid
   the check."

3) VOCODER, 1939, Homer Dudley, BTL.
   
   Processed human speech: "Mary had a little lamb. Its fleece..."
   Silly Willy commercial, voice variations.

4) EVT - Electronic Vocal Tract, MIT.

   Syn: vowels: "ee, ih, eh, ae, uh, ah" 

5) POVO - POle VOice analog synthesizer, MIT.

   Syn: "Where are you?", variations; "Far, far away." variations.

6) Diphasic speech simulation.

   Human: "List 5, diphasic speech simulation...speaker is Ruth 
   Hobbs."; 
   Processed human speech: "The 1st word is ..."; 
   Human: "List 6, ...";
   Processed human speech: ... "choose."

7) PAT - excerpts from Lawrence lecture (entry 1, above.)

   Syn: "What did you say before that?", variations in pitch,
   singing, whispered, child's voice... "Goodbye and come again
   soon."

8) Phoneme Splicing of human speech - Univ. of Michigan.

   "Mary Lou measures her father's nice long eyelashes every nine
   years." (x2 + original).

   "This is a test tone technique." (x2)
   "This is a team."   (x3)
   "This is a scream." (x3)
   "That's a can[?]."  (x3)

END:  **********



SSSHP 120 "Univ. of Michigan - Examples"

SOURCE: Tapes SSSHP 119 and 120 were obtained by IBM from the 
Univ. of Michigan Communication Sciences Laboratory, December, 
1961. Referenced in SSSHP under each laboratory's name. 7" reel,
7.5 ips, fair quality. (Maxey Tape T61.2) Almost the same contents 
as SSSHP 119.

CONTENTS:

1) Phoneme Splicing of human speech - Univ. of Michigan.

   "Mary Lou measures her father's nice long eyelashes every nine
   years." (x2 + original).

2) Dyad Splicing of human speech - Univ. of Michigan. (Not in
   SSSHP 119.)

   "We hastened the boy off the garage path to see which edge
   young owls could view." (x3 + original)

3) VODER, 1939, Homer Dudley, BTL.

   Syn: "The Judge paid the check." (SSSHP 119 has two more
   samples.)

4) VOCODER, 1939, Homer Dudley, BTL.
   
   Processed human speech: "Mary had a little lamb. Its fleece..."
   Silly Willy commercial, voice variations.

5) EVT - Electronic Vocal Tract, MIT.

   Syn: vowels: "ee, ih, eh, ae, uh, ah" 

6) POVO - POle VOice analog synthesizer, MIT.

   Syn: "Where are you?", variations; "Far, far away." variations.

7) Diphasic speech simulation.

   Human: "List 5, diphasic speech simulation...speaker is Ruth 
   Hobbs."; 
   Processed human speech: "The 1st word is ..."; 
   Human: "List 6, ...";
   Processed human speech: ... "choose."

8) PAT - excerpts from Lawrence lecture.

   Syn: "What did you say before that?", variations in pitch,
   singing, whispered, child's voice... "Goodbye and come again
   soon."

9) Phoneme Splicing of human speech - Univ. of Michigan.

   "Mary Lou measures her father's nice long eyelashes every nine
   years." (x2 + original).

10) Lecture: "The use of synthetic speech as an aid to the study 
of speech apprehension," tape-recorded by Walter Lawrence, SRDE, 
England. Place and date unknown. Recording not as good as in SSSHP
119, but contains Lawrence's statement of the title.

   Lawrence: "Lecture demonstration entitled: 'The use of synthetic
   speech as an aid to the study of speech apprehension', by Walter
   Lawrence. When the auditory sensation of speech is experienced,
   a number of different kinds of information may be apprehended..."

END:  **********



SSSHP 121 "Dr. Delattre's talk on cues, 11/17/61," Parts 1 and 2, 
          (Maxey Tape T61.3), 2-hours, and

SSSHP 122 "Dr. Delattre's talk on cues, 11/17/61," Parts 3 and 4, 
          (Maxey Tape T61.4), 1-hr and 27-min.

SOURCE: Tapes SSSHP 121 and 122 were recorded by IBM at IBM Re-
search, San Jose, CA, when Pierre Delattre was a paid consultant 
to IBM. 7" reel, 3.75 ips, see SSSHP USA Haskins Laboratories. 

CONTENTS: Detailed lecture on acoustic cues and the locus theory
of formant transitions.

END:  **********



SSSHP 123 "BTL Demo Record/Hamlet, Daisy."

SOURCE: Tape made from 33 1/3 rpm flexible disk labeled "Synthe-
sized Speech". Work of J.L. Kelly, Jr. and L.J. Gerstman of Bell 
Telephone Labs, 1961. 4" reel, 7.5 ips. (Maxey Tape T61.5)

CONTENTS:

   Syn: "Hello, Ladies and Gentlemen."
   Human: narration about system.
   Syn: "He saw the cat."; "Mr. Watson, come here."; Hamlet, "To
   be or not to be..."; Bicycle Built for Two, "Daisy, Daisy, give
   me your answer...", with piano; 34 speech sounds and rules;
   copy of human speech spectrum, "Men strive...", "If you don't 
   mind... Texas."; "Thanks for listening."

END:  **********



SSSHP 124 "Prof. Delattre's Pattern Playback Examples, 8/4/61".

SOURCE: Recorded for IBM at Pierre Delattre's laboratory, Univ. 
of Colorado at Boulder. 4" reel, 7.5 ips, see SSSHP USA Haskins 
Laboratories. (Maxey Tape T61.6)

CONTENTS:

1) Single harmonics from Pattern Playback (range of tones).
2) "i, u, ae" - 3 formants, 2 tones (harmonics) each.
3) "i, u, ae" - 2 formants, one tone in each.
4) "bae, dae, gae, shae, sae" - 3 formants, 2 tones in 1st and
   2nd formant, 1 tone in 3rd formant, dots for "sh, s" noise.
5) "My name is Brownie" - 3 formants with 1 to 3 tones in
   1st and 2nd formant, only 1 tone in 3rd formant.

6) (repeat of samples 1 to 5)

7) "An old Arab ate an apple" - (a) American accent, (b) German
   accent, (c) Spanish accent, (d) French accent, (e) American
   accent.

8) (repeat of sample 7)

END:  **********



SSSHP 125 "SCS - 1962/Stockholm Vocoder Demo Tape Sept. 1962". 

SOURCE: Produced by the Speech Transmission Lab., Royal Institute 
of Technology, Stockholm, Sweden. Distributed at the Stockholm 
Speech Communication Seminar. First-generation copy. 7" reel, 
7.5 ips. (Maxey Tape T62.1)

CONTENTS:

Human speech source for subsequent vocoder processing or synthe-
sis. (M)-male, (F)-female:

   "Welcome to the Stockholm Speech Communication Seminar." (M)
   (23 CV's, female, then male)
   "Hello." (F)
   "Hello, is Docent Fant there?" (M)
   "No, he isn't, he is out right now. Can I take a message?" (F)
   "Yes, please. Would you ask him to call 23 65 20." (M)

1) to 8) vocoders (not transcribed)

9) Synthesis from function generator: PAT, Phonetics Dept., 
   Edinburgh Univ., Scotland, J.K.F. Anthony. 
   
   (syn: above conversation)

10) Synthesis from function generator: OVE-II, Speech Trans-
    mission Laboratory, Stockholm, Sweden, G. Fant.

    (syn: above conversation)

11) to 12) vocoders (not transcribed)

END:  **********



SSSHP 126 "Stockholm Vocoder Demo. Tape, STL/RIT, Sept. 1962"

SOURCE: Copy of SSSHP 125, 7" reel, 7.5 ips. (Maxey Tape T62.2)

END:  **********



SSSHP 127 "Machines That Talk, MIT, 2/62"

SOURCE: Recorded at Research Laboratory of Electronics, Massa-
chusetts Institute of Technology, Feb 2, 1962, 7" reel, 7.5 ips. 
(Maxey Tape T62.3)

CONTENTS: Same as SSSHP 90, with the following additional material:

10a. DAVO, MIT (cont).
 
   (syn, segments and assembled phrases by tape splicing: 
   "A, B, C, D, ..." song; "This is the voice of DAVO at MIT";
   "Tech is Hell")

END:  **********



SSSHP 128 "Univ. of Mich. and MIT Demo Tapes, 1961". 
          (Maxey Tape T62.4)

SOURCE: Copy of SSSHP 120 (T61.2) and SSSHP 127 (T62.3), 7" reel,
7.5 ips. (?Copy of "T61.2" isn't in the same order as the original.)

END:  **********



SSSHP 129 "P.A.T. Demo 1962", E.T. Uldall, May, 1962.

SOURCE: Received from E.T. Uldall. See SSSHP UK Edinburgh Univ. 
5" reel, 7.5 ips, good quality, 1st gen copy. (Maxey Tape T62.5)

CONTENTS: Lecture on PAT by E.T. Uldall. Transcription in SSSHP 
UK Edinburgh file. 

   (syn: CVs; "The sun was the stronger of the two.", with 
   variations; The North Wind and the Sun - Mark IV Complete.)

END:  **********



SSSHP 130 "P.G.E.C. Demonstration", IBM

SOURCE: IBM Diphone synthesis tape, 4" reel, 7.5 ips. (Maxey 
Tape T63.3)

CONTENTS:

   (syn: "Which is his hat? That is his hat. But, Bill sat on it.
   That is too bad. Bill is a nit wit. To be or not to be. An
   eye for an eye.")

END:  **********



SSSHP 131 "We wish you a merry Xmas", IBM.

SOURCE: IBM demonstration tape, 3" reel, 7.5 ips

CONTENTS: 

   (syn: song in 1, 2, and 3-part harmony; 2-channel masters) 
    Master tape. (Maxey Tape T65.2)

END:  **********



SSSHP 132 "Archival Tape - H.D. Maxey, 3/65"

SOURCE: IBM copy of other tapes, 7" reel, 7.5 ips. (Maxey Tape 
T65.6)

CONTENTS:

   1. Delattre's Pattern Playback, Aug 4, 1961
      ("My name is..."; "An old Arab...") See Haskins Labs
   2. Edinburgh Univ., E. Uldall, May 1962
      (PAT lecture; "The North Wind and the Sun...")
   3. OVE-II, RIT, Sep 1962
      ("Hello, is Docent Fant...")
   4. Demo diskette, Gerstman & Kelly, BTL, 1961
      ("He saw the cat."; Hamlet; Daisy song; T61.5)
   5. JSRU Demo, J. Holmes, Mar 1965
      ("A bird in the hand...")
   6. EVA, Melpar, Oct 1964
      ("This voice is the result...")

END:  **********



SSSHP 133 "Speech Synthesis by Rule - JSRU England, 3/65".

SOURCE: Copy of IBM's copy. 3" reel, 7.5 ips

CONTENTS: Demo for "Speech synthesis by rule," Language & Speech, 
          1964. (Maxey Tape T65.7) See SSSHP UK JSRU. See SSSHP 
          77.2 for contents.
      
END:  **********



SSSHP 134 "Diphone Synthesis Demo, 4/66", IBM.

SOURCE: Synthetic speech samples from diagnostic intelligibility 
tests in 1965. Presented with paper, "Some effects of list fami-
liarity and synthetic segment familiarity on the intelligibility 
of PB-words produced by diphone synthesis,"  N. R. Dixon, H. D. 
Maxey and R. Aylsworth, 70th meeting of the Acoust. Soc. Amer., 
St. Louis, MO, Nov. 3-6, 1965. 4" reel, 7.5 ips. MASTER COPY. 
(Maxey Tape T66.7) 

CONTENTS: Narration by N.R. Dixon.

1. Introduction

2. "Inventory Number 901, Series A."  (analog synthesis by hand)

3. Definition of diphone segments

4. Description of training words

5. Examples:

   a) "Yaa, as in yarn and yacht."
   b) "Twi, as in twist and twill."
   c) "Ing, as in zing and sing."
   d) "Whuh, as in why and white."

6. Description of CID test material

7. Examples:

   a) "Say: there."
   b) "Say: east."
   c) "Say: aisle."
   d) "Say: none."
   e) "Say: wire."

END:  **********



SSSHP 135 "Diphone Demo MASTER III, Oct '67", IBM.

SOURCE: Demo for Dixon, N.R., and H.D. Maxey, "Terminal analog 
synthesis of continuous speech using the diphone method of segment 
assembly", IEEE Trans. Audio and Electroacoustics, AU-16, 40-50 
(1968). 4" reel, 7.5 ips. (Maxey Tape T67.5)

CONTENTS:

   Introduction - narration by N.R. Dixon 
   (syn: "Inventory No. 901, Series A.", control variations)
   Diphone description
   (syn 10 words: "office, being, consult, ... W")
   Diphone Assembly
   (syn 7 sen: "The number you dialed ... and dial again.")

END:  **********



SSSHP 136 "Speech Analysis/Synthesis Survey Tape", AFCRL, 1967.

SOURCE: Backup copy of SSSHP 81. See SSSHP USA AFCRL. 7" reel, 
7.5 ips. (Maxey Tape T68.2)

END:  **********



SSSHP 137 "Sleeping Beauty", E. Matsui, et al., ETL, Aug 1968.

SOURCE: From N. Umeda, ETL, Japan. Copy of Prof. F.F. Lee's copy 
at MIT. 3" reel, 7.5 ips. (Maxey Tape T68.6)

CONTENTS: Grimm's fairy tale "Sleeping Beauty." 

END:  **********



SSSHP 138 "Synthesis Samples, Japanese", IBM.

SOURCE: IBM diphone synthesis, N.R. Dixon and H.D. Maxey, 3" reel, 
        7.5 ips. (Maxey Tape T68.10)

CONTENTS:

   "Ohayoo gozaimasu."   
   "Dr. Miura, irasshaimase."
   "Ogenki desuka?"
   "Sock it to me, Dave."
   "An heah come de Judge."

END:  **********



SSSHP 139 "Boone Demo, ASA of NC, Boone, NC", IBM, H.D. Maxey, 
          Oct 3, 1969.

SOURCE: IBM diphone synthesis, 4" reel, 7.5 ips (Maxey Tape T69.3)

CONTENTS:

   Syn: "AAIX", vars; BXAAIX;
   "Speech Processing Dept..." (SPDEPT)
   "Spraken ze ..." (GERMAN2)
   Schlitz Beer song (SHLTZ3)
          
END:  **********



SSSHP 140 "IBM 7770, 1969".

SOURCE: Samples of vocabulary for IBM 7770-III for NY Bell Tele-
phone Co. Donated by IBM Corp. 5" reel, 7.5 ips. (Maxey Tape T69.6)

CONTENTS: Processed human speech for IBM 7770-III. See SSSHP USA 
          IBM file.

END:  **********



SSSHP 141 "Application Based Sample Speech of VS4, May 15, 1972." 

SOURCE: Commercial tape from Vocal Interface Division, Federal Screw
Works. Cassette. See SSSHP USA VOTRAX. Additional speech samples
by VS4. (Maxey Tape T72.1)

END:  **********



SSSHP 142 "The Sounds of Computalker, 1976," 

SOURCE: Computalker Consultants, commercial cassette. SSSHP has 
permission to use. (Maxey Tape T76.1)

CONTENTS: Male narration by Lloyd Rice, Computalker Consultants.
Fair quality, some distortion and noise.

1) Synthetic music.

2) Syn. from hand-edited control data from human utterance.

   (syn: "Hello, I'm Computalker, a speech synthesizer designed 
   to plug into the standard bus of your 8080 ...now hearing.";
   synthetic music; narration: synthesizer data rate.)

3) Synthesis by rule from phonetic spelling.

   (syn: "Four score and seven years ago our fathers brought forth
   on this continent a new ...")

4) Hand editing of analysis of human utterances.

   (human and syn, Kennedy speech: "Let us never negotiate out of
   fear, but let us never fear to negotiate."; King speech: "Doors
   will be opening that have not been opened in the past to 
   Negroes.")

5) Synthesis by rule from phonetic speech.

   (syn, Star Trek: "... transporter circuits.")

6)  repeat of 5) faster, slower.

7)    "     "    low pitch, high pitch.

8) Music from synthesizer.

END:  **********
   


SSSHP 143 "Dialogue Homme/Machine (Reconnaissance automa-
          tique et synthese par diphones), Recherches/Acoustique, 
          Vol. IV, 1977." 

SOURCE: CNET, Lannion, France. See SSSHP France CNET. (Maxey Tape 
T77.1)

CONTENTS: vocoders, not transcribed

END:  **********



SSSHP 144 "TSI Speech/Reading System Announcement, VC003T, 
          12/12/77."

SOURCE: Telesensory Systems, Inc., Palo Alto, CA. Cassette (Maxey 
Tape T77.2)

CONTENTS: Same as SSSHP 17. Leader broken and needs repair. 

END:  **********



SSSHP 145 Plastic Disk: "A Report on the Kurzweil Reading 
          Machine for the Blind," 

SOURCE: Kurzweil Computer Products. 33 1/3 rpm plastic disk accom-
panying The Kurzweil Report, No. 3, Spring 1979. See SSSHP USA 
Kurzweil file. (Maxey T79.1)

CONTENTS: 

   (syn: A, B, C's; numbers; dollars; "O say can you see... 
    gleaming"; "A horse is a horse... Mr. Ed"; "Four score and 
    seven years ago... equal.")

END:  **********



SSSHP 146 "Klatt-Talk KT-1 Demo, May 1980".

SOURCE: From Dennis Klatt, MIT to H.D. Maxey. Cassette. (Maxey 
        Tape T80.2) 

CONTENTS: Same as SSSHP 92.2. 
          
END:  **********



SSSHP 147 "Klatt Syn-by-Rule, Oct 1971, MIT".

SOURCE: Copy of IBM copy from Dennis H. Klatt, 7" reel, 7.5 ips. 
(Maxey Tape T80.3)

CONTENTS:  A better copy is SSSHP 91.5. 


END:  **********



SSSHP 148 "TSI Demo, J. Bernstein, 11/11/80".

SOURCE: J. Bernstein, Telesensory Systems, Inc., Cassette, 
(Maxey Tape T82.1)

CONTENTS: Copy of tape SSSHP 31.

END:  **********

       The above tapes SSSHP 119-148 were collected by H.D. 
       Maxey during IBM's diphone synthesis project, spanning 
       the years 1961 to 1985. The IBM-created tapes have been 
       released by IBM for use by the SSSHP by N.A. Dion (Jan 6, 
       1991) and A. Peled (Jan 13, 1993). See SSSHP USA IBM file.



SSSHP 150 "IBM Diphone Speech Synthesis (1961-1970), H.D.
          Maxey, May 2001."

SOURCE:  Donated by H. D. Maxey. See letter of permission from
Norman A. Dion, II, IBM Corporation, Jan 30, 1991 in SSSHP USA
IBM file. Cassette, 15:30 minutes.


CONTENTS: Compilation of IBM demonstration tapes.

This is the IBM Diphone Speech Synthesis demonstration tape, 1961-
1970, for the International Business Machines Corporation. My name 
is H. David Maxey and this is May 22, 2001. This tape was made for 
the SSSHP.

The first four entries were produced at the IBM Research Laboratory, 
San Jose, California, over the period 1961-1966.


Entry 1:

Three samples of speech synthesis from hand-prepared patterns on 
the multi-channel analog function generator, 1965 and 1966. The 
synthesizer's informal nickname was "Old Ironjaw", from its per-
ceived struggle to talk. The song, "We wish you a merry Christmas", 
is in 1, 2, and 3-part harmony, by using sound-on-sound re-recording. 
Here are the 3 samples, with narration by N.R. Dixon:

"Inventory 901..." (Ao +F1 +F2 +F3 +An +Ah +Fo)         SSSHP 135 
"Now this is Old Ironjaw talkin'"  (3 speeds)         
"We wish you ... Xmas..."                               SSSHP 131


Entry 2:

Synthetic speech examples referenced in the paper by Estes, Kerby, 
Maxey and Walker entitled: "Speech synthesis from stored data", 
IBM Journal of Research and Development, Vol. 8, No. 1, January 
1964, pp. 2-12. Seven sentences by diphone assembly, 1963:

"Which is his hat? ... eye for an eye."                 SSSHP 130.1


Entry 3:

Synthetic speech samples from diagnostic intelligibility tests in 1965.
Narration by N.R. Dixon.

Introduction                                             SSSHP 134
Definition of diphone segments
Description of training words
Example training phrases (5)
Description of CID test material
Example test phrases
Conclusion


Entry 4:

Speech synthesis demonstration to accompany paper by Dixon and 
Maxey entitled: "Terminal analog synthesis of continuous speech 
using the diphone method of segment assembly", IEEE Transactions 
on Audio and Electroacoustics, Vol. AU-16, No. 1, March 1968, 
pages 40-50.

The demonstration tape, narrated by N.R. Dixon, contains examples 
of phrase synthesis from the function generator, diphone words, 
and diphone sentences. 

"Inventory No. 901, Series A" (control variations)      SSSHP 135
10 words: "office, being, seven, consult, ... W"
7 sen: "The number you dialed ... and dial again."   


Entry 5:

Some phrases and a song from the IBM TASS-III speech synthesizer 
from the period 1967 - 1970 at Research Triangle Park, North 
Carolina. Synthesis by diphone-assembly using diphone library 5.

"Speech Processing Dept..."                             SSSHP 139
"Sock it to me, Dave."                                  SSSHP 138
"An heah come de Judge."                                   "   
Schlitz Beer song                                       SSSHP 139 

Here are some experiments with using the diphones to attempt 
Japanese and German languages. Most native speakers politely 
commented that it sounded like an American trying to speak their 
language.

"Ohayoo gozaimasu."                                      SSSHP 138
"Dr. Miura, irasshaimase."                                  "
"Ogenki desuka?"                                            "
"Spraken ze ..."                                         SSSHP 139

END:  **********



SSSHP 165 "Meet the Expert, 30 Jan 58, Dr. H.M. Truby"

SOURCE: 2nd generation copy of SSSHP 106, by H.D. Maxey, June 8, 
        2001. 5" reel, 3.75 ips. 

CONTENTS: See SSSHP 92.1.

END:  **********



SSSHP 166 "Eye on Research: The Six Parameters of PAT"

SOURCE: Second-generation videotape copy of film for B.B.C. broad-
cast series, circa 1958, donated by Prof. Bob Ladd, Dept. of Theo-
retical and Applied Linguistics, Univ. of Edinburgh, Scotland, 
June 21, 2001. VHS/PAL format, 30:08 min, fair quality.

CONTENTS: Demonstration of speech analysis and synthesis apparatus
of the Dept. of Phonetics, Univ. of Edinburgh, Scotland, circa 1958.

1) Walter Lawrence discussing the Edinburgh PAT speech synthesizer.

   (syn: "Eye on Research")

2) David Abercrombie, overview

3) Peter Ladefoged demonstrating:
   - using swallowed balloon for measuring lung air pressure
   - Laryngoscope for examining vocal folds in motion
   - Elizabeth Uldall's vocal fold movie
   - Palatography
   - Sound spectrograph
   - Artificial larynx

4) Demonstration of PAT by James Anthony, view of PAT in operation 
   a week after the fricative filter became tunable. 

   (syn: "Do you understand what I say to you?", male, female, 
    child voice.)

5) Walter Lawrence presenting six words by PAT for listeners to 
   identify and mail in on postcards.

6) David Abercrombie, summary, present limitations of PAT, goals 
   of Department of Phonetics.

END:  **********



SSSHP 167 "Eye on Research: The Six Parameters of PAT"

SOURCE: Copy of videotape SSSHP 166 in VHS/NTSC format. Donated
        by H. D. Maxey, June 23, 2001. Poor quality. 

END:  **********



SSSHP 171 "Cornell University speech synthesis samples"

SOURCE: Compiled by Bonnie Puckett on September 28, 2001 and 
donated by Susan R. Hertz, SpeechWorks International, Inc., Ithaca, 
NY, Jan 23, 2002. Cassette, good quality, 2:17 min, right channel.


CONTENTS: Demonstration of the Cornell Speech Research System.

1a) Among the first words produced with SRS.  These words were 
played at the 95th Meeting of the Acoustical Society of America 
in Providence, Rhode Island.  

   (syn  "The Cornell Speech Research System") (x2)

1b) One of the first songs produced with SRS.  The song consists 
of two voices singing in harmony; each voice was recorded onto a 
separate track of the tape and played simultaneously.  The acoustic 
parameters produced by the SRS speech rules were hand edited to 
include appropriate values such as fundamental frequency and 
durations.  Because the OVE IIId speech synthesizer could only 
produce 64 discrete fundamental frequency values, the song is not 
perfectly in tune.

   (syn 0:10 "Far above Cayuga's waters, with its waves of blue, 
    stands our noble alma mater, glorious to view.")
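
The tuning limitation noted in 1b can be illustrated with a small sketch.
The actual F0 grid of the OVE IIId is not given here, so the range (80 to
320 Hz), the even spacing in Hz, and all function names below are
assumptions made only for illustration. The sketch snaps each target note
to the nearest of 64 values and reports the resulting error in cents.

```python
import math

def f0_grid(f_min=80.0, f_max=320.0, steps=64):
    """Return `steps` F0 values evenly spaced in Hz (assumed grid, not the real OVE IIId table)."""
    step = (f_max - f_min) / (steps - 1)
    return [f_min + i * step for i in range(steps)]

def nearest(grid, target):
    """Pick the grid value closest in Hz to the target frequency."""
    return min(grid, key=lambda f: abs(f - target))

def cents(f, ref):
    """Interval from ref to f in cents (100 cents = one equal-tempered semitone)."""
    return 1200.0 * math.log2(f / ref)

def tuning_errors(notes_hz, grid):
    """For each target note, return (target, quantized value, error in cents)."""
    result = []
    for note in notes_hz:
        got = nearest(grid, note)
        result.append((note, got, cents(got, note)))
    return result

# Equal-tempered C major scale starting at C3 (about 130.81 Hz).
C3 = 130.8128
scale = [C3 * 2 ** (s / 12) for s in (0, 2, 4, 5, 7, 9, 11, 12)]

for target, got, err in tuning_errors(scale, f0_grid()):
    print(f"{target:7.2f} Hz -> {got:7.2f} Hz ({err:+5.1f} cents)")
```

With a coarse grid like this, the quantization error varies from note to
note, which is one plausible reading of why two voices in harmony would
drift audibly out of tune.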

1c) The first complete sentence produced with SRS.  The samples 
illustrate different F0 contours, including one with no F0 
variation (a monotone).  

   (syn "I like ice cream.  I like ice cream.  I like ice cream.")


2) SRS speaking in five different languages.  The sentences in the 
languages other than English were produced by Cornell students in 
a speech synthesis class.

   (syn 0:33 "Hello, I'm a machine.  I've been learning to speak 
   four different languages:  Japanese, German, Dutch and Spanish.  
   Listen:  Nihongo-ga hanasemasu.  Ich kann auch ein bisschen 
   Deutsch sprechen.  Ik kan zelfs een beetje Nederlands spreken, 
   y un poco de español.")   


3) A paragraph produced by the SRS rules for American English. The 
text was entered as ordinary spelled out words with interspersed 
prosody annotations.

   (syn 0:31 "Today is August thirty first, nineteen eighty two.  
   This tape is being played at a multi-language synthesis and 
   recognition workshop in Paris.  The focus of this meeting is 
   on Black African languages.  There are hundreds, perhaps 
   thousands of African languages. Often these languages have no
   corresponding writing system.  Speech synthesis and recognition 
   may offer the speakers of these languages an alternative way to
   communicate with machines.")


4) A paragraph generated by the SRS rules for Japanese. The text 
was entered in a Romanization, with annotations marking phrase 
boundaries and unpredictable pitch accents, as described in the 
references. The transcription below is a more conventional 
transcription which can be used as a guide while listening to the 
selection. 

   (syn 0:13 "kyO'-wa; se'N kyU'-cyaku hatSidZU-sa'NneN; Si'gatsu 
   dZU-rokunitSi'-desu.  gakkai-saiSUbi-no go'go, otsukare-no 
   toko'ro-o kyOSuku-de'su-ga, o-cirune-o nasa 'tte-wa ikemase'N.") 
 
END:  **********



SSSHP 172 "Eloquent Technology, Inc. speech synthesis samples"

SOURCE: Compiled by Bonnie Puckett on September 28, 2001 and 
donated by Susan R. Hertz, SpeechWorks International, Inc., Ithaca, 
NY, Jan 23, 2002. Cassette, good quality, 5:46 min, right channel.


CONTENTS: Demonstration of the Eloquent Technology systems.

1) A synthetic rendition of a sentence produced in a dialect of 
American English spoken in Alexander City, Alabama.  The acoustic 
parameter values generated by a set of rules for General American 
English were hand-edited to closely model the original utterance 
while adhering to the principles of the phone-and-transition model.  

   (syn "The colors of the rainbow are red, orange, yellow, green, 
   blue and violet.")


2) A synthetic sentence produced as part of our voice quality 
experimentation.  Certain acoustic parameter values generated by 
a set of formant-based rules for General American English were 
hand-edited in accordance with our hypotheses about factors 
important for producing human-sounding speech.  

   (syn 0:01 "Today's a spectacular day.")


3) Demonstration of ETI-Eloquence 6.0 formant-based text-to-speech 
product speaking in the adult male voice in 10 languages/dialects.  
The input text contained annotations to switch from one language 
to another. 

   (syn 1:46 "Hi, my name is Reed.  I'm an American.  I speak 
   English ... \Vce=Speaker=Antti\ Hei.  Nimeni on Antti.  Olen 
   suomalainen.  Puhutko suomea?  1, 2, 3, 4, 5, 6, 7, 8, 9, 10.")  


4) Demonstration of some of the capabilities of the ETI-Eloquence 
6.0 formant-based text-to-speech product for US English.  The 
demonstration highlights the output produced both from ordinary 
input text, and from text which has been annotated to produce a 
variety of voices, voice characteristics, speaking rates, and word 
emphasis levels, as shown in the example below. In this example, 
the \Pau\ annotation specifies a pause duration (in milliseconds), 
the \xWac\ (word accent) annotation specifies degree of word 
prominence, the \Vce=Speaker\ annotation selects one of the built-in 
voices, and the \xBth\ annotation specifies degree of breathiness.

   (syn 3:11 "Hello.  My name is Reed.  I believe we may have met 
   before.  Would you like to meet my sister Shelley? ... For 
   example, users can tell us to emphasize a specific word in a 
   sentence. Suppose another machine claimed to be the best talking
   computer. I could then respond, \Pau=300\ no, \Pau=250\ \xWac=3\ 
   I'm the best talking computer ... \Vce=Speaker=Shelley\ \xBth=100\ 
   And this is Shelley whispering so she can tell you a secret ... 
   The possibilities are endless, but you've probably heard enough 
   for now.  Goodbye, and thanks for listening.")  
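
The annotation style shown in entry 4 can be separated from the spoken
text with a short sketch. This is not the ETI-Eloquence parser; it is a
minimal illustration that assumes every annotation has the form
\Name=value\ exactly as in the transcribed example, and the function name
and token layout are hypothetical.

```python
import re

# One backslash-delimited annotation, e.g. \Pau=300\ or \Vce=Speaker=Shelley\
# (form assumed from the transcribed example only).
ANNOTATION = re.compile(r"\\([A-Za-z]+)=([^\\\s]+)\\")

def split_annotations(text):
    """Split annotated text into ("text", str) and ("annotation", name, value) tokens."""
    tokens = []
    pos = 0
    for m in ANNOTATION.finditer(text):
        plain = text[pos:m.start()].strip()
        if plain:
            tokens.append(("text", plain))
        tokens.append(("annotation", m.group(1), m.group(2)))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        tokens.append(("text", tail))
    return tokens

sample = r"I could then respond, \Pau=300\ no, \Pau=250\ \xWac=3\ I'm the best talking computer"
for token in split_annotations(sample):
    print(token)
```

Keeping annotations as separate tokens makes it easy to see which settings
(pause, word accent, voice, breathiness) apply before each stretch of text.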

END:  **********

Smithsonian Speech Synthesis History Project
National Museum of American History | Archives Center
Smithsonian Institution