RULES FOR DIPHONE-ASSEMBLED SENTENCES
IBM TASS-III
Diphone Library 5B of November 11, 1970

The base reference for speech synthesis by diphone-segment assembly
is the publication:

   Dixon, N.R., and H.D. Maxey, "Terminal analog synthesis of
   continuous speech using the diphone method of segment assembly",
   IEEE Trans. Audio and Electro., AU-16, 40-50 (1968)

which discusses, by example, the analysis, synthesis and assembly
rationale for using diphone-segments in multiallophonic roles in
continuous utterance. A chart of the alphabetic-character
transcription system with IPA symbols is included.

This 2001 document summarizes the diphone-assembly rules as of the
last modification to Diphone Library 5B on November 11, 1970. The
rules are based on N. R. Dixon's subsequent development of the
diphone library and diphone characterization since the 1968 paper.


ALPHABETIC-CHARACTER TRANSCRIPTION SYSTEM

For computer use, two alphabetic characters were used for each
phonetic entity.

Consonants   Consonants       Consonants     Vowels        Diphthongs
As in        As in            As in          As in         As in

PX - pan     FX - fan         HX - high      EE - feet     EHIX - say
BX - ban     VX - van         HW - why       IX - fit      AAIX - sigh
TX - tan     TH - bath                       EH - fed      AAUX - show
DX - Dan     DH - than        WX - win       AE - fad      AWIX - soy
KX - can     SX - sip         JX - yes       ER - bird     AWUX - so
GX - Gus     ZX - zip         RX - rip       UH - but
             SH - ship        LX - lip       AA - cod
MX - man     ZH - pleasure                   AW - bought
NX - neck    CH - chip        XX - silence   UX - book
NG - ring    DZ - jam         QX - release   UU - boot

NC - generic nasal consonant
SC - generic stop consonant


DIPHONE NAMES

1. Diphone names may be from one to six characters in length, must
   start with an alphabetic character, and may contain combinations
   of alpha, numeric, and special symbols except the symbols
   , ) ( . / '.

2. The convention has been to form diphone names from the
   two-character phonetic names with an attempt to make the names
   descriptive of the diphones' phonetic quality.
   The names, therefore, tend to be two, four, or six characters in
   length. Examples and use:

      EE, RX  - steady-state vowel or a standard diphone target
      TXEE    - consonant/vowel transition as in "tea"
      TXWXIX  - consonantal-complex/vowel transition as in "twins"

3. A diphone name followed by a digit indicates an entry of a special
   nature.

   a) An ending digit "2" has been reserved to indicate diphones
      having terminal and/or syllabic character.
      Ex:  VXNX2 - as in the word "seven"
           KXNX2 - as in "reckon" and "connect"

   b) An ending digit "3" has been reserved to indicate 25-sample
      junctural diphones intended to be scaled in use.
      Ex:  EHIX3, AWIX3, AWUH3, AEIX3, etc.
      The previous fixed-length diphthongs (EH15IX) were dropped.

   c) Digits other than "2" or "3" have been used to indicate a
      temporary variant of a diphone, so that both the variant and
      the original can be used in the same computer run.


PHRASE NAMES

Some of the Library 5 entries are whole words or phrases, with the
names being suggestive of the contents of the phrase.

Phrase names may be from one to six characters in length, must start
with an alphabetic character, and may contain combinations of alpha,
numeric, and special symbols except the symbols , ) ( . / '.

As phrase names are not diphone names, any ending digit may
differentiate between similar phrases.


DIPHONE SPELLINGS OF SENTENCES

Diphone-spelling of sentences evolved from a punched card format to
an IBM 2250 graphic display, and now, for the SSSHP, to an ASCII file
for an IBM Personal Computer. All these formats have kept the same
fixed presentation of four diphone specifications per line, as shown
in the following example.

*RECORDNG 082770  "This is a synthetic recording."
  D2  XXDH(085-   ) D3 DHIX (   -085) 05 IXSX R   -080) 04 SXIX (   -105)
 5M2  IXZXR   -065) 03 ZXXX (   -085) 05 XXUH (   -070) D2 UHSX R   -   )
          (   -   )         (   -   )5D2 SXIX (   -   ) D2 IXNX (   -   )
  02  NXTH(   -070)4D2 THEH (   -   )    EHDX (100-   ) D3 TXIX (   -   )
      IXKX(   -055) 10 XXRX (065-   )    RXIX (   -   ) D2 IXKX (   -065)
  05  KXAW(   -095) 02 AWRX (   -   ) D2 RXDX (   -070)2D2 DXIX (065-   )
  02  IXNG(   -060)3M2 NCXX (   -055)         (   -   )         (   -   )

The fixed format, while not essential to synthesis, has proven
helpful in improving readability and reducing specification errors.

The first column is reserved for an "*", to mark identification or
comment lines. The first comment line contains a name of up to 8
characters and a creation date in the first 16 columns. Columns 19
to 73 contain descriptive comments.

If the first column of a non-blank line is blank, the line can
contain up to four diphone "commands". The numbers within
parentheses are fundamental frequency (F0) specifications (see
below).

Each of these 18-character diphone commands can be in one of the
following formats, where "n" and "m" are digits:

   Command             Description

1.   ABCDEF,(   -   )  A diphone name of up to 6 characters and a
                       comma, right-adjusted. The stored segment
                       (usually a word or phrase) is sent to the
                       synthesizer "as is", with its stored F0
                       pattern.

2.  nn IXNG (   -   )  Up to a 2-digit number (nn) in the first 3
                       columns, followed by the diphone name in the
                       next 6 columns. The computer duplicates the
                       diphone's first sample nn times.

3. mDn THEH (   -   )  A 2- or 3-character code (Dn, mDn),
    Dn THEH (   -   )  right-adjusted in the first three columns,
                       followed by a diphone name. The computer
                       duplicates the first sample of the diphone
                       m times. The character "D" instructs the
                       computer to Divide the diphone by the
                       following number "n" by taking only every
                       nth sample of the diphone, starting with the
                       first sample.

4. mMn NCXX (   -   )  A 2- or 3-character code (Mn, mMn),
    Mn NCXX (   -   )  right-adjusted in the first three columns,
                       followed by a diphone name.
                       The computer duplicates the first sample of
                       the diphone m times. The character "M"
                       instructs the computer to Multiply the
                       diphone samples by the following number "n"
                       by duplicating each of the diphone's samples
                       n times.

5. --- ---- (123-234)  Any of the above commands with an F0 contour
                       specified. The computer modifies the diphone
                       according to the first 3-character code,
                       then constructs a linear F0 control function
                       from the start of the modified diphone to
                       the end.

6. R-- ---- (   -   )  Any of the above commands with an initial
                       character of "R", which instructs the
                       computer to find the "reverse diphone" (see
                       below) and use it reversed in time.


F0 BREAKPOINTS

The fundamental frequency (F0) contour can be specified with a few
breakpoints scattered throughout the sentence, as in the above
example. The computer first lengthens or shortens the diphones as
specified, then interpolates values for F0 at the beginnings and
ends of the modified diphones.


REVERSE DIPHONES

During the synthesis procedure, the computer searches the diphone
library for the requested segments. If a segment cannot be found,
the computer searches for a "reverse diphone" before noting that a
diphone cannot be found.

A reverse diphone is one for the reverse phonetic sequence. If
found, the control samples of the reverse diphone are accessed in
reverse time order and are used as a substitute for the requested
diphone (the same modifications are applied).

The first parenthesis after the requested diphone is changed by the
computer to an "R" to indicate that a reverse diphone has been
substituted.

Library 5B has been stylized so that a number of diphones can be
used in reverse, thereby reducing the size of the diphone library.
The following diphone categories are missing or incomplete in
Library 5B, thereby requiring the reverse-diphone function.

1. No vowel-fricative diphones (e.g., EHZX, EHSX)
2. Only some terminating-diphones (e.g., DZXX, NGXX)
3. No vowel-WX or vowel-JX diphones (e.g., AAWX, AEJX)
4.
   No Stop-Consonant/Nasal-Consonant (SCNC) diphone. The generic
   diphone, NCSC, for all nasal consonant/stop consonant contexts,
   is used in reverse.
5. No vowel-HX diphones (e.g., AEHX)
6. Only some consonant/stop-consonant diphones (e.g., DZDX, LXDX)

If a diphone cannot be found, the name of a possible reverse diphone
is formed from the diphone's name according to the following rules:

1. If the diphone name is 4 characters long, the first and last
   pairs of characters are swapped (e.g., EHZX => ZXEH).

2. If the diphone name is 5 characters long and the last character
   is numeric, other than "2", the first two characters and the
   second two characters are swapped and the numeric character is
   appended (e.g., ERAW3 => AWER3).


N.R. Dixon & H.D. Maxey 2001                 PC file: ss_ibmdr.txt
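
ILLUSTRATIVE SKETCH: REVERSE-DIPHONE LOOKUP

The two name-forming rules above, together with the fallback search
described under REVERSE DIPHONES, can be sketched in a few lines of
modern code. This is only an illustration in Python and is not the
TASS-III program: the function names reverse_name and lookup, and the
representation of the library as a dictionary mapping names to lists
of control samples, are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def reverse_name(name: str) -> Optional[str]:
    """Form the candidate reverse-diphone name, or None if no rule applies."""
    if len(name) == 4:
        # Rule 1: swap the first and last character pairs (EHZX => ZXEH).
        return name[2:4] + name[0:2]
    if len(name) == 5 and name[4] in "0123456789" and name[4] != "2":
        # Rule 2: swap the two pairs, then append the digit (ERAW3 => AWER3).
        return name[2:4] + name[0:2] + name[4]
    return None


def lookup(library: Dict[str, List[int]],
           name: str) -> Optional[Tuple[List[int], bool]]:
    """Return (samples, substituted), or None if nothing can be found.

    `library` is a hypothetical stand-in for the diphone library.
    `substituted` is True when a reverse diphone was used, which is
    the case the listing marks by changing "(" to "R".
    """
    if name in library:
        return list(library[name]), False
    candidate = reverse_name(name)
    if candidate is not None and candidate in library:
        # Access the reverse diphone's samples in reverse time order.
        return list(reversed(library[candidate])), True
    return None
```

For example, with a library holding only ZXEH, a request for EHZX
returns the ZXEH samples in reverse order with the substitution flag
set. Names ending in "2" (such as VXNX2) and six-character names
yield no candidate, since neither rule covers them.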