The hard facts
Gary Herman looks at three computer systems which can take the words off your hands as well as the music...
There are currently two approaches to microcomputer speech generation. The first uses actual words spoken by a real person, stored in digitised form on a ROM chip. Such a chip, produced by Acorn and employing the dulcet tones of newsreader Kenneth Kendall, is available for the BBC computer.
The advantages of this system are obvious: the chip contains a lexicon of common words which are readily accessed and reproduced in near-human tones. But, although it is possible to create new words by combining portions of the stored vocabulary, digitised systems are not very flexible and — except in the most expensive versions — still suffer from a touch of the 'Metal Mickeys', resulting from the techniques used to save memory space.
Allophone (or phoneme) systems, which are becoming increasingly popular, electronically synthesize units of spoken sound (something like phonetic syllables) and combine them to form words. Real human voices don't come into it at all!
The three devices reviewed below are all of the allophone type, built around the common SC-01 chip (or equivalent), which can generate 64 allophones (including pauses) accessed by six-bit codes formed from the six least significant bits of a byte. In the simplest of the reviewed systems, the William Stuart Chatterbox, the allophones are generated by entering the code as a decimal number between 0 and 63 using a POKE statement or some equivalent output procedure. This involves a considerable amount of programming to get any sounds, but gives great flexibility. The other two devices, the Currah Microspeech and the top-of-the-range Votrax Personal Speech System (from the makers of the SC-01), include interpretive software which allows strings of words to be converted directly into speech.
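As a rough illustration of the six-bit addressing described above, this Python sketch masks a byte down to its six least significant bits to give an allophone code between 0 and 63. The function name `allophone_code` is hypothetical and is not taken from any of the devices' documentation.

```python
ALLOPHONE_COUNT = 64  # the SC-01 offers 64 allophones, pauses included


def allophone_code(byte_value: int) -> int:
    """Return the six-bit allophone code carried in the low bits of a byte."""
    if not 0 <= byte_value <= 255:
        raise ValueError("expected a single byte (0-255)")
    # 0x3F is binary 00111111: keep bits 0-5, discard bits 6 and 7.
    return byte_value & 0x3F


if __name__ == "__main__":
    print(allophone_code(63))   # the highest valid code passes through unchanged
    print(allophone_code(200))  # the top two bits are ignored
```

Whatever is written to the top two bits makes no difference to which allophone is selected, which is why a decimal number from 0 to 63 is all a POKE needs to supply.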
The Chatterbox looks cheaper than its £50 price. It comes in an insubstantial black plastic box with an unprotected loudspeaker mounted on the front and a lead, which connects to your computer's input/output or expansion port, squeezing out of the back. William Stuart provide leads to fit a number of popular computers, and the documentation also gives pinout information so you can wire your own interface. On top of the unit are a DIN socket for inputting simultaneous audio signals, output sockets for connection to a loudspeaker or amplifier, and a DIN socket to interface the Chatterbox with William Stuart's Big Ears speech recognition device.
The Chatterbox behaves as an output device: power is drawn from your computer and the unit uses no memory apart from that required by the software you will need to write. Comprehensive interfacing information is given in the documentation.
On power-up, the unit is silent. You need to open your computer's output channel and send the appropriate data — all the while checking the status of the sound chip (an SC-01 equivalent) to see if it is ready for data transfer. The procedures for doing this are all fairly straightforward and suitable short routines are given in the documentation. The routines need to be programmed each time you want to produce speech. They should therefore be saved on tape with suitable line numbers given to the necessary subroutines.
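The check-status-then-send procedure described above might be sketched as follows in Python. This is not the routine from the Chatterbox documentation: `is_ready` and `write_code` are hypothetical stand-ins for whatever port-reading and port-writing calls your computer provides, and `FakeChip` is a toy class that exists only so the loop can be tried without hardware.

```python
import time
from typing import Callable, Iterable


def speak(codes: Iterable[int],
          is_ready: Callable[[], bool],
          write_code: Callable[[int], None],
          poll_interval: float = 0.001) -> int:
    """Send allophone codes one at a time, waiting for the chip before each.

    Returns the number of codes sent.
    """
    sent = 0
    for code in codes:
        if not 0 <= code <= 63:
            raise ValueError(f"allophone code out of range: {code}")
        while not is_ready():          # poll the status line until the chip is free
            time.sleep(poll_interval)
        write_code(code)
        sent += 1
    return sent


class FakeChip:
    """Toy stand-in for the hardware: reports busy once after every write."""

    def __init__(self) -> None:
        self.received = []
        self._busy = False

    def is_ready(self) -> bool:
        ready = not self._busy
        self._busy = False             # the chip becomes ready on the next poll
        return ready

    def write(self, code: int) -> None:
        self.received.append(code)
        self._busy = True


if __name__ == "__main__":
    chip = FakeChip()
    speak([55, 8, 46, 47], chip.is_ready, chip.write)
    print(chip.received)
```

Passing the two port operations in as functions keeps the loop itself independent of any particular computer, which suits a unit sold with leads for several machines.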
The sound of the device is basic: there are no facilities, either in hardware or software, to handle volume, filtering, pitch or rate. Words are best entered as strings of two-digit allophone codes. For example, "55084647" would be read, phonetically, as UX, NG, KK3 and EL, which sounds like 'uncle'. Emphasis and accent can be synthesized to a degree by repeating allophones or using 'unconventional' phonetic spellings. Try KK3-LL-AE-SS, KK3-LL-AR-SS and KK1-LL-AR-AR-SS-SS for the word 'class', for example. The documentation could deal with phonetic spellings at greater length but, as with all allophone units, experimentation and some thought (how's that spelt phonetically?) will teach you all you need to know to produce understandable speech with character.
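The two-digit convention described above is easy to unpack in software. The Python sketch below splits a digit string into allophone codes; the helper name `split_codes` is hypothetical, and the name table holds only the four codes the text itself identifies, not the full allophone set.

```python
from typing import List

# Only the four codes named in the text; the full table is in the
# Chatterbox documentation and is not reproduced here.
ARTICLE_NAMES = {55: "UX", 8: "NG", 46: "KK3", 47: "EL"}


def split_codes(digits: str) -> List[int]:
    """Split a string of two-digit allophone codes into integers."""
    if len(digits) % 2 != 0 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected an even number of decimal digits")
    codes = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]
    for code in codes:
        if code > 63:
            raise ValueError(f"allophone code out of range: {code}")
    return codes


if __name__ == "__main__":
    codes = split_codes("55084647")
    print(codes)
    print("-".join(ARTICLE_NAMES[c] for c in codes))
```

Keeping each word as a string of digits means a whole vocabulary can sit in DATA statements or string variables and be unpacked only when it is spoken.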
The Currah Microspeech is produced specifically for the Sinclair Spectrum and the documentation provides no technical information for interfacing to other computers. The case is neatly and compactly moulded out of black plastic and there is no apparent means of opening it. It slots into the Spectrum's expansion socket or interface unit. Its only outputs go to your TV aerial socket and, if required, via a 3.5mm plug to an amplifier or tape recorder. In normal use, the unit is plugged into the Spectrum's aerial socket and the MIC socket, which enables sound from the unit and from the Spectrum itself to be mixed and played through your television audio.
The Microspeech is built around a General Instrument SP0256-AL2 chip, with an allophone set almost identical to that of the other two devices reviewed. Its tone, however, has noticeably more of a whine to it than the others'. This may be a result of using TV output, or it may be due to the unit's filtering or its software. In any case, and like the other two systems, its voice sounds a bit like a retarded adult with a bad cold. (Incidentally, all the devices have a 'natural' male voice.)
Once plugged in and powered up, all you have to do is press the ENTER key on the Spectrum's keyboard and the word 'enter' will be heard from the television. All printable characters on the Spectrum's keyboard can be spoken and will repeat, being cut short if the repeat rate is too high (which can lead to some interesting effects). Programming the unit for speech involves entering strings into a specific variable, s$. For example, 10 LET s$ = "he(ll)(oo)" followed by RUN will say 'hello', as will 10 LET s$ = "hullo". Words and phrases, with the appropriate 'phoneticised' spelling, can be manipulated using any strings, as long as each time they are to be voiced they are put into s$ and the instruction is followed by a PAUSE command, which enables the unit to detect each new s$. For example,
10 LET a$ = "hullo"
20 LET s$ = "hullo": PAUSE 1
30 LET s$ = a$ + ",(th)(thKaer)": PAUSE 1
will produce 'hello hello-(pause)-there'. Stress and intonation can be produced by the use of capital letters and repetition within a string.
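To make the bracket notation in strings such as "he(ll)(oo)" more concrete, here is a small Python sketch that breaks a speech string into tokens. It rests on an assumption drawn only from the examples above: that a parenthesised group is treated as one sound and every other character stands on its own. It is a guess at the convention, not the Microspeech's actual parsing, and the function name `tokenize` is hypothetical.

```python
from typing import List


def tokenize(speech: str) -> List[str]:
    """Split a speech string into tokens, treating (..) groups as one unit."""
    tokens = []
    i = 0
    while i < len(speech):
        ch = speech[i]
        if ch == "(":
            end = speech.find(")", i)
            if end == -1:
                raise ValueError("unclosed parenthesis in speech string")
            tokens.append(speech[i + 1:end])   # contents of the bracketed group
            i = end + 1
        else:
            tokens.append(ch)                  # a single letter, comma or space
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize("he(ll)(oo)"))
    print(tokenize("hullo"))
```

Seen this way, the two spellings of 'hello' differ only in how the letters are grouped into sounds, which is where the experimentation comes in.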
The Microspeech software sets up a 256-byte speech buffer in the Spectrum's memory — enough to contain about 60 words. The buffer can be accessed directly by POKE statements and machine code subroutines. Its size can be changed and allophones can be entered numerically. Allophones are stored until called by the device — which enables you to enter phrases using, for example, an INKEY$ instruction so that they will be spoken in sequence after you've stopped pressing keys. The buffer must be constantly updated for proper speech. This does not happen during a BEEP statement. As soon as the BEEP begins execution, the allophone currently being output continues sounding until the BEEP terminates. By drawing out allophones in this way, speech can be combined with the Spectrum's primitive music to give a crude form of 'singing'.
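The behaviour of the speech buffer can be pictured as a first-in, first-out queue. The Python sketch below is a model only: it assumes one byte per allophone (which would make 256 bytes for about 60 words roughly four allophones a word), and the class and method names are hypothetical rather than anything defined by the Microspeech software.

```python
from collections import deque
from typing import Optional


class SpeechBuffer:
    """Fixed-capacity FIFO of allophone codes, assuming one byte per code."""

    def __init__(self, size: int = 256) -> None:
        self.size = size
        self._queue = deque()

    def put(self, code: int) -> bool:
        """Queue a code; return False if the buffer is already full."""
        if len(self._queue) >= self.size:
            return False
        self._queue.append(code)
        return True

    def next_allophone(self) -> Optional[int]:
        """Hand over the oldest queued code, or None if nothing is waiting."""
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


if __name__ == "__main__":
    buf = SpeechBuffer()
    for code in (55, 8, 46, 47):
        buf.put(code)
    print(len(buf), "codes waiting")
    print(256 // 60, "bytes per word, roughly")
```

Because codes are only removed when the device asks for them, anything typed ahead simply waits its turn, which matches the way phrases entered with INKEY$ are spoken after the keys are released.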
As one might expect from a device seven or eight times more expensive than the others under review, the Votrax Personal Speech System (PSS) has enough facilities to fill a book. Indeed, its documentation does fill two small books: an allophone dictionary (giving American phonetic spellings of English words, since the PSS is American) and a manual giving comprehensive technical and programming data.
The PSS is fully programmable and includes its own Z-80 processor, a standard three channel music and noise generator, an SC-01 sound chip, a variety of switch and software selectable outputs, a 3½ Kbyte software manageable input buffer, an amplifier with volume control and a built-in loudspeaker and extension socket. It is sturdily and attractively built from die-cast metal.
The device is basically a sophisticated text-to-speech processor which looks like a printer (RS232C or Centronics parallel) to your computer. Once the printer output is activated, any words or numbers and most symbols you input are read, translated into appropriate phonetic form and spoken. Exceptions to common pronunciation rules can be programmed by the user. Phonetic spellings of your own devising can be used (THORT for 'thought', for example) and the device allows machine-code programming of the on-board Z-80 controller and direct allophone entry by using, not decimal numbers, but their equivalent ASCII symbols.
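The idea of substituting your own phonetic spellings can also be tried on the computer's side before the text ever reaches the printer port. The Python sketch below is a host-side illustration only; it does not use the PSS's own exception-programming commands, and the function name `apply_exceptions` and the table format are hypothetical.

```python
import re
from typing import Dict


def apply_exceptions(text: str, exceptions: Dict[str, str]) -> str:
    """Replace whole words found in the exceptions table with phonetic spellings."""

    def swap(match: "re.Match") -> str:
        word = match.group(0)
        # Look the word up in lower case; leave it alone if it is not listed.
        return exceptions.get(word.lower(), word)

    return re.sub(r"[A-Za-z']+", swap, text)


if __name__ == "__main__":
    table = {"thought": "THORT"}   # the example spelling given in the text
    print(apply_exceptions("I thought so", table))
```

A table like this keeps the awkward spellings in one place, so the rest of a program can be written in ordinary English.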
Words and phrases can be entered direct from the keyboard. Whether they are entered that way or from a program statement, the resulting voice can be changed by the use of control strings governing filter settings, volume, rate, resonance, inflection, music and noise settings, jumps to machine code, and music and speech mixes. The music and speech mixing facility is one of the most interesting features of the PSS. The output from the music channels can be used to control the speech output in the manner of a vocoder. The result, which will never replace Frank Sinatra (or even Boy George), gives a passable impression of a robot singing. There is scope for many more interesting effects, some of which actually result in comprehensible speech, but the PSS is by no means a simple piece of equipment to master.
The British suppliers, Cyber Robotics of Cambridge, seem to have some problems with connecting cables for the RS232C socket, especially those meant for the BBC computer. As a result, I spent a long time getting nothing out of the PSS but the sound of a deranged robot reciting limericks under water. If you buy a PSS (and you may think twice at its current price), make sure the cable is properly wired for your computer. Oh, and turn off the power-up message after you're sure the machine is working.
William Stuart Systems, (Contact Details)
Currah Computers, (Contact Details)
Cyber Robotics, (Contact Details)
Feature by Gary Herman