Bits 'n' Pieces (Part 1)
An Introduction to Digital Audio
Digital audio explained from the ground up — bit by bit!
To those of us brought up with analogue tape recording technology, it's difficult to imagine that it could ever completely disappear — but it almost certainly will, and its place will be taken by ever cheaper and more accessible digital techniques. Yasmin Hashmi introduces this important subject.
In the not too distant past, the transistor was responsible for a revolution in communications. One of the offshoots of transistor technology, the personal computer, has had just as great an effect on modern media editing. Take publishing, for example. The word processor (a computer with a bent for text editing) has been responsible for completely redefining the market. Publications of high quality can be designed, edited and printed from the desktop in a fraction of the time it would take to complete the process conventionally. In addition, the non-destructive editing that computers provide gives the freedom to try out any number of designs and arrangements without incurring high costs in terms of time and materials. Who could imagine giving up their word processor and going back to the manual typewriter? The personal computer has had the same effect on audio, allowing the musician to record and edit audio and arrange it in a number of different ways.
The first musical task the personal computer (from now on called PC) tackled was sequencing. By adding the appropriate software, the PC could be turned into a compositional tool, allowing the musician to notate a score using non-destructive word-processing style operation. By adding a MIDI interface to the PC and connecting it to a MIDI-compatible sampler or synthesizer, the score could automatically trigger the selected sound in the sampler/synth, telling it when to play a note (or notes if polyphonic), at what pitch and level and for how long.
With a library of sounds and a sufficient number of samplers and/or synths, the musician could now create complete backing tracks at leisure. In addition, the PC could be synchronised with a tape machine which would be used for recording live performances. Furthermore, the number of tape tracks normally required could be greatly reduced. The overall cost savings, flexibility and freedom provided by the PC, MIDI, sampler, and synth combination led to a marked increase in the number of musicians creating arrangements at home — with a commensurate decrease in business for multitrack recording studios.
Because the computer provides editing features which are difficult, if not impossible, to achieve with tape, its capabilities have inevitably been applied to the live recording and editing process itself — in many cases replacing tape altogether, whilst in others complementing it. This series of articles will explain the basic principles behind tapeless recording and editing and how the technology is applied.
All tapeless recording and editing systems require audio to be digitised, and there are a number of advantages to using digital audio. However, in nature, all that we see and hear is analogue. That is, a sound or image consists of a signal which is continuous for the entire duration we hear or see it. An analogue signal is shown in Figure 1.
It is, however, possible to fool the eye or the ear into thinking that something is continuous when in fact it isn't. A prime example is film, where a moving image actually consists of discrete pictures (or snapshots) which have been taken of an analogue image. The number of pictures shot per second must be more than the eye can detect, so that when the snapshots are run past the eye, the brain does not have enough time to distinguish between one snapshot and the next and so thinks that the image is continuous (or uninterrupted). The same can apply to sound. If sufficient snapshots are taken of a sound, they can be run past the ear at a rate which makes the sound seem continuous.
Figure 2a shows an expanded view of part of an analogue signal (or waveform). Snapshots of sound are taken by sampling the analogue waveform at regular intervals, as shown in Figure 2b. The voltage level of the waveform when the sample is taken is converted into a number and stored on a suitable medium. The device which carries out the sampling process of converting voltage levels into numbers is called an analogue to digital converter (or A/D — pronounced A to D).
Once the audio has been digitised, the stored numbers (or samples) can be manipulated by computer, edited (if necessary) and replayed. However, before they are replayed, the numbers must first be converted back into voltages in order to produce a waveform which we can hear. The device which performs this conversion is called a digital to analogue converter (D/A). Figure 2c shows the waveform created by converting the samples taken from the waveform in Figure 2b back into voltages. Depending on how many samples are taken, a good or bad approximation of the original waveform will result. Figure 2c shows that taking a sufficient number of samples results in a reasonable approximation of the original waveform. The waveform is further improved by smoothing it out with a filter after the D/A. Figure 2d shows the result of taking too few samples — the reconstructed waveform barely resembles the original.
Thus the more samples taken (i.e. the closer they are together), the better the approximation of the original waveform will be. For compact-disc quality audio, for example, 44,100 samples are taken for each second of audio, i.e. the sampling rate is 44.1kHz (kilohertz).
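As a rough illustration of the sampling process described above, the following Python sketch takes snapshots of a sine wave, stores each level as a 16-bit number, and then compares a crude reconstruction with the original. The function names, the 1kHz test tone and the 'hold the last sample' reconstruction (with no smoothing filter) are assumptions made for the example, not a description of any particular converter.

```python
import math

def sample_waveform(freq_hz, sample_rate_hz, duration_s, bits=16):
    """A/D step: take regular snapshots of a sine wave and store them as integers."""
    max_level = 2 ** (bits - 1) - 1                 # 32767 for 16-bit samples
    n_samples = int(round(sample_rate_hz * duration_s))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # time of this snapshot, in seconds
        voltage = math.sin(2 * math.pi * freq_hz * t)   # stands in for the analogue level (-1 to +1)
        samples.append(round(voltage * max_level))
    return samples

def reconstruct(samples, t, sample_rate_hz, bits=16):
    """Crude D/A step: hold the level of the most recent sample (no smoothing filter)."""
    max_level = 2 ** (bits - 1) - 1
    index = min(int(t * sample_rate_hz), len(samples) - 1)
    return samples[index] / max_level

def worst_error(freq_hz, sample_rate_hz, duration_s=0.01, check_points=2000):
    """Largest gap between the original waveform and its reconstruction."""
    samples = sample_waveform(freq_hz, sample_rate_hz, duration_s)
    worst = 0.0
    for k in range(check_points):
        t = k * duration_s / check_points
        original = math.sin(2 * math.pi * freq_hz * t)
        worst = max(worst, abs(original - reconstruct(samples, t, sample_rate_hz)))
    return worst

one_second = sample_waveform(1000, 44_100, 1.0)
print(len(one_second))               # 44100 samples for one second of audio
print(worst_error(1000, 44_100))     # plenty of samples: small error
print(worst_error(1000, 3_000))      # too few samples: very large error
```

The second figure printed is far larger than the first, which is the numerical counterpart of the difference between Figures 2c and 2d.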
The samples are not stored as ordinary decimal numbers, but rather as binary numbers, that is, numbers which are represented by 1s and 0s (for example the decimal number 141 would be represented by the binary number 10001101). The beauty of 1s and 0s is that they are very easy to represent and recognise. Taking a light bulb as an example, a '1' can be represented by 'on' and a '0' by 'off'. Time and effort do not have to be wasted in determining how bright the light is; we're only interested in whether it is on or off. If we had a row of eight lightbulbs, we could easily and unmistakably display the number 141 to someone else who can read binary by switching the appropriate lightbulbs on according to 10001101. In fact, with eight lightbulbs (or eight bits), we could represent a total of 256 different numbers.
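The lightbulb example can be checked with a few lines of Python. This is only a sketch; the helper names to_bits and lit_bulbs are made up for the illustration.

```python
def to_bits(number, width=8):
    """Write a whole number as a string of 1s and 0s, padded to the given width."""
    return format(number, f"0{width}b")

def lit_bulbs(number, width=8):
    """One entry per lightbulb, left to right: True means 'on', False means 'off'."""
    return [bit == "1" for bit in to_bits(number, width)]

print(to_bits(141))         # 10001101
print(int("10001101", 2))   # 141
print(lit_bulbs(141))       # [True, False, False, False, True, True, False, True]
print(2 ** 8)               # 256 different numbers from eight bits
```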
In digital electronics, a '1' is represented by a voltage (or simply anything above a certain voltage) and a '0' represented by no voltage (or anything below a certain voltage) as shown in Figure 3.
Much of the circuitry inside a computer consists of chips (integrated circuits or semi-conductors), an example of which is given in Figure 4, and 1s and 0s are extremely easy for such devices to recognise and deal with. Inside the chip are arrays of microscopic transistors which are arranged in a particular way so as to give a chip a particular function. The transistors can be likened to lightbulbs, in that they can be switched on or off, but rather than being switched on or off by hand, they are switched by applying 1s or 0s to the metal pins (or legs) along the chip's sides.
Some chips are designed simply for storing binary numbers; others are designed for performing calculations on the numbers — accepting two different sets of numbers and comparing them, multiplying the numbers by other numbers, adding numbers together, and so on. An advantage of binary numbers is that computers use them. A computer makes no distinction between binary numbers which represent audio, video, text or anything else — the number 10001101 could, for example, uniquely represent the letter 'Q', or even a sample of air pressure. The only way in which this information is distinguishable is in how it is presented to the outside world. Digitised audio therefore readily lends itself to computer control and this opens up a new dimension in editing and processing — in the same way as the word processor has done for text.
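To illustrate that the meaning of a binary number depends entirely on how it is presented, the sketch below reads the same eight bits in three ways. The lookup table that maps the pattern to the letter 'Q' is purely hypothetical, echoing the example above; it does not correspond to any real character code.

```python
pattern = bytes([0b10001101])        # the same eight bits throughout

# Read as an unsigned number (0 to 255).
as_unsigned = int.from_bytes(pattern, "big")

# Read as a signed number (-128 to 127), as a sample value might be.
as_signed = int.from_bytes(pattern, "big", signed=True)

# Read through a made-up text code in which this pattern stands for 'Q'.
hypothetical_text_code = {0b10001101: "Q"}
as_letter = hypothetical_text_code[pattern[0]]

print(as_unsigned)   # 141
print(as_signed)     # -115
print(as_letter)     # Q
```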
Another advantage of digital audio is that it cannot easily be degraded. When an analogue signal is recorded and mixed, it can be affected by other (usually smaller) signals generated in and around the circuitry through which the audio must pass. In other words, analogue recordings are susceptible to unwanted noise, and copying or bouncing tracks down, for example, adds successively more noise (or hiss). In addition to hiss, analogue recording to tape can suffer from dropout, wear and track bleeding. Domestic media, such as cassette or vinyl record, introduce further limitations. The master may be relatively noise free, but the disc-cutting process is regarded as fairly crude and can introduce noise and bandwidth restrictions. Cassette tape also suffers from bandwidth restrictions, as well as tape dropout. In addition, both media are susceptible to wear, which can also cause unwanted noise.
With an analogue signal, noise actually affects the waveform that you hear, as shown in Figure 6. With digital, however, noise can be likened to a finger tapping on a lightswitch with insufficient force to throw it from on to off: noise is unlikely to turn a 1 into a 0 (or vice versa). Successive circuitry does not care how clean the 1s and 0s are, only whether they are above a certain voltage or not, so that they will switch transistors on or off. Provided no further noise is present at that stage, the 1s and 0s generated by the successive circuitry will be clean, as shown in Figure 7. This means that digital audio can be copied any number of times without degradation; the copying circuitry will faithfully duplicate the 1s and 0s (and may even clean them up!).
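The lightswitch analogy can be tried out numerically. In the sketch below, the 5 volt level for a '1', the 2.5 volt threshold and the 1 volt noise ceiling are all assumed figures chosen for illustration; real circuitry uses its own levels. Each 'copy' adds noise to the voltages and then decides afresh whether each one is above or below the threshold.

```python
import random

HIGH_VOLTS = 5.0    # assumed voltage representing a 1
THRESHOLD = 2.5     # assumed decision level: above is a 1, below is a 0

def transmit(bits, noise_volts, rng):
    """Turn bits into voltages and add random noise of up to +/- noise_volts."""
    return [(HIGH_VOLTS if b else 0.0) + rng.uniform(-noise_volts, noise_volts)
            for b in bits]

def regenerate(voltages):
    """The next stage of circuitry only asks: above the threshold or not?"""
    return [1 if v > THRESHOLD else 0 for v in voltages]

rng = random.Random(0)
original = [1, 0, 0, 0, 1, 1, 0, 1]

copy = original
for generation in range(1000):       # a copy of a copy of a copy...
    copy = regenerate(transmit(copy, noise_volts=1.0, rng=rng))

print(copy == original)              # True: 1 volt of noise never crosses the threshold
```

If the noise were allowed to exceed the 2.5 volt margin, bits could be flipped, which is why the text says noise is unlikely (rather than unable) to turn a 1 into a 0.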
In any case, the 1s and 0s are not the actual waveform that you hear, but will ultimately be converted back into an analogue waveform at the last stage (the output of the system). Therefore, in the case of compact disc, for example, depending on the quality of your amplifier and monitors, you will have the opportunity to hear the audio with the same quality as when it was mixed in the recording studio.
Digital audio can be stored on tape, hard disk, floppy disk, optical disk, memory chips or compact disc. However, in order to take advantage of the recording and editing capabilities a computer can offer, the storage medium must satisfy certain criteria. As we will see in Part 2, the key to computer-based editing is random access. This is the ability of the computer to access any part of the digitised recording almost instantly. It is therefore essential that the recording medium allows random (or instant) access. It is also essential that the recording medium has sufficient capacity to store a practical amount of information — in the case of audio, for compiling lengthy arrangements such as entire albums or the soundtrack for a film, this can amount to hours rather than seconds or minutes. The medium must also be affordable and, in this green and cost-conscious world, should be reusable (erasable).
Tape has sufficient storage capacity in terms of both tracks and time. It is also erasable and affordable. However, it is unsuitable for our purposes because it does not allow random access: spooling back and forth takes too much time. It is, therefore, best suited to mastering and/or recording which does not need much editing.
Compact disc is currently the only medium available on a large scale for domestic digital audio reproduction, but is not a suitable medium for recording/editing purposes since it is not erasable and its access, although much faster than tape, is not fast enough.
RAM (Random Access Memory) chips provide static storage (there is no physical movement involved), and so have the fastest possible access time. Although professional audio generally converts samples into 16-bit binary numbers, commercially available chips generally support 8-bit numbers (or bytes). A 16-bit sample will therefore be stored as two bytes. Figure 5 is a simplified representation of how RAM works. Inside the chip, bytes are stored in horizontal rows. The storage capacity of the chip depends on the number of rows provided; if there are 1024 rows, for example, the chip will have a storage capacity of 1Kbyte. For compact disc quality, this amounts to just over a hundredth of a second (since for one second of a single channel, CD requires 44,100 16-bit samples = 44,100 x 2 bytes = 88,200 bytes. Therefore 1Kbyte provides 1024/88,200 seconds = 0.012 seconds).
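The storage arithmetic can be reproduced in a few lines, along with the splitting of a 16-bit sample into two bytes. This is a sketch only: the function names are invented for the example, the high-byte-first order is an assumption, and the figures are for a single channel of audio.

```python
SAMPLE_RATE_HZ = 44_100     # compact disc sampling rate
BYTES_PER_SAMPLE = 2        # one 16-bit sample occupies two 8-bit bytes

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
seconds_in_1k = 1024 / bytes_per_second

def split_sample(sample):
    """Split an unsigned 16-bit value into (high byte, low byte); order is assumed."""
    return (sample >> 8) & 0xFF, sample & 0xFF

def join_sample(high, low):
    """Rebuild the 16-bit value from its two bytes."""
    return (high << 8) | low

def bytes_needed(duration_s, channels=1):
    """Storage required for a recording of the given length."""
    return int(duration_s * bytes_per_second * channels)

print(bytes_per_second)              # 88200
print(round(seconds_in_1k, 3))       # 0.012
print(split_sample(0x8D2A))          # (141, 42)
print(bytes_needed(3600))            # 317520000 bytes for one hour, one channel
```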
The pins on the left side of the chip consist of address lines, and those on the right serve as both inputs and outputs. Each row in the chip has a unique address (in the form of a binary number), and a row can be selected by setting up its address on the address lines on the left. There is also a pin for read/write (not shown) and if a 1 is applied to this pin, the pins on the right become outputs and the contents of the selected row will be output to the pins. If a 0 is applied to the read/write pin, the pins on the right become inputs and whatever is applied to them will be transferred to the selected row. Since there are no physically moving parts, the time it takes to change addresses, input or output information, change from read to write, and so on, is so short as to be virtually instantaneous.
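The behaviour of the address lines and the read/write pin can be modelled very simply. The RamChip class below is a hypothetical teaching model of the description above, not a simulation of any real device; its names and its 1024-row default are assumptions.

```python
class RamChip:
    """Toy model of a RAM chip: one byte per row, each row with its own address."""

    def __init__(self, rows=1024):
        self.rows = [0] * rows               # 1024 rows of one byte = 1Kbyte

    def access(self, address, read_write, data=None):
        """read_write=1: output the selected row. read_write=0: store data in it."""
        if not 0 <= address < len(self.rows):
            raise ValueError("no row with that address")
        if read_write == 1:
            return self.rows[address]        # data pins act as outputs
        if data is None or not 0 <= data <= 0xFF:
            raise ValueError("a row holds exactly one byte (0 to 255)")
        self.rows[address] = data            # data pins act as inputs
        return None

chip = RamChip()
chip.access(address=5, read_write=0, data=0b10001101)   # write to row 5
stored = chip.access(address=5, read_write=1)           # read row 5 back
untouched = chip.access(address=6, read_write=1)        # a row never written to
print(stored, untouched)                                # 141 0
```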
RAM therefore satisfies the need for instant access, but the amount of storage space provided by a chip is very small. This can be increased by using multiple chips, but RAM is relatively expensive and takes up space. This means that RAM is more suited for storing seconds/minutes of material rather than long recordings. In addition, RAM is volatile, which means that if the power source is removed, the contents of the RAM are lost (although cartridges are available which have long-lasting battery backup). Nonetheless, because of its instant access, RAM is highly suited to being a temporary work area, where audio can be loaded from another medium for processing of some kind and then either output or loaded back into the original medium. Don't confuse RAM with ROM (Read Only Memory); ROM is also a chip, but the information inside has been permanently 'blown' and can only be read. Once blown, a ROM chip cannot be erased or further recorded to.
The floppy disk is designed to be a cheap and convenient storage medium. It stores information magnetically and is inserted into a drive which can be likened to a vinyl record player that can record as well as play. A record player allows quick access to any part of a recording by lifting the head and placing it elsewhere, although this can be somewhat hit and miss. A fresh floppy disk must first be formatted (also using binary codes) by the computer, so that the record/play head can precisely find its way around the disk. The disk is divided into tracks (not one continuous track as with vinyl) and information is stored as blocks within a track. Preceding each block is an address which uniquely identifies that particular location on the disk. To find a section of audio, the head will move across the tracks with the disk rotating underneath. When it sees the address associated with that section of audio, it will read the block of information which follows.
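The way a formatted disk is searched can be sketched as follows. The layout used here (a handful of tracks, each holding a few addressed blocks) and the names format_disk and read_block are assumptions made for illustration; real disk formats are considerably more involved.

```python
def format_disk(tracks=4, blocks_per_track=3):
    """Formatting: write an address in front of every (still empty) block."""
    return [[{"address": (t, b), "data": b""} for b in range(blocks_per_track)]
            for t in range(tracks)]

def read_block(disk, address, head_track=0, rotation=0):
    """Move the head to the right track, then wait for the address to come round.

    Returns the data plus the number of head steps and blocks waited for.
    """
    track_no, _ = address
    steps = abs(track_no - head_track)       # head movement across the tracks
    track = disk[track_no]
    for waited in range(len(track)):         # the disk rotates under the head
        block = track[(rotation + waited) % len(track)]
        if block["address"] == address:      # the head sees the address it wants
            return block["data"], steps, waited
    raise LookupError("address not found on this track")

disk = format_disk()
disk[2][1]["data"] = b"snare"                # pretend a short sound was recorded here
result = read_block(disk, (2, 1))
print(result)                                # (b'snare', 2, 1)
```

The two counts returned give a feel for where the time goes: stepping the head across tracks, and waiting for the disk to rotate the wanted block under it.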
The relatively cheap materials, and the way in which the drive operates, mean that the density of information stored on disk is not very high, providing seconds or minutes of storage rather than minutes or hours. The head touches the disk, which means that the track width cannot be very narrow, and if damage from overheating or dirt is to be avoided, the disk cannot rotate very fast. This means that there is no point in using sophisticated mechanics to quickly move the head across the tracks if it must wait a relatively long time for the correct address to pass underneath it. Thus a simple and cheap stepper motor will do — adding to the floppy's affordability. However, because of its slow access time and low density, the floppy is not suited for real time (live) recording or playback of full 16-bit, 44.1kHz audio. It is therefore more commonly used as a non real-time backup medium for short sounds, as in RAM-based samplers, for example.
Hard disk is more expensive than floppy disk but stores much more information and is more cost-effective than RAM. It uses different materials to floppy, but the general principles are similar. However, the head rests just above the surface of the disk, so there is no physical contact, which means that neither the disk nor the head is subject to wear. More importantly, this allows the hard disk to rotate much faster than floppy, and using superior mechanics for head control means that the time taken for the head to move from one position to another is extremely short. Its operation also means that the track width can be much narrower than floppy, allowing more tracks in total and increasing the density of information.
Because of the very high rotational speed of the disk, any debris or dirt particles caught between the head and the disk surface could cause severe damage. In order to avoid this, hard disks are sealed in the drive and are not removable. The entire drive itself can be removed, but this is a rather expensive and unsatisfactory solution, and hard disk is generally not considered a removable medium. This has proved to be one of its major drawbacks, since once the disk is full, the recorded material (if it is to be kept and further recordings are to be made to disk) must be transferred to another storage device. This takes time, which can become significant if a great deal of material has been recorded.
This type of disk may be described as a cross between floppy and hard disk. It is removable, and has a much higher capacity than floppy, but a much lower capacity than hard disk.
Optical disks have always been removable, but are now also erasable and have large recording capacities. The problem with optical has been its slow access time compared with hard disk, although the technology is ever-improving and optical is now being used as the primary recording medium for a number of tapeless editing systems. Even so, if the choice is between hard disk and optical, hard disk is still preferred by the majority of manufacturers, since it is still faster, with optical being recommended as an archiving medium.
Next month, in part two of this introduction to digital audio, we'll look at the advantages of instant access and the principles behind non-destructive editing.
Feature by Yasmin Hashmi