We've been sold and resold the idea that digital is better than analogue, but what is digital sound and why is it revolutionising music technology? Peter Bergren counts the bits.
You can't buy a piece of musical equipment these days without it claiming to be digital in some respect, but what does "digital" actually mean when it comes to sound?
IN A WAY, you can never really "hear" anything, because hearing is a personal perception rather than a direct experience. If the classic tree in the forest falls, there's a physical disturbance causing the air molecules to move, thus creating a sound wave. When we're in the forest and hear the tree, we're still several steps removed from having direct contact with the motions, actions and reactions of its fall. What we're really experiencing is sound waves against our eardrums.
In each stage of the hearing process, energy in a particular medium is translated into another form of energy in another medium. What is common to all these stages is the replication of energy changes (amplitude, polarity, time factors and so on). Unfortunately for the energy, the media themselves are in a constant state of flux (even if viewed over very small periods of time), and therefore have continuously changing characteristics. It follows that each energy transformation causes inaccuracies in translation, so that some of the original information will be lost, or spurious information added.
So, to take a stock definition of analogue: "something similar to something else". To hear something from analogue tape is actually to hear an analogy of what really happened. To be fair, it's a close replication, offering a similarity to the real (I use the word loosely) thing, but there are going to be differences.
IF SOUND IS an energy event that varies continuously with the passage of time, what's the most logical way to transmit it? Well, by some means that can be made to vary in a way that's similar to the soundwave. Alexander Graham Bell did this by varying the spacing (and thereby resistance) of carbon granules, thus modulating a current in sympathy with air pressure changes. Edison made a permanent record of these changes by using sound pressure to cut a groove in a wax cylinder that was similar in contour to the fluctuation in air pressure occurring as he said "Mary had a little lamb".
But there are a host of problems involved in this kind of approach, many of which result in noise and distortion. Unfortunately, most noise sources (such as dust in a record groove or rumble from a turntable motor) have either a physical contour that varies continuously (such as the scratch in a record) or an electrical characteristic that does the same (such as white noise or radio interference). Systems sensitive to reception of the variations in sound waves are also receptive to the characteristics of noise sources. That's why you have to take care in handling your LPs, and also why companies make so much money on devices that can lower the noise level and increase the signal level in transmission, processing and recording systems.
But there's a limit to how high the signal level can be made. There's no way you can cut a groove in a record that's comparable to the amount of energy emanating from the falling tree. The physical limits of the media prohibit such high energy levels, just as low level noise sources compete with signals at the other end of the loudness scale. And when you consider that these problems accumulate with each copy you make, analogue methods don't seem such a good bet after all.
INSTEAD OF RECORDING the continuous variations of a sound wave as changes in another medium, suppose we make a series of measurements of them at precise points in time? Then we can retrieve this information without analogue noise and distortion being recognised. This is precisely what digital systems do. They sample the analogue waveform many times a second and quantise each measurement made during a sample period, so that a stream of discrete, unambiguous numbers is the result - numbers which represent changes in the sound wave's amplitude and polarity over time.
Because each sample is a particular number, these numbers can be encoded in binary form - as 1s and 0s, or on and off pulses clocked at a consistently accurate rate. This scheme is ideally suited to digital transmission, storage, and manipulation. Analogue noise ends up being too low in level to register as 1s and 0s on its own (unless it's present at the input before the signal is digitised), and representing very loud sounds is simply a matter of generating larger numbers.
As the sound is now a stream of numbers, altering a number modifies the sound. Increasing a number - making the sound louder - is a matter of multiplication; decreasing one - for a lower sound level - is a matter of division. And as computers are great at maths, performing operations on a signal in the digital domain introduces less distortion and noise than dealing with it in the analogue domain. Mathematical operations can be performed to produce the equivalent of filtering, EQ, mixing, reverb, pitch shifting, delay and the like - almost any analogue process can be analysed and an algorithm (a program) devised which can then control the electronic hardware. Neato, eh?
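None of this needs exotic hardware to demonstrate. Here's a minimal sketch in Python (my choice of language, nothing to do with period equipment) showing that digital gain really is just multiplication - the sample values are illustrative, not taken from any real recording:

```python
# Each sample is just a number, so changing the level is one multiply per sample.
samples = [0, 120, 230, 120, 0, -120, -230, -120]  # one illustrative wavecycle

def apply_gain(samples, factor):
    """Scale every sample by a constant factor - a digital 'volume control'."""
    return [round(s * factor) for s in samples]

louder = apply_gain(samples, 2)     # every number doubled: the sound gets louder
quieter = apply_gain(samples, 0.5)  # every number halved: the sound gets quieter
print(louder[:4])   # [0, 240, 460, 240]
print(quieter[:4])  # [0, 60, 115, 60]
```

The same idea - arithmetic on the numbers - underlies digital mixing (addition of streams) and the fancier processes mentioned above.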
IMAGINE A PIECE of graph paper divided into 100 segments vertically, and 100 horizontally, with a horizontal line drawn across halfway down. The vertical scale represents 100 voltage divisions: 50 positive and 50 negative, with the halfway line being the zero voltage point. The complete scale represents one volt. The horizontal scale represents 1/100th of a second, divided into 100 parts.
Now imagine one cycle of a 100Hz sine wave drawn on the graph paper, with its zero crossing exactly midway on the horizontal midline of the graph. (See Figure 1) Let's describe this waveform in numerical terms: base 10, decimal notation.
To do this you have to relate time to voltage, and come up with a "data stream". It's fairly easy, if rather tedious, to do this. Simply correlate each time value on the horizontal axis with its corresponding voltage value on the vertical axis, being careful to note the polarity of the voltage read-off. Read upwards from a time value to its intersection with the waveform, then across to the voltage level that comes closest to also intersecting at the same point on the waveform. Note that time and voltage intersections don't always coincide exactly on the waveform line; the approximation this forces on you is called quantisation error.
When you're done you should have two columns of 100 numbers each, both starting with zero. The decimal representations of voltages can be relatively easily changed to binary notation, consisting of strings of 1s and 0s. Once in that form, they can be easily stored as pulses/no pulses on tape or disk, or conveyed by a wire, fibre optic cable or whatever.
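If you'd rather let a computer do the tedious graph-reading, the whole exercise can be sketched in a few lines of Python (the voltages are computed rather than read off paper, so they're idealised versions of what you'd tabulate by hand):

```python
import math

SAMPLES = 100   # 100 time divisions across 1/100th of a second
stream = []
for n in range(SAMPLES):
    v = 0.5 * math.sin(2 * math.pi * n / SAMPLES)  # the 'real' analogue voltage
    q = round(v * 100)            # snap to the nearest of the 100 increments
    stream.append(q)              # whatever is lost here is quantisation error

print(stream[:5])                 # [0, 3, 6, 9, 12] - starting from zero
print(format(stream[25], '08b'))  # the peak value, 50, in binary: 00110010
```

Each measurement ends up as a small whole number, and each whole number converts mechanically into a string of 1s and 0s - exactly the form that suits pulses on tape or disk.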
Modern digital systems will sample at 44.1kHz or 48kHz, and in a 16-bit linear format offer 65,536 measuring, or quantisation, increments. This large number is derived from allowing 16 bits to form a word describing a particular voltage measurement. Since each bit can have two states (on or off), and there are 16 of them, you have 2 to the 16th power, or 65,536. An 8-bit word by comparison gives 2 to the 8th power, or 256 measuring increments. This disparity accounts for the considerable discrepancy in dynamic range between 16- and 8-bit systems.
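The arithmetic is easy to verify in Python (the roughly-6dB-per-bit dynamic range figure at the end is a standard engineering rule of thumb, not a claim made in this article):

```python
# Quantisation increments for a given word length: 2 to the power of the bit count.
def increments(bits):
    return 2 ** bits

print(increments(16))  # 65536 - the figure quoted for 16-bit linear systems
print(increments(8))   # 256 - far coarser measurement

# Rule of thumb (approximate): each bit buys about 6dB of dynamic range.
print(16 * 6)  # roughly 96dB for a 16-bit system
print(8 * 6)   # roughly 48dB for an 8-bit system
```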
Rapidly varying (high) frequencies and transients require more sampling intervals than low frequencies, and defining very loud against very soft sounds demands a wider range of numbers. Also, the more measuring increments you have available, the less likely it is that a small change in the analogue waveform will slip between two measurements - assuming the sampling periods also coincide so as to catch the change. When such a small change does go uncaptured, the shift in the analogue waveform is not represented by any change in the voltage levels the digital system records, and distortion results - distortion which will be reproduced unless certain measures are taken. More of these later.
It's important to realise that any analogue-to-digital (A/D) conversion is always subject to quantisation error, because you're trying to represent a continuously varying process with finite numbers. So how important is this error, especially at high frequencies where, for example, you would have roughly two samples (44.1kHz rate) taken within one cycle of a 20kHz waveform?
TAKING THE 100Hz wave in figure 1 as an example, let's imagine what happens when it enters the sample and hold circuit of an A/D converter. This circuit is analogue, in that it creates a hybrid analogue waveform that consists of discrete voltage steps that rise and fall with the contour of the sampled waveform - like stairs going up and down a hill. The ratio of stair "riser" to "tread" measurements varies with how long a given voltage value remains constant for a given number of sampling periods. The stairs could be considered to be fairly symmetrical for a pure sine wave.
The sample/hold circuit forms this staircase by capturing the voltage of the waveform present at the beginning of each sample period and then holding this value until the next sample period. This hold function creates the "tread" in the stair step. By the next sample period, the analogue waveform has itself changed value, continuously, between sample intervals, so a different voltage is captured. This jump from the level held during the last sample period creates the "riser" in the stair step. (See Figure 2)
The important point here is that the stair-step voltages more-or-less follow the contours of the waveform. More-or-less, because quantisation errors are always present wherever the actual value of the waveform doesn't coincide exactly with one of the measuring increments at the sample instant.
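The hold behaviour is simple enough to simulate. This Python sketch (the rate and tread resolution are illustrative choices of mine, not from any real converter) builds the staircase by capturing a value at each sample instant and repeating it to form the flat tread:

```python
import math

RATE = 4800   # illustrative rate: 48 samples per cycle of the 100Hz wave
HOLD = 10     # trace each held value 10 times to draw the flat 'tread'

def staircase(freq, rate, hold=HOLD):
    """Sample one cycle of a sine wave, holding each captured voltage
    until the next sample instant - the riser-and-tread shape in the text."""
    out = []
    n_samples = rate // freq                 # samples in one wavecycle
    for n in range(n_samples):
        captured = math.sin(2 * math.pi * freq * n / rate)  # sample instant
        out.extend([captured] * hold)        # hold: the flat tread
    return out

steps = staircase(100, RATE)
# Within one tread the value never changes; it only jumps at sample instants:
assert steps[0] == steps[HOLD - 1]
assert steps[HOLD] != steps[HOLD - 1]
```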
In the case of a 20kHz sine wave, as long as it rises and falls within at least two sample periods and crosses at least two quantisation intervals, a square wave representation will be formed by the sample/hold circuit. By definition, this square wave will contain odd-order harmonics higher in frequency than the sine wave it represents; but when it runs into the filter, the square wave will lose all those odd-order harmonics (180kHz, the ninth; 140kHz, the seventh; and so on) and become the 20kHz sine wave it was meant to be.
If this sine wave were more complex and contained a harmonic ripple, that harmonic would be far too high to hear - and would be filtered out anyway. It should be noted here that this filtering takes place at the system output, after the waveform time/voltage measurements stored in binary form have been reconstructed into a stair-step waveform. This filter is often called an "anti-imaging" filter, for it removes high-frequency images of the input waveform's spectrum.
The more quantisation increments you have available, and the more samples taken per second, the more likely it is that the analogue waveform will fall on the intersection of a sample period and a quantisation (measuring) increment at the start of each sample period. Nevertheless, the stair-step waveform can be decoded into an accurate recreation of the original waveform by filtering.
The stair treads really form a collection of square (or in some cases rectangular) waves. As you may or may not know, a square wave is what you get when you combine a fundamental sine wave with all its odd harmonics. In our 100Hz waveform with an amplitude of one volt peak-to-peak, there would be 480 such square wave "treads" following the contour of each wavecycle, and 48,000 of them per second - assuming we're now sampling at the 48kHz professional rate rather than the rate implied in our example with the graph paper. If we pass this stair-step wave through a filter with a stop band that cuts in at, say, 22kHz, all the stair-step components will be removed. What will be left is the 100Hz sine wave itself.

Manufacturers have decided that a 44.1kHz sample rate and a 16-bit quantisation word are sufficient (for now - progress marches ever onward) to accurately encode and decode signals in the normal audio range. Large-amplitude, complex signals relate randomly to any quantisation errors that are generated, and the errors, when reproduced at a digital system's analogue output, are low in level, resembling white noise. When the signal level being quantised/sampled drops, however, the errors become comparable in size to these smaller excursions of the signal, for there are times when the signal's change in amplitude does not cross more than one quantisation increment.
For reasons too complex to describe here, this produces a nasty form of distortion known as "granulation noise". One way around it is to introduce a small amount of white noise, called "dither", to the incoming signal. This is superimposed on the waveform, and effectively causes low-level changes to cross two quantisation increments instead of one. When the resulting data is recreated and filtered, the result is the original waveform sans distortion, but with a negligible amount of white noise added. The situation dither cures often occurs when the minute harmonic ripples riding on the larger excursions of a complex wave don't quite cross from one quantisation increment to another between sample periods.
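Dither sounds paradoxical - adding noise to reduce distortion - but it's easy to demonstrate. In this Python sketch (the signal level and number of averaging passes are arbitrary choices of mine), a sine wave smaller than one quantisation increment vanishes entirely without dither, yet its shape survives, on average, once dither is added:

```python
import math
import random

random.seed(1)
STEP = 1.0  # one quantisation increment

def quantise(v):
    """Round a voltage to the nearest quantisation increment."""
    return round(v / STEP) * STEP

# A signal whose whole swing is smaller than one increment:
tiny = [0.3 * math.sin(2 * math.pi * n / 64) for n in range(64)]

undithered = [quantise(v) for v in tiny]
# Every sample rounds to zero - the waveform is simply lost:
assert all(v == 0.0 for v in undithered)

# Add low-level noise (dither) before quantising, and average many passes.
# The rounding decisions now flip in proportion to the signal, so its
# shape is recovered on average (with a little white noise added):
passes = 2000
avg = [0.0] * 64
for _ in range(passes):
    for n, v in enumerate(tiny):
        avg[n] += quantise(v + random.uniform(-0.5, 0.5)) / passes

print(avg[16])  # close to +0.3, the peak of the buried sine wave
```

In real converters the averaging is effectively done by the ear and the output filter rather than by repeated passes, but the principle is the same.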
A FELLOW NAMED Nyquist came up with a very interesting theory, which he proved experimentally, and which has since come to be the basis of the design of all digital systems. Briefly, he said that the highest frequency a digital sampling system can capture is half the sampling frequency. So if you want to record a 20kHz signal, your sample rate must be at least 40kHz. If you try to record any frequency higher than 20kHz, the analogue-to-digital converter (ADC) will not be able to take two samples per cycle, meaning it won't be able to accurately represent the bipolar nature of the waveform and its frequency.
Remember that two samples must be taken and two quantisation increments crossed to create a square wave (via sample/hold) of the same frequency as the wave sampled. Beyond this Nyquist Limit, the sampler will still take samples of the rapidly varying amplitude of, say, a 30kHz waveform, but catch them at points not really related to each cycle of that waveform. The result will be descending frequencies ("aliasing" frequencies, since the original is being represented as something else) which take the place of the original harmonics. (See Figures 3A and 3B)
For example, if S equals the sample rate of 48kHz, and F equals the 30kHz "defiant" harmonic you wish to capture, then S-F equals the spurious harmonic added to the legitimate frequencies below the Nyquist Limit. In this case, you would suddenly find yourself with an 18kHz signal that was never present at the input. This and other frequencies formed by similar interactions cause aliasing distortion in the audible signal. To avoid this, a filter with a flat bandpass and very sharp stop band is inserted ahead of all other elements in the digitising system. In some ways it's similar to the anti-imaging filter at the tail end of the chain, and it's usually called an anti-aliasing filter.
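The S-F arithmetic can be checked directly. This Python sketch (frequencies from the example above; the sample count is my own choice) shows that a 30kHz wave sampled at 48kHz yields exactly the same sample values as an 18kHz wave (here phase-inverted) - the converter literally cannot tell them apart:

```python
import math

S = 48000       # sample rate
F = 30000       # a frequency above the Nyquist Limit (S/2 = 24kHz)
ALIAS = S - F   # 18000: where the energy actually lands

thirty = [math.sin(2 * math.pi * F * n / S) for n in range(48)]
eighteen = [math.sin(2 * math.pi * ALIAS * n / S) for n in range(48)]

# Sample for sample, the 30kHz wave matches an inverted 18kHz wave exactly:
assert all(abs(a + b) < 1e-9 for a, b in zip(thirty, eighteen))
print(ALIAS)  # 18000
```

Once the samples are taken, no amount of later processing can undo this - which is why the anti-aliasing filter has to sit ahead of everything else.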
SO MUCH FOR a few of the, uh, "fundamentals". Many of the terms here may be familiar to you from sampler or CD player specs. Perhaps they'll mean a bit more next time you meet them. If you're thirsty for more information, try Principles of Digital Audio by Ken C Pohlmann (published by Howard W Sams & Co) - the plot's a bit weak but the attention to detail is stunning.
It's a whole new digital world out there.
Feature by Peter Bergren