Every Little Bit
Sound quality has long been a grey area in the world of sampling. Chris Meyer finds out it takes more than "bits" to make a sampler sound good.
How many bits make a sampler sound good? Unfortunately it's not quite that simple, as sampling quality is dependent on a host of other factors - not the least of which is yer ears.
IT WOULD BE an understatement to say that the past couple of years have seen musical instruments undergo many changes. Only the far-sighted, the optimistic and the naive would readily recognise today's musical marketplace.
Most advances have been the result of cheap digital technology - sampling instruments being perhaps the best case in point. With the appearance of eight-bit samplers, a sound generation technique became available to the average musician which was previously only available to users of advanced music computers.
Then there was the 12-bit sampler. More recently, the '87 Winter NAMM show brought us affordable 16-bit sampling in the Casio FZ1. This has since been followed by the more expensive but equally exciting Emulator III and Prophet 3000 at the Summer NAMM. No doubt, a similar tidal wave of new instruments will arrive in the near future.
Another understatement would be that this proliferation of sampling instruments has left the potential purchaser with a lot of machines to look at and listen to. And listen, indeed - for not only does the normal synthesiser criterion of "new sounds" come into play, but also the new issue of "sound quality".
Unfortunately, the cold yes/no logic of digital electronics doesn't readily translate into sound quality. As you may have noticed, similar methods of encoding (storing sounds digitally in memory), sampling rates (how fast we take a "photograph" of the sound), and A/D resolution (analogue-to-digital accuracy - akin to the smallest object we can make out in our "photograph") don't necessarily add up to similar sound characteristics. This alone has left us scratching our heads on a few occasions.
So why doesn't an S900 sound like a 2002? They both have identical encoding schemes (linear), A/D resolutions (12-bit), and similar sampling rates. What do E-Mu Systems actually mean when they say their Emax, with eight-bit data, is equivalent to 12-bit linear machines? And why do some people (mostly other manufacturers) moan about the Casio FZ1 not being a "true" 16-bit machine? It certainly uses 16-bit electronics.
A little research and a few test sessions uncovered clues which we'll try to piece together to solve the mystery. We'll also try to explain the magic that manufacturers resort to in their quest to give us high quality at a low price.
LET'S START WITH an easy one - why samplers with different resolutions (loosely translated to numbers of bits) sound different. Linear encoding is not only the easiest to explain, it's the most common - where the number of bits directly translates into the resolution of the sample. The whole purpose of digital recording is to make a smooth, "round-peg" analogue signal fit into a notched, "square" digital hole.
For the technically minded, the number of bits we have to play with translates into how many discrete "notches" we have to fit the sample into. The number of notches (signal levels) may be calculated by raising 2 to the power of "x", where "x" is the number of bits. Another way to translate bits into a more meaningful value is to multiply them by 6dB - this gives the theoretical dynamic range (softest to loudest) of the sound we're trying to record.
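For readers who like to see the arithmetic worked through, both rules of thumb can be sketched in a few lines of Python (our own illustration, not any manufacturer's code):

```python
# Resolution and theoretical dynamic range of a linear converter.

def levels(bits):
    """Number of discrete signal levels: 2 raised to the number of bits."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Rule-of-thumb dynamic range: roughly 6dB per bit."""
    return 6 * bits

for bits in (8, 12, 16):
    print(bits, "bits:", levels(bits), "levels,", dynamic_range_db(bits), "dB")
```

Running it gives 256 levels and 48dB for eight bits, 4096 levels and 72dB for 12, and 65,536 levels and 96dB for 16.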
We'll start by looking at an eight-bit analogue-to-digital converter (ADC), which gives us 256 levels (0-255) and a dynamic range of roughly 48dB. The analogue signal input to the ADC typically ranges from -10 to +10 volts, or 20 volts overall. In this case, each step of the converter will represent (20V/255 steps) 78.4 millivolts.
This resolution presents a couple of problems. For example, what if the input signal to the ADC falls at 5.70 volts? The digital representation output from an eight-bit ADC would fall somewhere between 200 and 201. So, if we represent the 5.70 volt signal with the digital number 200, or 5.68 volts, an error - distortion of the signal - occurs. This error is called quantisation error. The maximum error is obviously half the resolution between levels (in the worst case of a signal falling precisely between them), and the average error is about one quarter of this resolution. The audible effect is referred to as quantisation noise. This error makes a sound audibly less smooth, with the quantisation noise sounding like a cross between ordinary noise and a balloon being squeaked.
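The 5.70 volt example can be checked with a short Python sketch (an illustration of the principle, not any particular machine's circuitry):

```python
# Quantising a 5.70V input with an 8-bit ADC spanning -10V to +10V.
FULL_SCALE = 20.0           # total input range in volts
STEP = FULL_SCALE / 255     # roughly 78.4 millivolts per level

def quantise(volts):
    """Round the input to the nearest of the 256 available levels."""
    level = round((volts + 10.0) / STEP)
    return level, level * STEP - 10.0

level, reconstructed = quantise(5.70)
error_mv = (5.70 - reconstructed) * 1000
print(level, round(reconstructed, 3), round(error_mv, 1))
```

The 5.70 volt input comes back as level 200, or about 5.686 volts - a quantisation error of roughly 14 millivolts.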
This noise gets worse as the signal level goes down. In our eight-bit case, if the input signal was down 42dB from its loudest output (a common occurrence at the tail end of sounds, such as percussion), we have only one bit left to represent the signal. Here the error is practically as large as the signal itself - with a subsequent drastic rise in distortion and quantisation noise. This can be heard as the squeaky balloon effect at the end of tom tom samples, for example.
As you can well guess, quantisation error decreases significantly as the number of bits in A/D conversion increases. For a 16-bit machine in our theoretical case, the step size is down to about 0.3 millivolts - a worst-case error too small for most humans to perceive. As far as dynamic range goes, the laboratory-measured response of undamaged human ears (which doesn't apply to most of us) to music is around 130dB. Practically, it's around 90dB for most people. Using the formula for finding dynamic range we get 16 x 6 = 96dB. We can see that 16 bits more than adequately handles what most music can dish out.
IT IS A common misconception of our western minds that dragons, magic and other sorts of compromise are ultimately bad. This is not strictly true, and we won't write off lower-resolution machines at this point. One reason is that the cost of producing a "true" 16-bit machine may prevent it from being a practical proposition. In designing any given instrument, keyboard manufacturers, in addition to dealing with the overall hassle of just running a business, must deal with the issues of user friendliness, manufacturing, marketing and, of course, the bottom line - cost, probably the most important to many potential buyers. So, while we know that a 16-bit sampler should satisfy us musically, the cost of building such a machine may be prohibitive, depending on the intended buyer - amateur, semi-pro, or professional.
In order to meet the needs and demands of the user, a company might decide that a machine doesn't necessarily have to handle the 96dB dynamic peaks that a 16-bit sampler can. When the signal-to-noise ratio, or dynamic range, is measured in the presence of the audio signal, some claim that about 60dB (10 bits) of A/D resolution is all that's really needed to keep most listeners happy. There exist several other methods of sample data storage and encoding that deliver at least this much range at significantly lower cost - which usually means less memory (RAM) in the machine. So some manufacturers resort to these various forms of magic to get adequate sound quality at a lower price. The main magics are companding, floating point, and delta modulation:
Next to linear, companding is the most common form of encoding. For example, virtually every digital drum machine announced in the past year uses this technique, as does the sound chip in the new Macintosh II computer. This method uses fewer bits (usually eight) stretched over a wider dynamic range by placing more space between the highest levels. To do this, the signal must be compressed at the input upon sampling into an eight-bit (or 48dB) dynamic range. Upon playback, the output electronics have to re-expand this eight-bit signal into something more - typically 72dB (or 12 bits' worth). This compression/expansion process is where the term companding originates. Expansion of the signal is done either in the analogue domain (by a circuit similar to the one you'd use to compress a guitar) or by a special DAC (digital-to-analogue converter) known as a COMDAC, which performs this non-linear remapping of bits to voltage.
Nothing comes free - there is still quantisation noise generated when the signal falls at a voltage level in between those the COMDAC can represent. By the nature of the system, this error is larger at higher levels (since there is more space between them), but the sheer loudness of the signal tends to cover it. There is correspondingly less error at the lower levels, which need the higher resolution. The system works best for sounds that are loud for only a short section of their overall duration, such as drums and percussion. Often, major equalisation is still necessary to cover the faults - high-frequency boost at the input and a corresponding cut at the output, so that the quantisation noise is cut along with the excess signal.
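The shape of the trick can be sketched with the mu-law curve used in telephony - not necessarily the exact law any sampler manufacturer uses, but the same idea:

```python
import math

MU = 255  # the standard telephony mu-law constant, used here purely for illustration

def compress(x):
    """Squash a -1..+1 signal so that quiet levels get more of the available codes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Invert the compression on playback."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def roundtrip(x, bits=8):
    """Store the compressed value in 'bits' bits, then expand it again."""
    steps = 2 ** (bits - 1)
    return expand(round(compress(x) * steps) / steps)

# Quiet signals survive the eight-bit round trip with a far smaller
# absolute error than loud ones - exactly where the resolution is needed:
for x in (0.9, 0.01):
    print(x, "error:", abs(x - roundtrip(x)))
```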
Another piece of white magic is known as floating point. In this case, most of the bits are used to describe the signal as if it were linear, and the remaining bits are used to scale the signal's loudness. Taking our eight-bit example above, imagine adding on three more bits (range of values 0-7), with 0 representing "off", 7 representing the full 20-volt range, and values in between having different ranges. In this way, you can fake higher resolution at lower amplitudes (you always have eight bits to describe the signal). Since the same number of bits have less range to cover, the quantisation error (and therefore quantisation noise) comes down at low levels. Yes, it takes more hardware and software (and therefore more chance of error) to pull this off, but you can get by with reasonable resolution from fewer bits.
Again for the technically minded, it works like this: a simple hardware translator prescales the signal level entering the ADC circuit and stores this input scaling in, say, two- or three-bit format. This gives a significant cut in RAM costs by representing a 90dB S/N ratio that normally needs 16 bits of linear encoding in 14 or 15 bits of memory. Oddly enough, the one commercially available machine to use this scheme - the Kurzweil 250 - actually uses 18 bits - 10 for the signal (there's our 60dB again), and 8 to scale it. The new Kurzweil 1000 series also uses a form of this, and with the current emphasis on higher sound quality, more manufacturers are likely to try this scheme.
Another commonly used method to cut ADC and memory costs is a sampling method called Delta Modulation. This is the storage in memory of the difference in level from sample to sample, as opposed to the absolute value of each one. Several older DDLs (digital delay lines), along with the E-Mu Systems Emax and Emulator II, use this scheme. The Emax, for example, uses eight bits to represent the differences between samples of roughly 12-bit resolution taken by the ADC. This not only decreases the amount of memory needed by a corresponding 12-bit linear system by a third, but significantly decreases the cost of the A/D conversion circuit (it is more expensive to accurately translate from audio into digital than the other way around).
Again, there are drawbacks to this approach to sampling. One is that when the input signal increases rapidly, the eight-bit difference often cannot follow it within a single sampling period, and must take several samples to "catch up". This failure to track the natural slope of the input signal is referred to as slew rate limiting. Another drawback is the error that occurs when a sample differs by less than a full positive or negative step from the previous one - the same as the linear quantisation problem.
However, these sources of error can be overcome fairly easily. Increasing the sampling rate helps - doubling it doubles the effective slew rate of the machine, so it takes less time to catch up (after all, a signal can only change so quickly). Reducing the minimum step size, meanwhile, increases the effective number of bits and hence reduces quantisation noise.
Another reason for differences in the sound quality between various sampling machines is the rate at which the sound is actually sampled. Some years ago Nyquist declared that sampling at twice the bandwidth of a signal would permit the capture of all the information necessary for recording and reproducing it. In the years since, however, it's transpired that rather more than twice - about two-and-a-half times - is needed in practice, to leave room for real-world input filters to roll off. So, for the full 20kHz audio spectrum the sampling rate needed would be around 50kHz. Interestingly enough, almost all hi-fi applications use a sampling rate less than this practically desirable figure - CDs and PCM recorders both use 44.1kHz (and some CDs use only 14-bit A/D, too). So, differences in sampling rates between machines with similar encoding systems and resolution can limit the amount of information they are able to extract from the input signal, thereby affecting the sound quality. Try sampling the same sound at two different rates - say, 31kHz and 42kHz - on the same machine, and see if people can really hear the difference.
THE INSIDE OF a digital sampler is no place to find pure analogue signals. There are a lot of strange radio and clock frequencies floating around trying to get at and spoil our virgin sound. A lot of attention goes into laying out the circuit boards and electrical shielding. This has a lot to do with how many unwanted noises and disturbances mingle with the sound between the input and output.
And then comes black magic. Digital doobries don't come cheap, friends, so some manufacturers play tricks in hardware to cut them down. The most common of these is multiplexing, which means time-sharing the resources of a machine. The most common case of multiplexing is RAM - instead of having eight copies of a sound to play back eight voices, the voices take turns reading the same copy of the sound for a brief instant of time, and make do with this slice until they get another one. Some retain this information in digital form, and then feed it to a DAC. This analogue signal must be remembered by the DAC with a sample and hold (S/H) circuit. The quality of these S/H circuits, and how fast they are updated, determines the quality of the output signal. Some instruments even multiplex the DACs, but this requires very fast S/H circuitry.
It is because of these tricks that a machine with 16-bit ultralinear electronics may end up sounding little better than a 12-bit machine - never mind a CD. But it helps keep the cost down.
So, why do CD players and R-DAT machines cost less and sound better than samplers? Well, the parts used in a CD player cost less because their development can be shared over tens of thousands of units. Subtly different parts are used in sampling instruments, with only thousands or even hundreds of units being sold to recoup the R&D cost. Eventually, when there are tens of thousands of samplers built, these parts will have been paid for, and samplers should cost less. Admittedly, this is grossly oversimplifying matters, but it gives you some idea of the situation.
SO FAR, WE'VE eliminated the differences of A/D resolution and sampling rate (and at least explained different encoding schemes) in an attempt to explain why samplers don't all sound the same. So, let's use the same A/D resolution, the same sampling rate and try not to compare apples to oranges by comparing a LinnDrum to an Emulator II. Result? Our samplers still sound different. Why, why, why?
The final difference comes down to our old friends noise, frequency response and distortion. Some of these demons have familiar manifestations; others are new.
As you may or may not know, low-pass filters are used in most samplers between the input and the ADC in order to prevent aliasing. This occurs when frequencies are present in the input signal which are greater than half the sampling rate, and are mistakenly recorded at a lower frequency. Imagine that pictures are being flashed in front of you, but this time you are also opening and closing your eyes at one-second intervals. If your eyes are open for half a second, and closed for half a second, and the picture is changed every half second (once while your eyes are open, and once while they are shut), you will see only one picture every second. It gets more complicated if the rate of change of the pictures is not sync'd with your own "sample" rate. This is similar to what happens to an ADC when those high frequencies are input and are digitised at a lower frequency. This colours the sample taken.
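The folding arithmetic itself is simple enough to sketch (an idealised calculation, ignoring the real converter's imperfections):

```python
# Where an unfiltered input frequency appears after sampling: anything
# above half the sampling rate "folds" back into the audio band.

def alias_frequency(f_hz, sample_rate_hz):
    """Frequency actually recorded when f_hz is sampled with no input filter."""
    f = f_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

# At a 30kHz sampling rate, a 25kHz input masquerades as 5kHz:
print(alias_frequency(25000, 30000))   # -> 5000
```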
Enter low-pass filters. These are placed before the ADC to filter out frequencies higher than we can sample. However, because the filters are built of "analogue" components, there's bound to be noise, frequency response changes, and distortion involved. The seriousness of this depends on the quality of components used, which is reflected in the cost. Also, there may be a certain amount of equalisation added at this input stage, to cover the deficiencies of the system or to brighten the signal.
The noise present in this filter circuit is what governs the noise floor of the instrument - even with no signal present, this is as quiet as your sampler is ever going to be. Therefore it has a significant effect on the resulting signal-to-noise ratio, and hence on the dynamic range.
This noise floor can be measured by how many bits on the ADC "toggle" with no input signal present. As more bits of noise appear, a greater noise characteristic is recorded along with the desired signal, lowering the sound quality. In other words, the more bits of noise present, the fewer bits are available to record the input signal. And that's one figure we haven't seen listed on any spec sheet.
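As a back-of-an-envelope figure (our own rule of thumb, not a published spec), the toggling bits can be subtracted straight off the converter's resolution:

```python
# Bits that toggle with no input carry only noise; the usable
# resolution - and with it the dynamic range - shrinks accordingly.

def effective_bits(total_bits, noise_bits):
    """Bits genuinely left for the signal, at roughly 6dB apiece."""
    usable = total_bits - noise_bits
    return usable, usable * 6

# A nominal 16-bit converter whose bottom three bits are noise
# behaves more like a 13-bit (78dB) machine:
print(effective_bits(16, 3))   # -> (13, 78)
```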
The input is not the only place you'll find a filter. "Reconstruction filters" smooth our digital signal back into analogue, and other filters are sometimes used for timbral changes - you know, that voltage-controlled job you used to find on real synthesisers. These too have different frequency response, distortion, and noise characteristics. Their shortcomings were not as noticeable on synthesisers because the sound source - the oscillator - was always running at full volume, masking the noise. With a sample, however, changes and colorations to the original are much more noticeable.
So, we're back to the same differences that we had back in the days of analogue Moogs, ARPs, Korgs and Rolands - the differences in components and circuits used in sound generating.
For a while, it looked as if manufacturers should print frequency response, distortion, and signal-to-noise specs along with their samplers, and those would tell us which sampler sounded better. This still isn't a bad idea - it would at least give us an indication of which sampler was more "accurate". But better? Well, which guitar amp, microphone, or synthesiser sounds better? It's time to trust your ears again: get down to your local music store with a handful of sound sources that you think you'll be using (or CD recordings of them - they're much easier to carry around than a grand piano), sample them into different instruments, and listen.
The author would like to thank Scott Peer for his assistance in compiling the technical data for this feature.
Feature by Chris Meyer