Samplers generate their own specialised jargon, so here's a quick guide to some commonly used terms.
Hi-tech music is plagued by jargon: Linear Arithmetic, FM, AI and Vector synthesis, signal-to-noise-ratios, 64-times oversampling... Samplers are among the pieces of equipment most likely to be talked about in jargon-laden language, so what follows is a brief explanation of a few of the more common terms.
The signal-to-noise ratio (SNR) is exactly what it says: the ratio between the loudest signal and the quiescent noise. In audible terms, the ratio between noise levels in a silent room and at a few feet from a jumbo jet at take-off is about 120dB, which makes the jet's sound about a million million times more intense (measured as sound power). This range between silence and pain is more or less fixed, the lower limit determined by how sensitive your ears are and the upper limit by actual physical damage to the inner workings of the ear.
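As a rough check on the 120dB figure, the arithmetic can be sketched in a few lines of Python. The function names are purely illustrative. The 'million million' is a ratio of sound power; expressed as sound pressure (or voltage), the same 120dB works out at a ratio of a million.

```python
def db_to_power_ratio(db):
    """Convert a level difference in decibels to a power (intensity) ratio."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Convert a level difference in decibels to an amplitude (pressure or voltage) ratio."""
    return 10 ** (db / 20)


# Silent room versus jumbo jet: about 120dB apart
print(f"power ratio:     {db_to_power_ratio(120):.0e}")
print(f"amplitude ratio: {db_to_amplitude_ratio(120):.0e}")
```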
In a similar way, the upper limit on the signal level from a sampler is the point at which it starts to run out of drive capability. Often this is the 'clipping' point, as the audio signal approaches the voltage rails of the output amplifier, or just the maximum output of the digital-to-analogue convertor. The lower limit is determined by the noise produced by the sampler's output. In practice, distortion introduced by the sampling process and the analogue stages means that the ratio that you can actually measure is more correctly termed a signal-to-noise and distortion ratio.
The link between the number of bits and the SNR is often simplified to this equation:
SNR = number of bits x 6dB
So a 16-bit machine should have an SNR of about (16x6) = 96dB (CD-quality), whilst 12 bits would give 72dB, and an 8-bit machine only 48dB (telephone/AM radio quality). This figure, which we have calculated with a simple equation, represents the best you can possibly achieve. Saying that a 16-bit sampler automatically has an SNR of 96dB is rather like saying that anyone can run 100 metres in 10 seconds. Just as world-class times like that are usually restricted to top athletes at the peak of their fitness, so even a well-designed 16-bit sampler will probably only manage an SNR in the low 90s of dB.
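The 6dB-per-bit rule comes from the fact that each extra bit doubles the number of levels, and a doubling of amplitude is about 6.02dB. A minimal sketch of both versions follows; the function names are illustrative, and the second figure is simply the ratio of full scale to one quantisation step, not a measured SNR.

```python
import math


def rule_of_thumb_snr_db(bits):
    """The simplified equation from the text: 6dB per bit."""
    return bits * 6


def full_scale_to_step_db(bits):
    """Ratio of full scale to one quantisation step, in dB: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)


for bits in (8, 12, 16):
    print(bits, rule_of_thumb_snr_db(bits), round(full_scale_to_step_db(bits), 1))
```

Either way, these are theoretical ceilings; a real sampler's measured figure will sit below them.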
The distortion produced by a sampler can be measured by sampling a single frequency, then playing it back and removing the original frequency from the output. What you are left with is the distortion added by the sampling process, plus any noise from the output stage. This is called Total Harmonic Distortion (THD).
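The measurement described above can be imitated in software. The sketch below is an assumption-laden model, not a test of any real sampler: it 'samples' a full-scale sine wave by rounding it to a given number of bits, removes the original frequency, and compares what is left with the original. The function name, the test frequency (37 cycles) and the block length (4096 samples) are arbitrary choices made for illustration.

```python
import math


def thd_plus_noise(bits, cycles=37, n=4096):
    """Estimate THD+noise of an idealised quantiser (a model, not a real sampler).

    Returns (percent, dB relative to the fundamental).
    """
    peak_code = 2 ** (bits - 1) - 1       # largest code of a signed converter
    w = 2 * math.pi * cycles / n          # radians per sample; whole cycles fit in n

    # "Sample" a full-scale sine by rounding each value to the nearest code
    x = [round(peak_code * math.sin(w * i)) / peak_code for i in range(n)]

    # Find the original frequency by projecting onto sine and cosine at that frequency
    a = 2 / n * sum(v * math.sin(w * i) for i, v in enumerate(x))
    b = 2 / n * sum(v * math.cos(w * i) for i, v in enumerate(x))

    # Remove it: what remains is the distortion plus noise
    residual = [v - a * math.sin(w * i) - b * math.cos(w * i) for i, v in enumerate(x)]

    rms_residual = math.sqrt(sum(r * r for r in residual) / n)
    rms_fundamental = math.sqrt((a * a + b * b) / 2)
    ratio = rms_residual / rms_fundamental
    return 100 * ratio, 20 * math.log10(ratio)


for bits in (8, 12, 16):
    percent, db = thd_plus_noise(bits)
    print(f"{bits}-bit: {percent:.5f}% ({db:.1f}dB)")
```

Because this model contains only the rounding error, its results are a best case; the analogue stages of a real machine add to the total.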
The best way to interpret THD is on a graph: typically it will show a low noise floor (either for the measuring instrument or the output stage) which should be below the expected SNR figure, so for 16-bit samplers you ought to see a noise floor well below 96dB down from the Maximum Output Level (MOL). This noise floor means that THD is more correctly known as THD+noise (THD+n), although the '+n' gets left out of many manufacturers' specifications. Above the noise floor on a THD graph will be spikes at frequencies which are often harmonically related to either the sampling rate or the signal being sampled or played back: for example, you will often find a distortion peak at exactly half the sampling frequency.
THD figures, however, can look good (typical THDs for samplers are less than 1%) whilst disguising problems because they describe a weighted average of peaks on the graph, ignoring relatively high peaks.
Linearity is the other side of the coin to THD. Broadly speaking, the more linear the system, the better its THD. Linearity is often expressed by the number of bits that, in a perfect sampling system, would be needed to produce the same THD, hence you see '16-bit linear' or 'linear to 14 bits' used in advertisements and spec sheets. The word 'linearity' may often be preferred to 'distortion'!
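To get a feel for how a THD percentage relates to a 'linear to so many bits' claim, here is a very rough sketch. It assumes the simple 6dB-per-bit rule from earlier in this article, which is only an approximation, and the function names are illustrative; manufacturers may well arrive at their figures differently.

```python
import math


def thd_percent_to_db(thd_percent):
    """Express a THD percentage as decibels below the signal."""
    return -20 * math.log10(thd_percent / 100)


def equivalent_bits(thd_percent, db_per_bit=6.0):
    """Rough number of 'linear' bits implied by a THD figure (assumes 6dB per bit)."""
    return thd_percent_to_db(thd_percent) / db_per_bit


for thd in (1.0, 0.1, 0.01):
    print(f"{thd}% THD is {thd_percent_to_db(thd):.0f}dB down, "
          f"roughly {equivalent_bits(thd):.1f} bits")
```

On this reckoning a THD of 1% is only 40dB down, which is why a 'less than 1%' specification says very little on its own.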
The brightness of a sample is related to the highest frequency component that is present in the replayed version. Bandwidth is a way of describing the highest and lowest frequencies that the sampler can produce. Many samplers will reproduce very low frequencies, often down to DC, and so the highest frequency is often the more important figure. The highest frequency is like the top speed of a car — you might not drive at that speed all the time, but you might need it sometime. For a typical family car with a top speed in the mid-90s, you could say that its speed range was between 0 and 95 mph, which is much the same as saying that the bandwidth of a typical sampler might be from 20Hz to 20kHz.
Your hearing gets less acute with age. Teenagers can usually hear well past 17kHz, whilst your thirtysomething might no longer hear the 16kHz TV whistle. As you get older, cymbals get less splashy, and Dolby seems to get better and better at removing noise. 20kHz is probably above the hearing range of everyone except dogs anyway.
It is also important to understand that bandwidth is not an all-or-nothing measurement: it typically gives the frequencies at which the output is 3dB down from the MOL. Bandwidths measured in this way will often be described as "20Hz to 20kHz (-3dB, -3dB)". This is not a fixed rule: a bandwidth specification of 20Hz-20kHz (-6dB, -15dB) means that at 20Hz the signal was quite attenuated, and at 20kHz it was significantly attenuated. In -3dB terms, such a sampler probably only has a bandwidth more like 50Hz to 16kHz. Take care when comparing.
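The effect of the chosen limit can be shown with a small sketch. The measurements below are made up to match the second example above, and the function is a simplification: it just reports the lowest and highest measured frequencies that meet the limit, assuming everything in between passes too.

```python
def bandwidth(response, limit_db=-3.0):
    """Return (lowest, highest) measured frequency at or above limit_db, or None.

    response: list of (frequency in Hz, level in dB relative to MOL) pairs.
    """
    passing = [freq for freq, level in response if level >= limit_db]
    if not passing:
        return None
    return min(passing), max(passing)


# Hypothetical measurements, invented for illustration only
response = [
    (20, -6.0), (50, -3.0), (1000, 0.0),
    (10000, -1.0), (16000, -3.0), (20000, -15.0),
]

print(bandwidth(response))                 # quoted at -3dB
print(bandwidth(response, limit_db=-15))   # a looser limit flatters the figure
```

The same machine gives two very different-looking bandwidths, depending only on where the limit is set.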
Samplers work by taking a measurement of the incoming audio signal every so often and storing this value away in memory. Imagine a time lapse camera observing a door, taking pictures every five minutes. If a person goes back and forth through the door continuously for four minutes, but then is not present when the photo is taken, they will not be in the photo which represents that whole five minutes. If the camera takes a photo every five seconds then the person will have to move quite a lot faster to avoid being in a photo, and if the camera takes five photos a second then it will be almost impossible to get through the door without being photographed.
When sampling audio frequencies, it turns out that you need to take at least two measurements for each cycle of the signal: the equivalent of in and out of the door. So you always need to sample at a rate of at least twice the highest frequency you want to record. For a 20kHz bandwidth this means sampling at more than 40kHz. This 'twice the highest frequency' sampling frequency is often called the Nyquist Rate, after Harry Nyquist, the engineer who worked out much of the theory behind sampling.
So what happens if you sample at less than the Nyquist rate? Just as the photos of the door every five minutes are not a good representation of who passes in and out, so the samples will be poor too. In audio terms this manifests itself as extra frequencies in the playback which were not there in the original, an effect known as aliasing. These usually have the characteristic feature that they 'mirror' the original frequency: if you increase the frequency of the sound being sampled, the extra frequencies added by the aliasing will fall in frequency.
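The 'mirror' behaviour can be worked out with simple arithmetic. The sketch below assumes a single pure tone and no input filtering at all, and the function names are illustrative: any tone above half the sampling rate is folded back by the same amount below it.

```python
def nyquist_rate(highest_hz):
    """Minimum sampling rate: twice the highest frequency to be recorded."""
    return 2 * highest_hz


def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency at which a pure tone appears on playback after sampling."""
    folded = signal_hz % sample_rate_hz
    # Anything above half the sampling rate is mirrored back below it
    return min(folded, sample_rate_hz - folded)


print(nyquist_rate(20000))

# Sampling at 40kHz: tones above 20kHz come back mirrored
for tone in (18000, 22000, 24000, 26000):
    print(tone, "->", alias_frequency(tone, 40000))
```

As the input tone rises from 22kHz to 26kHz, the frequency that comes out falls, which is the mirroring described above.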
Opinion by Martin Russ