Digital Signal Processing (Part 2)
PART TWO: Basic Effects
Carrying on from last month's explanation of the basic principles of digital signal processing, Jim Grant explains how to put these into practice and create some useful and familiar effects.
Last month we explained the basic principle of digital signal processing: acquiring audio samples, modifying them and outputting them within a single sample period. Now we will look at how to do something useful and interesting with the samples, and create some familiar effects, starting with ring modulation.
Ring Modulation is an effect that was found on many analogue synthesizers because, like filtering, it produces new (and very strident) timbres. Unfortunately, these on-board effects units were generally not very good, and always suffered from the problem of signal breakthrough. The basic principle of ring modulation is extremely simple - the output is generated by the multiplication of two input signals, and in digital signal processing terms this is achieved by taking a left and right input sample and outputting their product. The multiplication respects the polarity of the two input signals, so that a combination of a negative and a positive input signal gives a negative result, and two negative inputs give a positive output. Figure 1 shows a block diagram and input/output waveforms.
The interesting bits that highlight the multiplication are at points [A], [B] and [C]. You can see that the output signal is forced to change sign as the modulator changes sign.
Clearly, a ring modulator generates interesting output waveforms, but what does it sound like and is it any use? To answer these questions we must consider the output in terms of its frequency content. Strangely enough, the output spectrum does not contain either of the two input sinewaves that are used in our example. Instead, it contains the sum of the two sine frequencies and their difference. Figure 1 shows the spectral plot of the output generated by inputting sinewaves at 200Hz and 500Hz to a ring modulator. The input frequencies are shown as dotted lines for reference. So, here we have an effect that gives us two new frequencies of 300Hz (the difference) and 700Hz (the sum), which are likely to sound discordant, since they are related to the inputs by sum and difference instead of by the ratios on which conventional musical intervals are based. Feeding more complex waveforms into the ring modulator will create a sum and difference pair for every combination of a harmonic in one input with a harmonic in the other - the overall effect is a very metallic sound rather like that of a bell.
Performing this process with a DSP, as opposed to analogue circuitry, has the great advantage that the output will be exactly zero whenever either of the two inputs is numerically zero, which means that the effect should be noise-free. The output of an analogue ring modulator always seems to contain a faint drone due to tiny offset voltages that are prevalent in analogue electronics.
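To make the sum-and-difference behaviour concrete, here is a minimal Python sketch of the multiplication described above. Python is simply our choice for illustration; the function names (ring_modulate, sine, level_at) and the 8kHz sample rate are assumptions made for the example, not part of any particular DSP system. It multiplies 200Hz and 500Hz sinewaves and then measures how much of each frequency is present in the result.

```python
import math

SAMPLE_RATE = 8000  # Hz; an assumed rate, chosen only for this illustration


def ring_modulate(left, right):
    """Multiply two equal-length sample sequences, sample by sample."""
    return [a * b for a, b in zip(left, right)]


def sine(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Generate n_samples of a unit-amplitude sinewave."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def level_at(signal, freq_hz, sample_rate=SAMPLE_RATE):
    """Estimate the amplitude of one frequency component by correlating
    the signal against a cosine and a sine at that frequency."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


# One second of each input, so every test frequency fits a whole number of cycles.
output = ring_modulate(sine(200, SAMPLE_RATE), sine(500, SAMPLE_RATE))

for freq in (200, 300, 500, 700):
    print(freq, "Hz:", round(level_at(output, freq), 3))
```

With unit-amplitude inputs, the levels at 300Hz and 700Hz should come out at about 0.5 each, with next to nothing left at the original 200Hz and 500Hz.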
Moving on to a range of effects which are more common in DSP-based units, we arrive at Distortion. The action of a distortion unit can be illustrated with a type of graph called a 'transfer function'. This is a mapping of input to output signals via a curve that describes the action of the processing block. An ordinary amplifier will have a straight line 'curve' with the gain determined by the steepness of the line. Figure 2 shows the general idea, with an input and amplified output signal.
As we already know, amplification can be achieved by simply multiplying each audio sample by a fixed value (the gain). But what happens if we process the input samples according to the transfer function shown in Figure 3, which differs from the straight line function of an amplifier? The result is also shown in Figure 3, and the output waveform is a rather clipped looking sinewave.
Once again, the value of this process is best viewed in the frequency domain. A non-sinusoidal signal is equivalent to a combination of pure sinewaves - ie. a fundamental and any number of harmonics. The clipping of a sinewave is therefore equivalent to adding harmonics to the original waveform, and different transfer functions will generate different spectra. Harmonics are whole number multiples of the fundamental frequency (the 2nd, 3rd, 4th... 11th, 12th etc), and some of the higher ones, such as the 7th and 11th, fall between the notes of the musical scale, so the resultant sound is often rich in non-musical overtones. This is exactly the operation of a conventional guitar fuzz box which, as everybody knows, relies on the generous addition of screaming overtones to produce its effect.
Generally, an analogue fuzz box is a fairly simple affair, which limits the output signal once the input has exceeded a certain amplitude threshold. This can be done with a handful of components and certainly doesn't require an expensive DSP chip. However, a DSP system has enormous flexibility and can produce a great variety of transfer functions, giving a lot of control over the spectral changes at the output. The actual operations that take place at the sample level can be very simple. For example, the transfer function in Figure 3 could be implemented by a fragment of code like:
GET INPUT SAMPLE
IF SAMPLE GREATER THAN UPPER THRESHOLD THEN SAMPLE = UPPER THRESHOLD
IF SAMPLE LESS THAN LOWER THRESHOLD THEN SAMPLE = LOWER THRESHOLD
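The same pair of threshold tests can be written as a short Python sketch. The function name hard_clip and the threshold values of plus and minus 0.5 are assumptions for illustration only; a real unit would pick thresholds to suit its own number range.

```python
import math


def hard_clip(sample, lower=-0.5, upper=0.5):
    """Limit one sample to the range lower..upper.

    The default thresholds are assumed values, not taken from Figure 3.
    """
    if sample > upper:
        return upper
    if sample < lower:
        return lower
    return sample


# One cycle of a unit-amplitude sinewave, 16 samples long.
wave = [math.sin(2 * math.pi * n / 16) for n in range(16)]

# Each sample is tested and, if necessary, clamped before being output.
clipped = [hard_clip(s) for s in wave]

print([round(s, 2) for s in clipped])
```

Samples that lie between the thresholds pass through untouched, so only the peaks of the sinewave are flattened.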
A transfer function which is made of straight lines can probably be generated with a few tests such as these to determine the range of the sample, but a more complicated curve will require an equation to describe it. Samples are processed by the equation, and then passed to the output DAC (digital-to-analogue convertor).
Figure 4, for example, shows a transfer function that maps the input signal to the square of itself - every sample is multiplied by itself before being output. The code is very simple:
SAMPLE = SAMPLE x SAMPLE
Here the equation can be expressed as OUTPUT = (INPUT)². If we let the input be a simple sinewave - as in Figure 4 - the output will be a phase-shifted sinewave of twice the input frequency plus a constant amplitude offset. Processing more complicated waveforms, such as speech and music, will result in a distorted waveform that will contain a very strong 2nd harmonic, ie. the octave. This is the basic principle of a guitar octave doubler, though it usually uses a slightly modified transfer function which, for analogue electronics, is easier to generate.
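The frequency doubling follows from the standard trigonometric identity sin²(x) = 0.5 - 0.5 cos(2x): the 0.5 is the constant offset, and the cos(2x) term is the phase-shifted sinewave at twice the frequency. The short Python sketch below checks this numerically; the function name square_law and the choice of 64 samples per cycle are assumptions for the example.

```python
import math


def square_law(sample):
    """The transfer function OUTPUT = INPUT squared."""
    return sample * sample


N = 64  # samples per input cycle (an assumed figure)

input_wave = [math.sin(2 * math.pi * n / N) for n in range(N)]
output_wave = [square_law(s) for s in input_wave]

# The average of the output over one input cycle is the constant offset.
dc_offset = sum(output_wave) / N

# What the identity predicts: an offset minus a cosine at twice the frequency.
predicted = [0.5 - 0.5 * math.cos(2 * (2 * math.pi * n / N)) for n in range(N)]
max_error = max(abs(a - b) for a, b in zip(output_wave, predicted))

print("offset:", round(dc_offset, 6))
print("largest difference from prediction:", max_error)
```

The offset should come out at half the square of the input amplitude, and the difference between the squared samples and the prediction should be no more than rounding error.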
The real power of a DSP system is not only its ability to generate all the 'old favourite' effects, such as fuzz and octave doubling, but also to process the samples through new and interesting distortion curves. The transfer function curves can be made dynamic so that the output spectrum constantly changes with time, keeping the sound interesting to the ear. This dynamic distortion process can form the basis of a synthesis technique, where envelopes are used to modify the shape of the distortion curve while processing waveform samples generated by the DSP.
So far we have looked at a few basic effects that can be produced by the simple arithmetic operations of addition and multiplication. It can be quite surprising how a sampled signal, represented by a stream of numbers, can be processed by arithmetic to give drastic changes in the frequency domain. However, there are other simple operations that extend the effects possibilities even further, and the most important of these is delay. The delay can be very short - as short as a single sample - or as long as many seconds, and different delay times will produce quite different effects. For the sake of completeness and neatness, we will work our way through some common effects in order of decreasing delay time.
At long delay settings an echo effect is perceived. Echo is probably the simplest of all DSP effects to produce, since the delay operation itself is very easy to implement. Basically, all that is required is to read (input) samples from an ADC (analogue-to-digital convertor) and hold them for the required echo time before passing them to the DAC and thence to the audio outputs. Figure 5 shows a block diagram of this process, with the added feature of 'repeat'.
The delay line is implemented using a circular memory buffer (Figure 6) and a read/write pointer. The trick is to rotate the pointer around the memory by one location each sample instant, sending the contents of the current location to the DAC, and then writing the new sample from the ADC. A short program fragment that would do the job might be:
INCREMENT MEMORY POINTER
READ MEMORY CONTENTS AND SEND TO DAC
GET SAMPLE FROM ADC AND WRITE TO MEMORY
Applying this algorithm to Figure 6 we can see that the delay length will be 16 samples. The actual echo time will be the memory length divided by the sampling rate, so long delays require lots of memory if a high audio bandwidth is to be maintained. To keep the sound circulating around the delay line, each output sample is scaled and added to the input sample - this produces the classic repeating echo effect. However, it is important that the multiplication factor of the gain block is less than unity, otherwise each repeat will be louder than the last and the output will build up until it overloads.
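The three-line fragment and the 'repeat' path can be sketched together in Python as follows. This is only an illustration under stated assumptions: the class name Echo, the repeat_gain parameter and its default of 0.5 are ours, and the ADC and DAC are stood in for by the argument and return value of process.

```python
class Echo:
    """Circular-buffer delay line with a 'repeat' (feedback) path."""

    def __init__(self, delay_samples, repeat_gain=0.5):
        # A gain of one or more would make each repeat at least as loud as the last.
        if not 0.0 <= repeat_gain < 1.0:
            raise ValueError("repeat_gain must be at least 0 and less than 1")
        self.memory = [0.0] * delay_samples
        self.pointer = 0
        self.repeat_gain = repeat_gain

    def process(self, input_sample):
        # INCREMENT MEMORY POINTER (wrapping round at the end of memory)
        self.pointer = (self.pointer + 1) % len(self.memory)
        # READ MEMORY CONTENTS AND SEND TO DAC
        delayed = self.memory[self.pointer]
        # GET SAMPLE FROM ADC AND WRITE TO MEMORY, adding the scaled output
        self.memory[self.pointer] = input_sample + self.repeat_gain * delayed
        return delayed


# Feed a single click through a 16-location memory, as in Figure 6.
echo = Echo(delay_samples=16, repeat_gain=0.5)
click = [1.0] + [0.0] * 63
out = [echo.process(s) for s in click]

# Show where the repeats land and how loud they are.
print([(n, level) for n, level in enumerate(out) if level != 0.0])
```

The click should reappear every 16 samples, halving in level each time round. At a sampling rate of, say, 32kHz that 16-sample memory would give an echo time of only half a millisecond, which shows why long echoes need so much memory.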
A very interesting variation on the theme of echo is Pitch Shifting (sometimes called Harmonising). The basic principle is identical to echo - input samples are written to RAM and then read back out via a DAC - except that the read and write operations to sample memory are performed by two independent memory pointers. Pitch shifting is achieved by causing the pointers to rotate at different speeds, so that the input and output sample rates are different. Effectively, we have a 'real time' sampler whose replay pitch transposition is given by the ratio of the two memory pointer speeds. Figure 5 serves as a typical block diagram, but Figure 7 shows the new sample memory arrangement. The pointers rotate through memory as before, with one reading old samples and the other writing new ones. Often the delay time is chosen so that the memory can contain at least one whole cycle of the lowest input frequency.
If this seems as clear as mud, think about a few examples of the process in action. Firstly, if the two speeds are identical, we merely have a delay time equal to the number of memory locations between the two pointers divided by the sampling rate. Secondly, if the read pointer moves through memory at twice the speed of the write pointer, the DAC will output the sample memory twice before it is overwritten and the result will be a pitch shift upwards of one octave. Thirdly, if the write pointer moves at twice the speed of the read pointer, only half the sample memory will be sent to the DAC before it is overwritten, and so the pitch will shift down one octave.
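A bare-bones Python sketch of the two-pointer scheme is given below. It rests on several assumptions: the class name PitchShifter is ours, the write pointer always advances by one location per sample while the read pointer advances by ratio locations, and the read pointer starts half a memory away from the write pointer. There is no interpolation or crossfading, so in general the output will contain splicing glitches; the test signal here is deliberately chosen so that the memory holds a whole number of cycles, which hides them.

```python
import math


class PitchShifter:
    """Naive two-pointer pitch shifter: no interpolation, no crossfade."""

    def __init__(self, memory_size, ratio):
        self.memory = [0.0] * memory_size
        self.write_pos = 0
        self.read_pos = memory_size / 2  # start half a memory behind the writer
        self.ratio = ratio               # read speed relative to write speed

    def process(self, input_sample):
        # Write the new sample and move the write pointer on by one location.
        self.memory[self.write_pos] = input_sample
        self.write_pos = (self.write_pos + 1) % len(self.memory)
        # Read the nearest stored sample and move the read pointer on by 'ratio'.
        output = self.memory[int(self.read_pos)]
        self.read_pos = (self.read_pos + self.ratio) % len(self.memory)
        return output


MEMORY_SIZE = 128
PERIOD = 32  # input sinewave period in samples; the memory holds 4 whole cycles

source = [math.sin(2 * math.pi * n / PERIOD) for n in range(4 * MEMORY_SIZE)]

octave_up = PitchShifter(MEMORY_SIZE, ratio=2.0)
shifted = [octave_up.process(s) for s in source]

# Once the memory has filled, compare with a sinewave of half the period.
target = [math.sin(2 * math.pi * n / (PERIOD / 2)) for n in range(len(source))]
worst = max(abs(shifted[n] - target[n]) for n in range(MEMORY_SIZE, len(source)))
print("largest difference from an octave-up sinewave:", worst)
```

Setting ratio to 1.0 turns the same code into a plain delay of half the memory length, and a ratio of 0.5 reads each location twice, giving the octave-down case.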
The process appears simple enough, but it does not necessarily produce perfect results: the output can be full of glitches, due to the read pointer crossing over from a section of old (delayed) sample memory to a new section just behind the write pointer. Samplers overcome waveform splicing glitches by employing a technique called 'crossfade looping', which involves interpolation between waveform sections. A smooth crossfade requires accurate interpolation, which calls for high speed precision arithmetic - exactly what a DSP excels at. So, on one hand we can use an expensive DSP to produce a simple effect that can be achieved with a handful of logic chips, and on the other, we can really go to town on the quality of the harmonised waveform by exploiting the power of a DSP chip.
Reducing the delay time to a few tens of milliseconds produces the effect of automatic double tracking (ADT), which gives the listener the impression of hearing two simultaneous instruments from the input of only one. This was a common effect around the time that analogue delay lines first became popular, and was based on a circuit called a Charge Coupled Device (CCD) or, more colloquially, a bucket brigade delay line. When the delay time is reduced even further, the resulting effect is no longer perceived as a time effect, but rather as one in the frequency domain.
To understand what is going on, have a look at Figure 8. Here we see that the delayed wave is both added to and subtracted from the original input, to produce two different outputs. This type of arrangement is often called a 'comb filter', since there are notches (like those on a hair comb) in the output frequency spectrum where the interference between the delayed and the original signal produces points of cancellation. These occur when the delayed signals are exactly anti-phase with the input.
Figure 9 shows a typical spectrum, with the notches in the output forming the 'teeth' of the comb filter. For the summing output, the first notch will occur when the delay time is exactly half the period of the input waveform, whilst for the subtracting output, the first notch appears when the delay time is a whole period of the input. A 1 millisecond delay will generate, for example, notches at 1 kHz spacing - the spacing is the same for both the summing and subtracting output.
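The notch positions follow directly from the delay time, and the arithmetic can be sketched in Python. The function names notch_frequencies and comb_gain, the mode strings "sum" and "difference", and the 20kHz upper limit are all assumptions made for the example. Note that the subtracting output also cancels completely at 0Hz (DC), which is not included in the list.

```python
import cmath


def notch_frequencies(delay_s, max_freq_hz=20000.0, mode="sum"):
    """List the comb filter notch frequencies up to max_freq_hz."""
    if mode not in ("sum", "difference"):
        raise ValueError("mode must be 'sum' or 'difference'")
    spacing = 1.0 / delay_s  # notch spacing is the reciprocal of the delay
    # Summing: first notch where the delay is half a period.
    # Subtracting: first notch (above DC) where the delay is a whole period.
    first = spacing / 2 if mode == "sum" else spacing
    notches = []
    k = 0
    while first + k * spacing <= max_freq_hz:
        notches.append(first + k * spacing)
        k += 1
    return notches


def comb_gain(freq_hz, delay_s, mode="sum"):
    """Gain at one frequency of the input plus (or minus) its delayed copy."""
    delayed = cmath.exp(-2j * cmath.pi * freq_hz * delay_s)
    return abs(1 + delayed) if mode == "sum" else abs(1 - delayed)


one_ms = 0.001
print("summing notches:", notch_frequencies(one_ms)[:3])
print("subtracting notches:", notch_frequencies(one_ms, mode="difference")[:3])
print("summing gain at 500Hz:", comb_gain(500, one_ms))
print("summing gain at 1kHz:", comb_gain(1000, one_ms))
```

For the 1 millisecond delay, the summing output should show its first notches at 500Hz, 1.5kHz and 2.5kHz, and the subtracting output at 1kHz, 2kHz and 3kHz. Midway between notches the two signals reinforce one another and the gain rises to 2.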
Modulating the delay time slowly will result in the notches being swept up and down the frequency spectrum, creating the familiar 'sound in a drainpipe' effect, more commonly called Flanging. Stronger colouration of the sound can be achieved by providing feedback, which causes peaks in the spectrum at the notch boundaries. This is equivalent to the resonance, or Q, control on a synthesizer's low-pass filter. It is important to keep the notches moving, since the ear responds to the parts of the spectrum that are present rather than those that are missing.
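A swept delay with feedback can be sketched in Python along the following lines. Everything specific here is an assumption for illustration: the function name flange, the sinewave sweep, the default delay range of 1 to 5 milliseconds, and the use of linear interpolation between neighbouring samples to obtain fractional delay times.

```python
import math


def flange(samples, sample_rate, min_delay_s=0.001, max_delay_s=0.005,
           sweep_hz=0.5, feedback=0.5):
    """Add a slowly swept, delayed copy of the signal to itself."""
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be at least 0 and less than 1")
    if min_delay_s * sample_rate < 1.0:
        raise ValueError("minimum delay must be at least one sample")
    memory_size = int(max_delay_s * sample_rate) + 2
    memory = [0.0] * memory_size
    write_pos = 0
    output = []
    for n, x in enumerate(samples):
        # A slow sinewave sweeps the delay between its minimum and maximum.
        sweep = 0.5 * (1.0 + math.sin(2 * math.pi * sweep_hz * n / sample_rate))
        delay = (min_delay_s + (max_delay_s - min_delay_s) * sweep) * sample_rate
        # Read 'delay' samples behind the write pointer, interpolating
        # linearly between the two nearest memory locations.
        read_pos = (write_pos - delay) % memory_size
        whole = int(read_pos)
        frac = read_pos - whole
        whole %= memory_size
        delayed = (memory[whole] * (1.0 - frac)
                   + memory[(whole + 1) % memory_size] * frac)
        # Feedback: the delayed signal is scaled and recirculated.
        memory[write_pos] = x + feedback * delayed
        write_pos = (write_pos + 1) % memory_size
        # The output is the sum of the direct and delayed signals.
        output.append(x + delayed)
    return output


# A click through a fixed 4-sample delay shows the direct signal and its repeats.
click = [1.0] + [0.0] * 11
print([round(v, 3) for v in flange(click, 1000, 0.004, 0.004, feedback=0.5)])
```

Setting the minimum and maximum delays equal, as in the usage line, freezes the sweep and leaves a static comb filter; widening the range and feeding in real audio gives the moving notches described above.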
Chorus is another effect that is produced with very short delay times. The result we are looking for here is to create the illusion of multiple instruments playing in unison. Therefore a good chorus unit will use more than one delay line - perhaps two or three - with the delay time for each one being modulated independently. This highlights one advantage of a software-based system over dedicated hardware, in that the fundamental process of a time delay, long or short, can be shared even simultaneously by many effects.
Finally, with a shorter delay time still, we come to the very well known and gentle effect of Phasing. Here the time delay element is very short, and the original and delayed signals are added together to form the output. As with flanging, we get notches in the frequency spectrum, but now the spacing between them is quite large: remember that the spacing is determined by the reciprocal of the delay time, so a short delay of (say) 0.1 ms will give only two notches in the audio bandwidth.
Reviewing the possibilities of Digital Signal Processing as described so far, we can see that with only arithmetic operations and a delay element thrown in for good measure, a useful range of audio processes and effects can be created. The next step from here is to turn our attention from the obvious to the esoteric and look at digital oscillators, filters and frequency transformation devices, which we will do in the third part of this series.
Feature by Jim Grant