A Deeper Wave
Many of today's popular digital synthesisers use variations on "wavetable synthesis". Chris Meyer explains what this mystical but powerful method is all about.
Despite their various titles, many of today's synths are based on a commonly misunderstood sound-generation method called Wavetable Synthesis. We explain what the system is, how it works, and how it's implemented on different instruments.
OVER THE LAST couple of years, we have seen a rash of "new synthesis techniques" pop up on various instruments. Some - such as FM, phase distortion, and sampling - are indeed very different from the old subtractive analogue beasts we know and love. Virtually all others, however, are variations on what are known as Wavetable Synthesisers.
Wavetable synths are, in fact, very similar to the aforementioned analogue machines. They have essentially the same "patch" or voice structure as a Minimoog, and differ merely in the fact that they have different kinds of oscillators, and some additional tricks thrown in between the oscillators and the filter. True, to say "merely" is understating things a bit - sonically, a lot of new ground has been opened up by this new breed. But it's surprising how conventional some of the new "vector", "crosstable", "linear arithmetic", "structured adaptive", and "additive" voices really are at the level of pure voice structure. And on the other side of the coin, many of these new synths, in all their digital glory, don't possess some of the capabilities of older analogue instruments.
The goal of this article is to explain what really goes on inside a wavetable synth, particularly with regard to why they sound different, and why they have some strange distortions that we're not used to hearing from their analogue predecessors. I'll also attempt to demystify some of the Technogibberish that has been used to describe these instruments lately.
With luck, conquering these problems will also help conquer some of our fears about whether we could ever learn to program them, let alone get a good patch out of them.
ANALOGUE OSCILLATORS ARE strange, primitive beasts. An infinitely varying voltage, as opposed to the safe, step-by-step world of digital 1s and 0s, tells them what pitch to play. This "pitch" voltage, in turn, controls how long it takes a separate internal voltage to build up (or drain down) to a certain threshold. When the voltage hits this end limit, it starts the process over again.
The result is a raw sawtooth wave that has to be processed by yet more analogue electronics to come up with our typical (in increasing order of difficulty to generate) square, modulated pulse, triangle, and sine waves.
How does a wavetable oscillator mimic the same function? Well, the varying voltage of the analogue version is replaced with a table of values. The values in this table consist of the individual words of a sample of that wave - in other words, the values of these words represent the level of the wave to be output at various points in time.
A wavetable oscillator produces a wave by reading the values (words) out of this table in order and passing the result on to a DAC (digital/analogue converter) to create an audio signal. In our sawtooth example, the table would start with very high values, which would go down as the table progresses. Other tables with more complex waves are no harder to play than the sawtooth, making a variety of waveforms easily available - including ones with more interesting spectra than sawtooths and square waves.
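To make the idea concrete, here is a minimal sketch of a table being read out in order. The table length of 64 words and the value range of plus or minus 127 are assumptions chosen for illustration, not the figures of any particular instrument, and the function names are made up for this example.

```python
def make_sawtooth_table(length=64, peak=127):
    """Build a falling ramp: high values first, dropping as the table progresses.
    The length and peak are illustrative assumptions."""
    step = 2 * peak / length
    return [round(peak - i * step) for i in range(length)]


def read_cycles(table, cycles):
    """Read the table in order, starting over at the end of each cycle.
    In hardware, each value read would be handed to the DAC."""
    out = []
    for _ in range(cycles):
        out.extend(table)
    return out


saw = make_sawtooth_table()
signal = read_cycles(saw, cycles=3)
```

Swapping `saw` for any other list of values changes the timbre without changing the readout code at all, which is the point made above about complex waves being no harder to play.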
The harmonic spectra (read: "timbre" or "tone") of analogue waves tend to be rather limited, and follow a predictable pattern of having a strong fundamental ("root" or "base" pitch) followed by successively weaker overtones, or harmonics. Sawtooth waves - the "buzziest" - have harmonic amplitudes that die away at the pace of one over the harmonic number; in other words, the second harmonic - the octave - is half as strong as the fundamental, the third is one third as strong, and so on. Pulse waves have "brighter" harmonics, but at the cost of having a weakened fundamental. All the others have even less energetic spectra - that is, fewer overtones - than these two.
Waves developed for wavetable synths, on the other hand, tend to have a bias towards the high end and more upper harmonics - they don't have to follow "natural" progressions. For example, one set of factory waves I developed for a certain wavetable synth was started by taking the 12th harmonic, ramming the level all the way up, and then building the sound from there. Only later did I bother adding some low end. The result was a very bright set of waves with an unusual harmonic series.
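As a rough illustration of building a wave from a chosen set of harmonic levels, the sketch below sums sine harmonics into one cycle. The 128-word length, the function name, and the particular levels are assumptions for the example; they are not the actual factory waves described above.

```python
import math


def wave_from_harmonics(levels, length=128):
    """Sum sine harmonics into a single cycle, normalised to +/-1.
    levels[k] is the amplitude of harmonic k + 1."""
    if len(levels) > length // 2:
        raise ValueError("a table of N words can hold at most N/2 harmonics")
    cycle = [
        sum(a * math.sin(2 * math.pi * (k + 1) * n / length)
            for k, a in enumerate(levels))
        for n in range(length)
    ]
    peak = max(abs(v) for v in cycle)
    return [v / peak for v in cycle]


# An "analogue-style" spectrum: each harmonic at one over its number.
analogue_style = wave_from_harmonics([1 / h for h in range(1, 33)])

# A "bright" spectrum: 12th harmonic at full level, a little low end added later.
levels = [0.0] * 32
levels[11] = 1.0   # 12th harmonic
levels[0] = 0.3    # fundamental (illustrative level)
levels[1] = 0.2    # 2nd harmonic (illustrative level)
bright = wave_from_harmonics(levels)
```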
Apply this principle many times over, and you can see why wavetable synths sound so bright and seem to have so much more bandwidth than typical analogue synths. I remember comparing a Prophet 10 and a Prophet VS - both synths I love very much - side by side, and wondering why the tweeters seemed to disappear only when I played the older 10.
There seem to be two different ways of creating wavetable oscillators in hardware. The first involves taking a single cycle of a wave, and transposing it all over the place - I call this the single cycle method. Changes in pitch are achieved with this method by reading out the data at faster and slower rates. The waves in question here tend to be between 64 and 256 sample words long. In theory, the highest harmonic contained in one of these waves is half the number of sample words, and its frequency is the harmonic number times the fundamental frequency. For example, if a 64-word wave is played at a pitch of A440, the highest harmonic included in the wave would be the 32nd, and its frequency would be slightly over 14kHz (440Hz x 32). Kawai, PPG, and Sequential, among others, use this method, which is exactly like playing a one-cycle loop on the end of a sample.
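The arithmetic in the single cycle example can be checked with a few lines. This is only a sketch of the rule of thumb given above (highest harmonic equals half the number of sample words); the function names are invented for illustration.

```python
def highest_harmonic(table_length, fundamental_hz):
    """Return (harmonic number, frequency in Hz) of the highest harmonic
    a single-cycle table of this length can contain."""
    number = table_length // 2
    return number, number * fundamental_hz


def readout_rate(table_length, fundamental_hz):
    """Sample words per second needed to play the table at this pitch."""
    return table_length * fundamental_hz


number, freq = highest_harmonic(64, 440)
print(number, freq)            # 32 14080
print(readout_rate(64, 440))   # 28160
```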
The second way - which I call the long table method - requires the equivalent of taking a very high-resolution sample of a single wave - say, several thousand sample words per cycle. Unlike the previous method, the wave data is always played back at a constant rate. The pitch is determined in this case by deciding how many sample words to skip each time a new sample word is played. If you want a very low pitch, the oscillator will skip very few or no sample words; for a high pitch, it will skip a lot of sample words. This method gets the oscillator through the table of the wave cycle faster, which restarts the process sooner, and results in a higher pitch. Korg's DW series instruments take this approach.
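A minimal sketch of the long table method follows. The 2048-word table, the 40kHz output rate, and the use of a sine as the stored wave are all assumptions for illustration; the Korg hardware is not documented here and may well differ.

```python
import math

TABLE_LENGTH = 2048    # sample words per cycle (assumed)
SAMPLE_RATE = 40000    # constant playback rate in Hz (assumed)

table = [math.sin(2 * math.pi * i / TABLE_LENGTH) for i in range(TABLE_LENGTH)]


def step_for_pitch(freq_hz):
    """How far to advance through the table for each output sample.
    A step of 1 skips nothing; larger steps skip words and raise the pitch."""
    return freq_hz * TABLE_LENGTH / SAMPLE_RATE


def play(freq_hz, num_samples):
    """Read the table at a constant rate, stepping by a pitch-dependent amount."""
    step = step_for_pitch(freq_hz)
    position = 0.0
    out = []
    for _ in range(num_samples):
        out.append(table[int(position)])          # nearest word below the position
        position = (position + step) % TABLE_LENGTH  # wrap round to restart the cycle
    return out


one_second_at_1k = play(1000, SAMPLE_RATE)
```

Doubling the pitch doubles the step, so the oscillator gets through the table in half the time.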
"The tables for more complex waves are no harder to play than a sawtooth, so a variety of waveforms are easily available on most wavetable synths."
NOW IT'S TIME to weigh up the relative advantages and disadvantages of each method - which, I freely admit, are in many cases subjective preferences. The biggest factual difference between the two is the amount of memory each takes up - the long table method eats up much more memory. Synths employing the single cycle method, therefore, can have more waves for the same cost (memory being directly related to cost, as it is). And, because of the size required for each wave, the first method lends itself to allowing samples to replace the waves (if, like the VS, the instrument can receive sample dumps via MIDI). This is not possible with long table instruments because their "sampling rate" is so high that the waves must be generated by a computer. The smaller size of the waves in single cycle instruments also makes it easier to store them in RAM, edit them, and so forth.
Other differences have to do with the artifacts each process creates. The nature of the single cycle method is such that the "sample" rate is an exact multiple of the playback rate - in other words, there is an integer number of sample words for each wave. This means that the clock noise (an unavoidable aspect of the playback process for wavetable synths as well as samplers) of a wave played low is in tune with the note itself, which is more musically pleasing.
Audibly, clock noise is sort of (but not quite) like a square wave with a frequency of half the playback sample rate. So if you take a sample at 30kHz and transpose it down two octaves, you'll hear a buzz with a 7.5kHz fundamental. Now, let's say the sound you sampled had a fundamental frequency of 880Hz - an octave above tuning A. In the above example, it has been transposed down to 220Hz, but 7.5kHz is not an integer harmonic of 220Hz, so the result would be an inharmonic frequency component which is rather nasty.
Now let's take a look at the single cycle wavetable synth. The "sample rate", such as it is, is a precise integer multiple of the wave's fundamental. For example, on instruments which have 128 sample words per wave, like the VS, the output of the wave is just like a continuous sound with a fundamental at 234.375Hz sampled at 30kHz, so the result is clock noise in tune with the note - a bit like a square wave as a really high, in-tune harmonic. I personally find this effect pleasing (or at least interesting) on many sounds, because it gives a hollow, digital sound, like a "PPG-ish" bass.
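The two cases can be compared numerically. The sketch below only checks whether a given frequency is a whole-number multiple of the note's fundamental, using the figures quoted in the text; the function names are invented for the example.

```python
def implied_fundamental(readout_rate_hz, table_length):
    """Fundamental produced when a single-cycle table is read at this rate."""
    return readout_rate_hz / table_length


def is_harmonic(freq_hz, fundamental_hz, tolerance=1e-9):
    """True if freq_hz is an integer multiple of fundamental_hz."""
    ratio = freq_hz / fundamental_hz
    return abs(ratio - round(ratio)) < tolerance


# Single cycle synth: 128 words read at 30kHz.
note = implied_fundamental(30000, 128)
print(note)                       # 234.375
print(is_harmonic(30000, note))   # True: the clock is the 128th harmonic

# Sampler: a 7.5kHz buzz against a note transposed down to 220Hz.
print(is_harmonic(7500, 220))     # False: 7500 / 220 is about 34.09
```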
Unfortunately, there is a negative side to this method. If you take a 128-word wave and transpose its fundamental down to 40Hz, its highest "legal" - not counting the clock noise - harmonic will have a frequency of 2560Hz (64 x 40Hz), which is not a terribly high bandwidth for those low notes. With a fundamental of 300Hz (very roughly middle C), however, the highest harmonic is above audibility. So the single cycle method only has a bandwidth problem in the low range.
The long table method presents the opposite set of characteristics. The "sample rate" of the stored waves is so high that you tend not to hear the clock noise. I personally think this robs the Korg DWs of the "ballsy" sound in the bass region that the others have. I admit, however, to preferring grungier things in life than many other computer-types - for you cleaner individuals, that annoying clock noise isn't there if you don't want it. And the bandwidth on the bass notes is much higher.
There are disadvantages to this method as well, though. For while the "sample rate" is so high that we don't have our clock noise problems, the fact that the sample rate relates unevenly to the playback frequency introduces other distortions. For example, let's say our wave is 2048 sample words long, and our playback pitch is such that we have to skip 500 samples every time we want to play one. Starting with "1", we'll use samples 1, 501, 1001, 1501, 2001. Then, since there are only 2048 sample words, the wave consists of 453 (which is 2001 + 500 - 2048), 953, 1453, 1953, 405, and so on - in other words, the wave we're playing back is changing from wave to wave, until our counting series repeats. One might be tempted to say, "Ah, since the wave does not repeat as often, those changes must sound more acoustic." But in reality, they're distortions which show up as subharmonics (not related to the playback pitch, unless you're very lucky).
If this is still hard to picture, imagine our "wave" is a sawtooth, and the sample number also happens to be the amplitude of that sample. When you draw the series of numbers out on graph paper you'll end up with a pretty weird-looking sawtooth wave (see Figure 1). There's a low-frequency modulation that appears as a pattern of fluctuations over many waves, almost as if an LFO had been applied. It's even worse if the wave has "squiggles" over its period, unlike the perfectly straight sawtooth.
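For anyone without graph paper to hand, the counting series can be generated directly. This sketch uses the 2048-word table and skip of 500 from the example, with positions numbered from 1 as in the text; the function names are made up for illustration.

```python
import math


def skipped_indices(table_length, skip, count, start=1):
    """Sample positions (numbered from 1) visited when stepping by `skip`
    and wrapping round at the end of the table."""
    return [(start - 1 + i * skip) % table_length + 1 for i in range(count)]


def samples_until_repeat(table_length, skip):
    """Number of output samples before the counting series starts again."""
    return table_length // math.gcd(table_length, skip)


def cycles_until_repeat(table_length, skip):
    """Number of passes through the table before the series starts again."""
    return skip // math.gcd(table_length, skip)


print(skipped_indices(2048, 500, 10))
# [1, 501, 1001, 1501, 2001, 453, 953, 1453, 1953, 405]
print(samples_until_repeat(2048, 500))   # 512
print(cycles_until_repeat(2048, 500))    # 125
```

With these figures the series only repeats after 512 samples, spread over 125 trips through the table, and every trip in between is a slightly different wave. If the sample number is also taken as the amplitude, the list above is the lopsided sawtooth described in the text.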
And by the way, the same things show up on samplers where the sample rate is not neatly related to the fundamental pitch, though it tends to be masked by the fact that the sound itself is busy changing.
EVEN MORE GREMLINS and realities threaten to wreak minor havoc with our wavetable oscillators. For example, these oscillators have to run at a frequency much higher than the note we're playing (the number of samples per wave times the fundamental), which is difficult to maintain in hardware. And since the oscillators are digital, they don't take well to being modulated - frequency modulation, for example, is a royal pain to calculate for these oscillators (although Yamaha are breaking some ground with their TX81Z here). As a result, you don't see as many tricks - FM, sync, and the like - as often as you did on the old analogue synths.
The "gremlin" fallout of this is that these frequencies get too high to work with in realistically priced hardware - so the hardware has to cheat. For higher pitches, some (such as Korg and Sequential) switch to using a smaller wavetable (ie fewer sample words per cycle).
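The table-switching cheat might be sketched like this for a single cycle oscillator, where the readout rate is the table length times the fundamental. The 500kHz ceiling and the list of table sizes are purely hypothetical; the article gives no actual figures for any instrument.

```python
MAX_READOUT_HZ = 500_000           # hypothetical hardware ceiling
TABLE_SIZES = (256, 128, 64, 32)   # hypothetical sizes, largest first


def table_size_for(fundamental_hz):
    """Pick the largest table whose readout rate stays under the ceiling."""
    for size in TABLE_SIZES:
        if size * fundamental_hz <= MAX_READOUT_HZ:
            return size
    return TABLE_SIZES[-1]   # nothing fits, so fall back to the smallest table


for pitch in (440, 2000, 4000):
    print(pitch, table_size_for(pitch))
# 440 256
# 2000 128
# 4000 64
```

Each drop to a smaller table halves the number of harmonics the wave can hold, which is why the switch can be audible.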
"PPG 'wavetable' actually consists of a series of single-cycle waves, the patch decides which of these single waves plays at any given time."
To hear the effects of this, take a wavetable synth and do the following: set up a patch that uses just one oscillator, with filters and so on wide open. Play it at the lowest note, and then in monophonic mode play the highest note with glide at its slowest. You can hear the sound change as the synth switches wavetables - you can also see it quite well if you have access to an oscilloscope.
Some instruments can be quite offensive in this respect. Others try to perform this switch only when the highest harmonic is beyond audibility, but some people can still hear it. The effect is not unlike the "seams" between multisamples on a sampler. Oddly (but thankfully) enough, this is rarely audible in the context of playing chords on a "real" patch with detuning, chorusing, and all.
On now to another problem related to the digital technology employed in wavetable synths. With analogue oscillators, we had to put up with tuning drift, and pitches that wouldn't track each other over the entire length of the keyboard. Digital oscillators have a different set of problems in trying to play the correct pitch. Building oscillators that can play back at a very high frequency and still hit all of the equally tempered notes properly is very difficult. A "master clock" of some sort has to be divided down by an integer to try to hit the pitches of the equally tempered scale - which, unfortunately, are not nicely spaced.
As a result, some compromises or decisions have to be made. One involves spending the money necessary for very high-speed clocks and custom chips. Another requires accepting the difficulty and not hitting the notes exactly in tune - this method ends up with a rock-solid pitch, but poor intonation (the intervals can end up more out of tune than even normal equal temperament saddles us with). A third is to "jitter" about the correct pitch - this provides good intonation, but creates all sorts of weird sidebands and warbles on the higher notes of some wavetable synths, because the waves end up alternating pitches to create an average for the correct one.
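The second and third compromises can be put into numbers. In this sketch the 8MHz master clock and the 128-word table are hypothetical values picked only to show the effect, and the function names are invented for the example.

```python
import math

MASTER_CLOCK_HZ = 8_000_000   # hypothetical master clock
TABLE_LENGTH = 128            # sample words per cycle (assumed)


def nearest_divisor(note_hz):
    """Integer the master clock is divided by to get closest to the note."""
    return max(1, round(MASTER_CLOCK_HZ / (TABLE_LENGTH * note_hz)))


def actual_pitch(divisor):
    """Pitch actually produced by a given integer divisor."""
    return MASTER_CLOCK_HZ / (divisor * TABLE_LENGTH)


def error_cents(note_hz):
    """Tuning error of the nearest-divisor approach, in cents."""
    return 1200 * math.log2(actual_pitch(nearest_divisor(note_hz)) / note_hz)


def jitter_divisors(note_hz):
    """Two neighbouring divisors and the share of time spent on the higher
    one so that the average divisor matches the ideal (non-integer) value."""
    ideal = MASTER_CLOCK_HZ / (TABLE_LENGTH * note_hz)
    low = math.floor(ideal)
    return low, low + 1, ideal - low


print(round(error_cents(440), 2))    # about half a cent sharp
print(round(error_cents(3520), 2))   # well over 20 cents flat
print(jitter_divisors(3520))         # divisors 17 and 18, mostly 18
```

Under these assumptions the error is negligible at A440 but approaches a quarter of a semitone three octaves up, because the divisors get small and the gaps between achievable pitches get wide. Jittering fixes the average, at the price of the wave alternating between two slightly different pitches.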
As I hinted above, these "problems" certainly do not render wavetable synths useless; they just help to explain the occasional head-scratcher of "Why is the bloody thing doing this?" These answers should help you find solutions to any difficulties you may have been having with them (such as using bright waves for higher notes instead of ones that are transposed way up).
But enough explaining the mysteries of engineering - now on to explaining the terms of marketing and what each wavetable-based synthesis algorithm is really about.
"Normal" Wavetable Synthesis
Instruments in this category include the Kawai K3, Ensoniq ESQ1, and samplers that have a wavetable synth mode (Korg DSS1, Casio FZ1 and, to a lesser degree, the Sequential Prophet 2000/2002). They all feature the normal voice structure of single-wave oscillators (from one to three of them) routed to a typical VCF and VCA. The ESQ1 throws in the additional trick of placing VCAs between the oscillators and the filter in the manner of the Roland JX8P, to allow fading in and out of different timbres.
In essence, you can view all of the above as very similar to analogue synths, but with a fantastic VCO section.
This brings us to the original commercial wavetable synth - the PPG Wave. Here, a "wavetable" actually consists of a series of single-cycle waves. The patch decides which of these single waves plays at any given time.
The PPG Waveterm allows you to build these wavetables by selecting certain waves to be at certain points in the table - a sine at point 1, a square at 12, a vocal at 13, something bizarre at 32, and so on. It then calculates the intermediate waves, and fills out the holes in the table with them.
The PPG gets its timbral variety by switching dynamically between the waves it's playing at any given moment in time. An envelope, the LFO, pressure, velocity, and so on can be routed to this parameter, so that, for example, soft hits may play the sine wave mentioned above, while digging into the pressure eventually moves us up to our square wave.
The catch in all of this is that the switch is a hard jump to a different table. So what happens if the envelope decides to switch waves in the middle of a cycle, and the amplitudes of the two waves at that point do not match up? You get a click - a bit like a bad loop - which gives the PPG its characteristic tumbling/clicking sound. Incidentally, a friend of mine was able to simulate this sound with a sampler by taking several samples of a wave and splicing them together without concern for zero crossings. I don't know about you, but I find it rather interesting that we can learn to make strengths out of weaknesses, and imitate them in instruments that don't have those weaknesses.
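The click is easy to demonstrate with numbers. In this sketch the two waves (a sine and its third harmonic), the 64-word length, and the switch point a quarter of the way through the cycle are arbitrary choices for illustration, not PPG data.

```python
import math

LENGTH = 64  # sample words per cycle (assumed)

wave_a = [math.sin(2 * math.pi * n / LENGTH) for n in range(LENGTH)]
wave_b = [math.sin(2 * math.pi * 3 * n / LENGTH) for n in range(LENGTH)]


def hard_switch(first, second, switch_at, total):
    """Play `first` until sample `switch_at`, then jump straight to `second`."""
    return [(first if n < switch_at else second)[n % LENGTH] for n in range(total)]


def largest_step(signal):
    """Biggest jump between neighbouring samples; a large value means a click."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


clean = largest_step(hard_switch(wave_a, wave_a, 0, 2 * LENGTH))
clicky = largest_step(hard_switch(wave_a, wave_b, LENGTH // 4, 2 * LENGTH))
print(clean < 0.1, clicky > 1.9)   # True True
```

Switching a quarter of the way through the cycle catches the first wave near its positive peak and the second at its negative peak, so the output leaps almost the full height of the wave in one sample.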
"Demystifying the fancy terms used to describe these instruments doesn't detract from the sound of any of them -it just shows how similar they are.
Appearing at February's Frankfurt show was a new machine - the Keytek CTS2000 - that takes a smoother approach to the above. It's like a PPG with three waves defining the points in the wavetable, but instead of calculating the individual waves in between the points, it fades from one to another. The company calls this process "crossfading". Returning to our earlier example, the Keytek would perform a crossfade from one wave (say, our sine) to a second (say, our square), and then from that second to a third (such as our bizarre wave). This is a one-dimensional progression, like the PPG's, except the Keytek fades the samples, as opposed to hard-switching them, so there is no clicking in the final sound. This effect can be simulated with the amplitude envelopes on the ESQ1.
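A one-dimensional crossfade of this kind might look like the sketch below. A straight linear fade is assumed, since the article does not say what curve the Keytek uses, and the function name and the tiny two-word "waves" are there only to keep the example readable.

```python
def crossfade(waves, position):
    """Mix two neighbouring waves from a list.
    position runs from 0.0 (first wave) to len(waves) - 1 (last wave);
    the fractional part sets the balance between the two neighbours."""
    position = min(max(position, 0.0), len(waves) - 1)
    index = min(int(position), len(waves) - 2)
    t = position - index
    a, b = waves[index], waves[index + 1]
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


# Three stand-in "waves", two sample words each, for illustration only.
waves = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
print(crossfade(waves, 0.5))   # [0.5, 0.5]
print(crossfade(waves, 1.5))   # [2.0, 2.0]
print(crossfade(waves, 2.0))   # [3.0, 3.0]
```

Because the balance changes gradually rather than jumping, neighbouring output samples never differ by more than the waves themselves plus a small fade step, which is why the click disappears.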
The one area where this method is lacking (compared to that of a PPG) is that it is not possible, to the best of my knowledge, to go back to the earlier waves after a note has been triggered, in case you might want to. Still, it is capable of producing many excellent sounds.
Roland use the phrase "Structured Adaptive Synthesis" (SAS) to describe the sound-generation method used in their RD line of digital pianos, such as the RD1000. It is very similar to cross table sampling, with two expansions and a twist. The twist is that instead of a normal crossfade from one wave to another, there is some mathematical equation describing how to proceed from one wave to another (such as natural exponential decay, as opposed to a linear fade).
One of the expansions is that SAS uses a few more waves as points to describe a sound. For example, a couple may be used to describe the hammer tone and attack of a piano, and several more to chart its decay.
The other expansion is that different waves (and therefore paths) exist to describe different velocity levels - hitting an acoustic or electric piano key harder results in quite a different sound from hitting it softly. Roland have used this to great effect in reproducing natural percussive instruments, even though the SAS technique itself more closely resembles a wavetable synthesiser than a sampler.
Stretching the crosstable method into two dimensions a bit differently from SAS is the Sequential Prophet VS. Here, four waves define the four corners of a square (or diamond, if you prefer). Any point in the square is a mix of the four waves, the various levels being proportionate to how close you are to any one corner. As opposed to just proceeding from one wave to the next (as you do with all of the above methods), you may proceed to any other point in the square under control of an envelope, the LFO, velocity, or whatever. A good analogy is somebody standing in a room with a speaker in each corner and wandering around it to hear the different mix.
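The four-corner mix can be sketched as follows. The unit-square coordinates, the corner labels, and the bilinear weighting are assumptions made for illustration; the article only says that the levels are proportionate to how close you are to each corner, and the actual VS law may differ.

```python
def corner_weights(x, y):
    """Mix levels for corners A (0,0), B (1,0), C (0,1) and D (1,1)
    at a point (x, y) inside the unit square. The weights sum to 1."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return ((1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y)


def vector_mix(waves, x, y):
    """Blend four equal-length waves according to the position (x, y)."""
    weights = corner_weights(x, y)
    return [sum(w * wave[n] for w, wave in zip(weights, waves))
            for n in range(len(waves[0]))]


print(corner_weights(0, 0))       # all of wave A, none of the others
print(corner_weights(0.5, 0.5))   # (0.25, 0.25, 0.25, 0.25)
```

Routing an envelope or LFO to `x` and `y` then moves the listener around the room in the speaker analogy above.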
In reality, the hardware is essentially four single-wave oscillators fed into a quad panner (remember quadraphonic sound?), followed by a typical VCF/VCA/panning stage. Four waves tend not to be enough to describe instruments such as a piano, but they do lend themselves towards radically shifting and changing sounds. And like the PPG, the VS also allows you to retrace your steps through the timbral-change path.
One new machine that touts additive synthesis - the Kawai K5 reviewed elsewhere this issue - is actually just a variation on a crosstable synthesiser. On the K5, the user gets to choose the harmonic spectra of two oscillator "groups", and then arrange the pitch and loudness envelopes of these groups. In reality, these harmonic spectra get converted into a couple of waves played back by our typical wavetable oscillators, and then fed into a voice structure very similar to that of an ESQ1 or CTS2000. While not being a true additive synthesis machine (a "true" one, in my book, allows definition of the pitch and amplitude envelopes of each harmonic, which big American machines like the Synergy and MuLogix Slave 32 allowed), it does present additive synthesis in a more accessible way - because defining all of those envelopes on a real additive machine gets to be a real pain, real fast.
Roland's newest synthesis method, Linear Arithmetic (LA), employed on their already popular D50, resembles sampling more closely than any of the above techniques. There are two sets of oscillators - a set of digital oscillators that play back the waves we're most used to seeing on our analogue friends (namely, sawtooth and square), and a set that play back the sampled attack of a sound (going quite often into a looped portion of that sound). This latter method was actually employed on the PPG 2.3 factory wavetables, where two of them were the beginnings of a sax and a piano, leading into a one-cycle loop of each sound.
The LA approach represents an effective new method because the attack of a natural sound tends to be its most complex moment, and a sample (as opposed to several waves and a bit of envelope trickery) is the best way to recreate it. Completing the sounds with standard waves means that normal synthesis techniques may be employed to fill out the rest of the sound without chewing up the memory that a normal full-length sample would require. The D50 includes the extra bonus of permitting one of these waves to ring-modulate the other (ring modulation, in hardware, essentially allows one wave to vary the amplitude of another at audible frequencies - as opposed to FM, which varies the pitch). Again, the ESQ1 also allows some of this cross-modulation.
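Ring modulation itself is simple to sketch: multiply two signals sample by sample. The 8kHz rate and the 440Hz and 100Hz sine waves below are arbitrary choices for illustration, not anything specific to the D50.

```python
import math

RATE = 8000   # samples per second (arbitrary choice for the example)


def sine(freq_hz, count):
    return [math.sin(2 * math.pi * freq_hz * i / RATE) for i in range(count)]


def cosine(freq_hz, count):
    return [math.cos(2 * math.pi * freq_hz * i / RATE) for i in range(count)]


def ring_modulate(a, b):
    """One wave varies the amplitude of the other: multiply sample by sample."""
    return [x * y for x, y in zip(a, b)]


ring = ring_modulate(sine(440, 800), sine(100, 800))

# The product of two sines contains only the difference and sum frequencies.
expected = [0.5 * (d - s) for d, s in zip(cosine(340, 800), cosine(540, 800))]
print(max(abs(r - e) for r, e in zip(ring, expected)) < 1e-9)   # True
```

Neither original pitch survives in the output; with sine inputs, only 340Hz and 540Hz remain, which is where the clangorous character of ring modulation comes from.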
DEMYSTIFYING THE FANCY terms used to describe these instruments doesn't detract from the sound of any of them - it just shows how similar many of them are. I believe we can only expect that future variations on the theme will include more of the features that some of these instruments have - more analogue-style modulation, more waves to fade between, sampled attack transients, and perhaps even longer-than-one-cycle waves to give more real-life variation and motion to the raw material.
From there, the machines will probably continue to employ normal "analogue" processing that many of us are familiar with to add any other timbral, spatial, and amplitude variations.
Even though it doesn't look like it - and the methods may have different names - it's very much the same game. There's just a variety of available equipment and packaging - and we're certainly no worse off for having that variety.