The Secrets of Timbre
Don't touch those Level and Pan controls - you may be able to solve your mixing problems more easily than you think. Robert Rich explains how the timbre of a sound affects its performance in the mix.
After spending hours writing, arranging and recording a song, it sometimes seems impossible to get the mix to do it justice - the answers may lie in the secrets of timbre.
AT LAST YOU'VE found the perfect sound: the silky strings that seduced Kate Bush or the gated snare that ate Miami. Faster than you can say "fix it in the mix" the new sound has found its way onto your latest demo. But somehow, it just doesn't sound as good in the mix as it did on its own. Sound familiar? Sometimes it works the other way around too: patches that sound uninteresting on their own turn out to be perfect in the right context.
There are plenty of ways to make a mix sound good or bad, but if the instrumentation doesn't work to begin with, you're going to have a hard time straightening things out later. Good orchestration is an art in itself, but it becomes an especially big challenge when the sounds are unnatural. You can guess what a piano is going to sound like with a string patch on top of it, but what does a Prophet 5 going "glish" sound like with a DX7 going "fwoomp" on top? Are there any guidelines to help you slot synthetic timbres into a mix?
Thankfully there are - though there's no substitute for educating your own ears through experience - and I will try to present a few such guidelines in this article.
IN OUR HUNT for that elusive "hot mix", few things will help us more than an understanding of the nature of sound and human hearing.
Vibrations are the raw material of sound. Current music technology converts vibrating electrons into vibrating air molecules. These vibrating molecules tickle the hairs in your inner ear, causing nerves to "fire" in your brain. Alas, what we "hear" in our brains only indirectly relates to what is happening in the air. The ear has its own logic, its own prejudices and in essence, a good mix panders to the tastes of the human ear.
We can better understand the ear's logic by breaking down the spectrum of sound into its constituent frequency components. Anyone who has worked with additive synthesis or who has seen a frequency domain graph of a digital sample will be familiar with these ideas.
Any sound - including the sound of a complete mix - can be broken down into a set of sine waves. Each sine wave represents a discrete frequency in the audio spectrum. The amplitude of each of these sine waves represents the amount of that frequency found in the original signal. This is the essence of a Fourier transform. (For anyone who thinks that the Fourier transform is an abstraction, the ear uses this very technique to break down incoming sound.) Different nerves in the inner ear respond to different frequency bands, leaving it up to the brain to build a complete picture of the sound. A graphic equaliser also uses these principles, though with much lower resolution.
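To make the idea concrete, here is a short Python sketch of a naive discrete Fourier transform: it breaks a signal into its sine-wave components, much as the paragraph above describes. The signal, bin numbers and scaling are illustrative choices, not anything from a real analyser.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Naive discrete Fourier transform: returns the amplitude of each
    frequency bin, i.e. how much of each sine-wave component the signal
    contains - a rough model of the ear's frequency analysis."""
    n = len(signal)
    mags = []
    for k in range(n // 2):          # bins up to the Nyquist frequency
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mags.append(2 * abs(s) / n)  # scale so a unit sine reads ~1.0
    return mags

# A test signal: a sine at bin 3 plus a quieter sine at bin 7
n = 64
signal = [math.sin(2 * math.pi * 3 * t / n)
          + 0.5 * math.sin(2 * math.pi * 7 * t / n) for t in range(n)]

mags = dft_magnitudes(signal)
# mags[3] reads about 1.0, mags[7] about 0.5, everything else near zero
```

A graphic equaliser works on the same principle, only with a handful of coarse bands instead of one bin per frequency.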
Let's begin by looking at some of the overall frequency characteristics of a good mix, and the qualities of various frequency bandwidths. With these characteristics in mind, we will look at the role of individual timbres within the mix. Remember though, that none of the recommendations here should be taken as gospel. These are rules-of-thumb which can help guide the direction a sound might take. In the end picture, nothing will help more than a good ear.
IN GENERAL, A satisfactory mix will appear to contain a relatively balanced amount of signal throughout the audible frequency spectrum. If we were to draw a curve showing frequency against amplitude, averaged across time, we should see no sharp peaks or dips, although this does not mean that the curve should look flat.
The ear responds far better to high-mid frequencies (about 1000-8000Hz) than to low (20-200Hz) or very high (above 10,000Hz) frequencies. The response changes with overall loudness as well, so it is a complex state of affairs. This mess is why we have so many ways of representing sound level. Decibels (dB) measure sound pressure relative to a reference level. Various standardised filtering (or "weighting") schemes attempt to match the dB curve to human hearing, the most common being A-weighting (dBA).
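For the curious, the standard A-weighting curve can be written down in a few lines. This is a sketch of the published formula (IEC 61672), not taken from the article itself; negative values mean the ear hears that frequency as quieter than the meter says.

```python
import math

def a_weighting_db(f):
    """A-weighting correction in dB for a tone at frequency f (Hz),
    per the standard IEC 61672 curve, normalised to 0dB at 1kHz."""
    f2 = f * f
    ra = (12194.0**2 * f2 * f2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20 * math.log10(ra) + 2.00

# The curve mirrors the text: roughly 0dB at 1kHz, a slight boost in
# the high-mids around 2-4kHz, and a steep loss below 200Hz.
```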
These technicalities bring us to a very important idea in mixing sounds: the loudness curve. To put things simply, increasing the extreme low and high frequencies in a mix will make the music sound louder, even when the absolute sound level (dBs) remains the same. This sense of loudness can also increase the perceived clarity of the sound. There is more to it than this, though; the ideal loudness curve will change depending on the listening level of the music, and upon the style of music.
For quiet musical passages, a lot of bass is needed for the low end to be audible. The upper few octaves (1-8kHz) will dominate the mix at low levels due to the sensitivity of the ear, so you should balance this range accordingly. Generally, the extreme high end (10-15kHz) will cut through fairly clearly, due more to the efficiency of most loudspeakers than to the sensitivity of the ear. While the ear does not tend to expect high frequencies during quiet passages, one good reason for including high-frequency material is to hide noise, an unfortunate reality in quiet music.
For loud music, extreme amounts of low or high frequencies can become annoying. At rock concert volume levels, the ear's response comes pretty close to flat, which explains why music usually sounds better when it's loud (a fact that's become a regular part of family arguments over the years). Due to the ear's improved response curve at high volumes, it's especially important to avoid resonant peaks in music that may be played loud. Not only can you hear these resonances more clearly, but they can be downright painful.
"Increasing the extreme low and high frequencies of a mix will make the music sound louder, even when the absolute sound level remains the same."
THE RELATIONSHIP BETWEEN harmonic content and perceived loudness plays an essential role in the placement of instruments in a mix. Consider the behaviour of nearly all acoustic instruments: the harder you blow, pluck, or hit them, the louder they sound. And as they get louder, they also get "brighter". In the natural world, loud sounds generally contain more overtones than quiet sounds. The ear expects this to be the case, so much so that we assume a sound is loud when it contains many overtones. If you compare a sine wave with a square wave of the same energy, the square wave will seem much louder.
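The sine-versus-square comparison is easy to verify numerically. The sketch below (sample counts and levels are arbitrary choices) scales a square wave down to 1/sqrt(2) so that it carries exactly the same RMS energy as a unit sine; the square still sounds louder because that energy is spread across the odd harmonics.

```python
import math

def rms(samples):
    """Root-mean-square level: a simple measure of a signal's energy."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

n = 1000
sine = [math.sin(2 * math.pi * t / n) for t in range(n)]
# A square wave at amplitude 1/sqrt(2) has the same RMS as a unit sine,
# but its energy is spread across the 3rd, 5th, 7th... harmonics -
# which is why the ear judges it louder.
square = [math.copysign(1 / math.sqrt(2), s) for s in sine]
```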
Acoustic sounds naturally get brighter as they get louder, but in the world of electronic timbres, we have to plan for this to happen. Herein lies the cause of many a muddy mix. For example, the best way to add more "punch" to a kick drum is not to make it louder than everything else, but to boost its high-mid frequencies. The same applies to muddy basslines: try mixing in some velocity-sensitive "pluck noise" overtones. The slightest bit of high-end can work wonders in clarifying a bass sound. This is exactly what makes psychoacoustic enhancers so popular. But if your sounds are well structured to begin with, you should never have to rescue sounds with lots of outboard gear.
Of course overtones affect perceptions other than just loudness. An awareness of the effects of harmonic content on imaging can help clean up a mix. One of the most abused imaging characteristics is that of distance, or depth. You don't need a dozen different reverbs to create subtle imaging and layering in your music, just be aware of the fact that sounds with fewer overtones appear farther away than sounds with many overtones. The reason for this lies once again in our expectations of sound based on sounds in nature. High frequencies are absorbed more easily by the atmosphere, while low frequencies propagate over longer distances. (Whales can communicate over hundreds of miles using low-frequency thumps.)
Now that digital reverbs are finding their way into more home studios, people are getting into the habit of giving everything a wash of synthetic space, with little thought for the actual perceived placement of sound. If you want a sound to appear far in the distance, don't just drown it in reverb soup: first make it sound like it's far away by rolling off the high-end a bit - then drown it in soup (well... you know what I mean).
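The "roll off the high-end" treatment amounts to nothing more than a gentle low-pass filter. Here is a minimal one-pole filter sketch (the coefficient and test frequencies are illustrative, not from the article): low-frequency material passes nearly untouched while bright material is strongly attenuated, which is exactly the "pushed back" effect described above.

```python
import math

def one_pole_lowpass(x, a):
    """Simple one-pole low-pass filter: y[n] = y[n-1] + a*(x[n] - y[n-1]).
    Smaller a gives a darker, more 'distant' sound (a in (0, 1])."""
    y, out = 0.0, []
    for s in x:
        y += a * (s - y)
        out.append(y)
    return out

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

n = 4000
low = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]     # dull tone
high = [math.sin(2 * math.pi * 400 * t / n) for t in range(n)]  # bright tone

a = 0.05
lo_out = one_pole_lowpass(low, a)
hi_out = one_pole_lowpass(high, a)
# the bright tone loses most of its level; the dull tone barely changes
```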
On the other hand, sounds that you want to stand out clearly at the front of the mix need not be louder than the rest of the music; they need only contain a wider harmonic spectrum. Notice how clearly most DX7 voices stand out - FM synthesis excels at generating lots of overtones. When the DX7 first appeared on the scene it was always responsible for the sound that sat in your face while the rest of the music played in the background. This characteristic can help your music or hurt it, depending on the context.
Another important frequency-related characteristic of imaging involves left-to-right discrimination. The ear is far more sensitive to the stereo placement of high frequencies than to the placement of low frequencies. In controlled environments, people have a hard time discerning the location of tones below 200Hz. Only with tones above 1-2kHz can we accurately determine location. So, if you want a sound to have a clear stereo image, give it plenty of overtones.
Panning the bass generally confuses the imaging by altering the mix depending on where a person stands relative to the speakers. In other words, the bass might sound louder in one speaker than the other, but that won't necessarily help the stereo image.
If you want stereo imaging on a bass track, try splitting the high-frequency components from the low-frequency ones, then process and pan only the highs. With acoustic instruments this splitting requires drastic use of EQ. The trick works well in theory, but in reality it's not easy to keep an acoustic timbre sounding good after such drastic equalisation. With a couple of synthesisers and MIDI, though, you can create your own acoustic reality, and the stereo image can become your playground. Split a sound across two synths, with one covering the low-frequency components of the sound, centrally panned. The other synth, producing the upper harmonics, helps provide the imaging. With careful programming, this setup not only tricks the ear into fusing the two sounds, but allows a huge amount of control over the stereo image without muddying the low frequencies.
THE EAR IS unbelievably sensitive to the timbre of an instrument. For example, if two violins play the same melody at once, we can usually track the two instruments with little difficulty. Even the most advanced computer systems have yet to come close to our abilities in timbre discrimination. As a result, we rarely give much thought to the overlapping qualities of different instruments in a piece of music. But even a passing understanding of these qualities can really help when orchestrating electronic timbres.
Have you ever wondered why most lead lines occur in the upper register? Try playing a fast arpeggio with a smooth sinusoidal timbre, first at the high end of a keyboard, then at the low end. The bass arpeggio is very hard to discern. This has to do with many factors, primarily the fact that the ear has very poor pitch resolution at low frequencies. If a low sound is going to move quickly, it needs a lot of overtones. Better yet, leave the busy stuff for the upper voices.
When music has a lot of activity, and you want each part to be audible, the timbres of each instrument should be fairly distinct from each other. When multiple instruments play the same note, the ear uses two major cues to distinguish them: vibrato and overtones (especially transient overtones). If these combined sounds have no vibrato, then the ear must remember the harmonic spectrum of each sound (the timbre). These spectra are not static, but change with the envelope characteristics of the instruments. If the combined sounds have no transients as well as no vibrato, they will sound like one instrument. Herein lie some of the keys to interesting timbral balance.
Personally, I don't believe there are any rules for ideal instrumentation or orchestration - except one: keep it interesting. Because instrumentation involves mixing together different timbres, interesting orchestration should introduce changes in the interplay of these timbres. For example, you can make two instruments fuse together, separate, then fuse together again. Two very similar timbres will take on independent identities if their harmonic transients (envelopes) differ even slightly, yet when played together their similarity can contribute a feeling of richness. In general, you get a "big" sound by fusing together the timbres of many similar-sounding instruments. At the other extreme, two dissimilar timbres may lend clarity to melodies or harmonies, yet their combination may not make the music sound any bigger. Control of these characteristics can bring music alive.
But remember, nothing will help your music sound better than listening and learning, and that requires patience and a good ear. This article can't teach these skills, but knowing why things sound the way they do can help you understand what you're hearing.
Feature by Robert Rich