A Vocal Chord (Part 2)
In the second part of his piece on sampling the human voice, Tom McLaughlin discusses making up vocal multisamples and how to solve the problems you can expect to encounter.
With the raw vocal material safely on tape, a stunning set of vocal samples is just one operation away: transferring the voice to the sampler.
LAST MONTH I recommended recording every semitone within your vocalist's range while singing both ascending and descending semitone scales. With good reason too. After giving your recordings a rest and listening to them with a fresh set of ears you'll find that some notes sound infinitely better than others. Maybe some are more in tune or have a more consistent coloration. Maybe the higher register of the descending scale sounds less strained than the ascending scale. Whatever the reason, some notes will shine brighter than the rest.
List the pitches you recorded and, while listening to a rough mix of your recorded voices, note the takes that sound best to your ears. These are the ones you'll want to consider for use in your multisample.
To avoid sampling every one of these "superior" takes (even samplers at the upper end of the market have a limited amount of memory space), you'll need to whittle the intervals between the takes you're going to use down to maybe every second, major/minor third or perfect fourth. Keep an ear out for consistency in tone colour from one selected take to another, paying special attention to the transition area between your vocalist's chest and falsetto registers.
THIS IS VERY similar to audio mixing. If you're going to be sampling in mono, and I'll assume that you are, be sure to monitor your recordings in mono, so you'll get a good idea of what's going to end up in your sampler. If your sampler has a monitor output, by all means monitor your recordings using that.
Route your multitrack tape recorder's outputs through individual channels of a mixer, equalisation set "flat" with no effects, and start playing your vocal recordings from the beginning of the tape. Listen to the first vocal track and, while the tape is running, bring up the level of the second track until a good balance between the two is obtained. For a homogeneous ensemble mix, repeat this with the other tracks so that no one track predominates.
QUITE A LOT can be done with equalisation. It all comes down to personal taste, but with EQ you can manipulate vocals to make them sound close in or far away, full or airy, subtle and sexy or tinny and thin. It's conceivable that several sets of vocal samples can be obtained from one set of recordings, just by using different EQ settings.
When pushing the top end of your recordings with EQ, or cranking up the "drive" on your psycho-acoustic enhancer (aural exciter) to bring out clarity or breathiness, it's a good idea to put a low-pass filter between the mixer and sampler, set to pass only frequencies below half your sampling rate. This way you can do whatever you want to the top end while guarding against aliasing.
If you hadn't used any reverb while recording your vocals and feel the need for some "space" around them, now's the time to add it. A good vocal plate or hall setting, with a little top and bottom end rolled off, will make your samples sound more lush and professional. If you put a different room around each vocal pass while recording to add an individual character to each track, routing them all through the same room now will help to homogenise the lot.
NOW TO THE serious sampling stuff. Make sure you have enough formatted disks to hand. You'll need at least three for the work ahead of you; one for storing your unedited source samples, a working disk, and one to save your edited and manipulated samples to. Once you've arrived at the final versions of your samples, the working disks can be erased and used for another project. Make sure to name each of your samples according to pitch, octave and whether it's from the ascending or descending scale. This is not only for keeping track of them while editing and looping them but will make life a lot easier when mapping time comes.
Your job of sample editing, looping and manipulation will be made infinitely easier and less time-consuming with some sort of visual editing. Forgetting all the fancy facilities, just being able to see the waveform you're working on will cut finding acceptable loop beginning and end points down from possible hours to probable minutes.
Be careful with input level when sampling vocal ensembles. Monitoring peaks on tape recorder VU meters and sampler input meters can be rather deceptive. As with other ensembles, there will be wide variations in level along the length of a note due to the different vocalists hovering around the central pitch (visually displayed, vocal ensembles often look like some sort of roller-coaster ride), so it's easy to run into digital clipping on the peaks. Unlike many percussive sounds, where a slight amount can be used to your advantage, digital clipping sounds really ugly on sustained samples, so err on the side of too little input level to be safe.
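If you can get your sample data into a computer, the headroom check described above can be sketched in a few lines of Python. This is only an illustration: it assumes the samples have already been converted to floating-point values between -1.0 and 1.0, and the names `peak_dbfs` and `count_clipped` are made up for the example rather than taken from any sampler's software.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (assumed to be 1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak)

def count_clipped(samples, ceiling=1.0):
    """Number of samples that reach or exceed the assumed full-scale ceiling."""
    return sum(1 for s in samples if abs(s) >= ceiling)

# Hypothetical test signal: two slightly detuned "voices" beating against
# each other, which gives the roller-coaster level variation of an ensemble.
rate = 8000
take = [
    0.6 * math.sin(2 * math.pi * 220 * n / rate)
    + 0.6 * math.sin(2 * math.pi * 223 * n / rate)
    for n in range(rate)
]
print("peak (dBFS):", round(peak_dbfs(take), 2))
print("samples at or over full scale:", count_clipped(take))
```

A take that reports any samples at or over full scale would be a candidate for re-sampling at a lower input level.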
THE SAMPLING RATE(S) will be governed by the available rates on your sampling unit, the amount of memory space available on your sampler, how many edited samples you expect to fit into that amount of memory space and the amount of top end and fidelity you require.
It's widely accepted that a playback frequency response of 3-4kHz is the bare minimum needed to sample intelligible speech. Since you're working on samples that will hopefully be more musical than I Speak Your Weight scales, you'll want to work with sample rates considerably higher than this.
From my experience, you'll need a minimum playback bandwidth of 8-10kHz. If you can afford the memory space, by all means sample at full bandwidth, but for the sake of economy, a good compromise seems to be in the 12-14kHz range or 15-16kHz if you're sampling exceedingly breathy vocal samples.
Take time to experiment with a few different sampling rates to see what you can get away with, and let your ears be the judge. Low-pitched samples can often be sampled at lower rates with little or no apparent loss in fidelity. To calculate a sampling rate, take your desired playback frequency response, multiply it by two and add roughly 10%.
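The rule of thumb above (double the desired bandwidth, then add roughly 10%) works out as follows. This is a minimal sketch: the function name and the exact 10% default are assumptions for illustration, and your sampler will only offer certain rates, so treat the result as a target to round to the nearest available setting.

```python
def sampling_rate_for_bandwidth(bandwidth_hz, margin=0.10):
    """Sampling rate in Hz for a desired playback bandwidth.

    Doubles the bandwidth, then adds a safety margin (roughly 10%).
    """
    return round(bandwidth_hz * 2 * (1 + margin))

# Bandwidths mentioned in the text, from the bare minimum to breathy vocals.
for bandwidth in (8000, 10000, 12000, 14000, 16000):
    rate = sampling_rate_for_bandwidth(bandwidth)
    print(f"{bandwidth} Hz bandwidth -> sample at about {rate} Hz")
```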
LET'S ASSUME YOU'RE working on a set of oohs, aahs, humming and so on, so you'll want to sample as long a piece of vocal material as is practical. It's not uncommon, when I'm working on a set of vocal or string samples, to fill up three or four entire floppy disks with raw material. Once fully edited, these samples might take up anywhere from 75% down to as little as 15% of the original memory space.
BEFORE YOU START looping you'll probably want to trim some of the beginning off your oohs or aahs. Unless you like your samples "au naturel", you can make your vocalist(s) sound a lot more polished if you fade in after any harsh or "out-of-tune" portions have passed.
Aahs are notorious for having rather rough attacks and virtually all of your takes will start out flat or sharp of the desired pitch. Not having the precision of a key or fret to rely on, we approximate the pitch we're aiming at with our vocal tract before we even open our mouths, and there are always a few milliseconds of fine-tuning that go on with both pitch and tone colour once we actually start singing.
Although you can use a VCA to fade in harsh attacks, if you play a sample back at progressively lower pitches, the VCA will allow progressively more of the rough portion to pass. With the right software you can digitally fade the sample in and ensure that it has a smooth attack. If you don't have "fade-in" software, but do have software for "fading out", simply reverse your sample, fade out, then reverse again to hear the result.
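The reverse, fade out, reverse trick can be shown with a short Python sketch. It assumes the sample is a plain list of numbers and uses a straight-line fade; real editing software may use a different fade curve, and the function names here are invented for the example.

```python
def fade_out(samples, fade_len):
    """Linearly fade the last fade_len samples down to silence."""
    out = list(samples)
    n = len(out)
    fade_len = min(fade_len, n)
    for i in range(fade_len):
        # Gain falls step by step and reaches 0.0 on the final sample.
        gain = (fade_len - 1 - i) / fade_len
        out[n - fade_len + i] *= gain
    return out

def fade_in_via_reverse(samples, fade_len):
    """Fade in using only a fade-out: reverse, fade out, reverse again."""
    reversed_take = list(samples)[::-1]
    return fade_out(reversed_take, fade_len)[::-1]

take = [1.0] * 8
print(fade_in_via_reverse(take, 4))
```

Because the fade is written into the sample data itself, the attack stays equally smooth at whatever pitch the sample is played back.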
There are several methods of editing and looping vocals. The method you use will depend upon how much time you want to spend, the editing and manipulation facilities available to you and the amount of memory in your sampler.
To minimise confusion, adopt some sort of numbering/filing system for your "samples in works"... maybe C1a1 = C1 ascending, 1st version.
METHOD A is the most straightforward, being not a lot more than sampling common sense, and requires the least amount of time and editing software. Unfortunately it eats up memory space like there's no tomorrow, relying heavily on long loop lengths for its success.
1. Make sure original sample is saved as a backup.
2. Find suitable sample start point.
3. Locate and set passable loop start and end points aiming for as long a loop as possible.
4. If all is well, discard unused sample material and save it. If not, recall backup and repeat 2 and 3.
METHOD B is pretty much the same as Method A with the addition of loop crossfading, merging, blending or whatever your sampler's software calls it.
This is what I fall back on if I'm in the middle of a session and a client wants "instant" results. After step 3 and before step 4 in Method A, I make sure I have enough sample material after and/or before the loop to meet the requirements of the software I'm working with, then calculate and carry out a "loop crossfade" (while keeping my fingers crossed).
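Every sampler handles loop crossfading in its own way, but one common variant can be sketched as below. It is an assumption for illustration, not a description of any particular machine: the material just before the loop start is blended linearly into the end of the loop, so that the last sample of the loop leads into the loop start exactly as the original recording did. The function name and the choice of an exclusive `loop_end` are hypothetical.

```python
def crossfade_loop(samples, loop_start, loop_end, fade_len):
    """Blend the fade_len samples before loop_start into the end of the loop.

    loop_end is exclusive: the loop plays samples[loop_start:loop_end].
    """
    if fade_len > loop_start:
        raise ValueError("not enough material before the loop")
    if fade_len > loop_end - loop_start:
        raise ValueError("crossfade is longer than the loop")
    out = list(samples)
    for i in range(fade_len):
        t = (i + 1) / fade_len            # rises to 1.0 on the last loop sample
        tail = loop_end - fade_len + i    # position inside the end of the loop
        pre = loop_start - fade_len + i   # matching position before the loop
        out[tail] = (1 - t) * samples[tail] + t * samples[pre]
    return out

take = [float(n) for n in range(10)]
looped = crossfade_loop(take, loop_start=4, loop_end=8, fade_len=2)
print(looped)
```

This is also why the software needs spare material outside the loop: with nothing before the loop start there is nothing to fade in.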
METHOD C uses only the choicest portion of your vocal sample, is an excellent way to cut down on valuable memory space and, depending on the amount of pitch variation along the length of your sample, can also make your ensemble sound twice as large as the original sample. Ideal for sustained vowel sounds, it ignores the attack portion of a vocal sample and relies on VCA manipulation to fade it in. For added realism, automatic pitch-bend (or "warp" as Akai call it) on the way into a note can be employed.
1. Make sure the original sample is saved as a backup.
2. Find portion of sample with the most consistent pitch and coloration.
3. Discard sample material before and after this portion.
4. Save this new material.
5. Reverse this new material and save it.
6. Combine the forwards and backwards versions of the same sample at a 50/50 mix via software, then save.
7. Locate loop start and end points equidistant from the centre of this mixed sample. With sufficient loop-point hunting, you shouldn't need to employ any form of loop crossfading although you may find that alternating loops work more successfully than forward loops with this technique.
8. Discard sample material before and after loop, then save.
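Steps 5 to 7 of Method C can be sketched in Python as follows, assuming the trimmed sample is a plain list of numbers. The function names are made up for the example. The point it illustrates is that a 50/50 mix of a sample with its own reverse is a mirror image about its centre, so loop points placed equidistant from the centre land on identical values.

```python
def mix_with_reverse(samples):
    """50/50 mix of a sample with its own reversed copy."""
    backwards = list(samples)[::-1]
    return [0.5 * a + 0.5 * b for a, b in zip(samples, backwards)]

def symmetric_loop_points(length, offset):
    """Loop start and end (inclusive) the same distance in from each end."""
    if offset < 0 or offset * 2 >= length:
        raise ValueError("offset must leave room for a loop")
    return offset, length - 1 - offset

portion = [0.0, 1.0, 4.0, 9.0, 16.0]   # hypothetical trimmed material
mixed = mix_with_reverse(portion)
start, end = symmetric_loop_points(len(mixed), 1)
print(mixed, start, end)
```

Matching values alone don't guarantee an inaudible join, so some loop-point hunting by ear is still needed.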
METHOD D is the same as above, except that you leave enough un-looped sample material at the beginning to crossfade the attack portion of the original sample through. The advantage of this is that it gives you a more natural entry to your manipulated sample. You must make sure, however, to fade out the attack portion of the original sample before the loop of the looped sample begins.
METHOD E. If I have the time and enthusiasm this is my favourite method of ensemble sample looping and manipulation. It uses up the least amount of memory space, gives unrivalled loop results and gives your samples a silky smoothness. (This technique alone is worth five times the cover price of the magazine so you'd better appreciate it, you pack of snivelling sample snipers!) The one snag is that it requires a loop crossfade that takes its crossfade material only from before the loop. If you're clever enough, you could probably work it out for other forms of loop "blending" by keeping judicious track of loop and sample positions.
1. Make sure original sample is saved as a backup.
2. Locate smoothest portion of sample with the most consistent pitch and coloration.
3. Discard sample material before and after this portion.
4. Save this new material for safety.
5. Loop up to 50% of the later part of this material.
6. Discard all material after loop.
7. You want to be left with exactly twice the amount of material in your loop so discard any material in the front portion of your sample that adds up to more than two times loop length.
8. Calculate and execute loop crossfade, save.
9. Reverse, save and combine with above material.
10. Calculate and execute loop crossfade.
11. Pass GO, collect £200.
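The arithmetic in steps 5 to 7 can be sketched as below. This covers only the trimming, not the crossfades or the reverse-and-combine of steps 8 to 10, and it assumes the sample is a plain list with nothing left after the loop. The function name is hypothetical.

```python
def trim_to_double_loop(samples, loop_len):
    """Trim the front so the material is exactly twice the loop length.

    The loop is taken to be the final loop_len samples (steps 5 and 6).
    Returns the trimmed material and the loop start position within it.
    """
    n = len(samples)
    if loop_len < 1 or loop_len * 2 > n:
        raise ValueError("loop must be at least 1 sample and at most 50% of the material")
    excess = n - 2 * loop_len      # step 7: surplus material at the front
    trimmed = list(samples)[excess:]
    loop_start = loop_len          # the loop is now the second half
    return trimmed, loop_start

material = list(range(10))         # hypothetical ten-sample "take"
trimmed, loop_start = trim_to_double_loop(material, 3)
print(trimmed, loop_start)
```

After trimming, the first half is the same length as the loop, which suits a crossfade that draws its material only from before the loop.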
As in D, you can crossfade the attack portion of the sample through the first half of this material for a more natural entry or discard the first half, leaving only the loop, to save even more storage space. This process, theoretically, can be repeated on the remaining material, discarding the first half each successive time, until you are left with a loop only a few wave cycles long epitomising your vowel sound. I've repeated this three or four times on samples with varying degrees of success.
ONCE YOU'VE EDITED and looped your samples, the time has come to map them out on your keyboard as a multisample. If you haven't already, assemble all your final versions on the same floppy disk, preferably in order from low to high.
The best place to start mapping is to assign samples to pitches on the keyboard relating to the original sampled pitch. From there it is merely a matter of experimentation to see how far from the original pitch a sample can be taken to meet its adjoining sample.
Samples generally travel downwards better than upwards. If I've sampled every minor third I'll take a sample down two semitones and up one semitone from its original pitch as a starting point.
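That starting point can be laid out as a quick Python sketch. It assumes keys are identified by MIDI note numbers (middle C = 60), which is a convenience for the example rather than anything your sampler requires, and the function names are invented. The playback ratio uses the standard equal-temperament relationship of a factor of two per twelve semitones.

```python
def build_zones(root_notes, down=2, up=1):
    """Starting key range for each sample: (root, lowest key, highest key)."""
    return [(root, root - down, root + up) for root in sorted(root_notes)]

def playback_ratio(key, root):
    """Speed change needed to play a sample rooted at `root` on `key`."""
    return 2 ** ((key - root) / 12)

# Hypothetical set of samples taken every minor third (three semitones).
roots = [48, 51, 54, 57]
for root, low, high in build_zones(roots):
    print(f"sample at {root}: keys {low} to {high}, "
          f"slowest playback x{playback_ratio(low, root):.3f}")
```

With samples a minor third apart, these starting ranges share one key between neighbours, which is where you decide by ear which of the two samples should take it.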
As mentioned last month, this is where having ascending and descending semitone scales really comes into play. To arrive at the smoothest transition between vocal registers in your multisample, you'll probably have to do a fair bit of "mixing and matching" between your ascending and descending scales, maybe even a bit of back-tracking to your source recordings to fill up any holes in your multisample. (It's a good idea to jot down or make a mental note of your mixing levels, EQ and effects settings for this reason.)
Positional crossfading is a life-saver with awkward transitions between different samples, but really should be used as a last resort. If a sample sticks out of your multisample like a sore thumb, after a positional crossfade to the samples on either side of it, the sample will stick out like a sore thumb, but with a smoother transition between adjacent samples. It would be wise to sit on that sample for a while and see if you can find a substitute from your source recordings or do without it altogether, extending the adjacent samples further up and down to fill the gap and use a crossfade between them if necessary.
Remember that each positional crossfade uses up two "voices" on your sampler to do its business. Employing extensive positional crossfading to make your keyboard map will smooth things out, but also cuts your keyboard polyphony in half - eight-note polyphony instantly becomes four. Keyboard maps take up relatively little memory space... experiment to your ears' content.
VCAS ARE YOUR samples' window to the world. You don't hear nothin' if these aren't turned on. Unless you want a percussive entry, or want your vocals to fade away while a key is held down, you really don't need to worry about the decay control of a standard ADSR envelope (attack, decay, sustain, release). A good starting point for oohs and aahs is to set the attack to about a half-second, sustain up full and the release to suit your taste. The VCA release will be very much like an ambience control on samples recorded with reverb.
Once a suitable envelope has been created, for added realism try making the attack and release times slightly longer for the lower-pitched samples, progressively so if you have the patience. You see, not only does it take longer physically and acoustically for lower pitches to be produced and to reach our ears, they also die away at a slightly slower rate than higher pitches.
IN THE MARCH issue of MT, we touched upon the subject of vocal formants. To recap: vocal formants are accentuated audio frequency bands that tell our brains what vowel sound we're listening to, what size person is singing and what register is being sung in. No matter which pitch is sung, these accentuated bands stay the same for a given vowel sound. No ifs, ands or buts.
There are two main formants for each vowel sound. For example, the formants for an average adult male:
Oo (as in spook) - 400 and 800Hz
Aah (father) - 825 and 1200Hz
Ee (feet) - 375 and 2400Hz
Women's formants are generally 17% higher, children's 25% higher.
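As a quick worked example, the figures above can be scaled in Python. This sketch assumes that both percentages are measured against the adult male values listed; the table and function names are made up for the illustration.

```python
# Average adult male formants from the list above, in Hz.
MALE_FORMANTS_HZ = {
    "Oo": (400, 800),
    "Aah": (825, 1200),
    "Ee": (375, 2400),
}

# Assumed scaling relative to the adult male figures.
SCALE = {"man": 1.0, "woman": 1.17, "child": 1.25}

def formants(vowel, singer="man"):
    """Approximate pair of formant frequencies in Hz for a vowel and singer type."""
    return tuple(f * SCALE[singer] for f in MALE_FORMANTS_HZ[vowel])

for singer in SCALE:
    low, high = formants("Aah", singer)
    print(f"Aah, {singer}: about {low:.0f} and {high:.0f} Hz")
```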
To hear vocal formants at work and to demonstrate that they stay fixed for a given vowel, try this little experiment: shape your mouth and vocal tract as if you were to utter a vowel (say, Aah) and whisper this vowel rather than producing a pitch. The effect should be kind of like white noise with the distinct characteristics of the vowel placed upon it. You'll find that there is very little leeway in the placement of your tongue and the shaping of your vocal tract before the unsung vowel starts resembling another vowel.
There comes a point when singing vowels in the extreme upper ranges, especially with females, that these vowels start losing their identity. This is due to the fundamental of the note being higher than the lower formants. Sampling the vowel whispered, with no sign of pitch present, and mixing it in with these can help retain its identity. Whispered vowels can also be mixed with all the members of a multisample to add a breathier quality to them.
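To get a rough idea of where a vowel is likely to start losing its identity, you can compare the fundamental of each note with the vowel's lowest formant. The sketch below assumes MIDI note numbers, equal temperament and A = 440Hz, and uses the female Aah figure derived from the list above (825Hz plus 17%, roughly 965Hz). The function names are invented for the example.

```python
def note_frequency(midi_note, a4=440.0):
    """Fundamental frequency in Hz of a MIDI note in equal temperament."""
    return a4 * 2 ** ((midi_note - 69) / 12)

def fundamental_above_formant(midi_note, first_formant_hz):
    """True when the note's fundamental is higher than the lowest formant."""
    return note_frequency(midi_note) > first_formant_hz

FEMALE_AAH_FIRST_FORMANT_HZ = 825 * 1.17   # assumption: male figure plus 17%

for note in (72, 79, 84):                  # C5, G5 and C6 (middle C = 60)
    flag = fundamental_above_formant(note, FEMALE_AAH_FIRST_FORMANT_HZ)
    print(note, round(note_frequency(note), 1), "Hz, above formant:", flag)
```

Notes flagged this way are the ones most likely to benefit from having a whispered vowel mixed in.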
CERTAIN SONGS HAVE succeeded in sending shivers up my spine using vocal sounds that would be impossible without sampling technology. While some of us go to great pains to ensure that samples sound as natural as possible, in many instances vocal samples played well out of their range sound fantastic; for example, when playing soaring lead lines in songs that ten years ago would have been delegated to the electric guitar. This is healthy not only for its importance in extending the palette of tone colours heard in today's music, but also because it shows the Doubting Thomases of the music world that sampling can be a lot more creative than merely putting session musicians in the dole queue. "Rules" are meant only as guidelines, so that we don't waste a lot of time making the same mistakes as our predecessors did to achieve acceptable results. Once the reasons for their usage are understood you should choose to use them or not, as your creativity and better judgement dictate.
For the whackos among us, here are some silly things to experiment with for the strange coloration they impart to a voice:
- Sing down a cardboard or plastic tube; it renders speech more sibilant and creates a static flanging effect.
- Sing with your lungs full of helium; it raises the pitch of your voice, and to my knowledge, used in moderation is perfectly harmless (although you may find yourself re-enacting scenes from The Wizard of Oz).
- Sample only the reverbed or effected signal.
- Hum into a balloon, kazoo or piece of cellophane.
- Mix humming and whistling the same pitch at equal levels. This creates a most unsettling, unearthly effect, perfect for film scores.
And while you're at it, if you're looking for new vocal colorations to experiment with, don't rule out things like sending the voice through a distortion, octave-divider or other pedal effect, a cardboard megaphone, electric bullhorn, telephone or Leslie cabinet. Try a contact mic attached to drums, cymbals, guitar, piano (a brick holding down the sustain pedal), cardboard box, biscuit tin... Sing in close proximity to these for added resonance.
AS YOU'RE PROBABLY aware, many tracks have used vocal samples to imitate percussion and bass sounds. It shouldn't tax your imagination too heavily to realise that with sampling, a complete band or orchestra can be assembled with little more raw materials than vocal samples - either neat or heavily effected. With few exceptions, almost any vocal sound can be sussed to centre around a frequency or frequency band using the tuning and/or looping provisions on your sampler, and totally re-shaped with a voltage or digitally-controlled amplifier and low-pass filter to resemble just about any existing or imaginary instrument or synthesiser effect. Such is the magic of sampling.
Entire film scores have been recorded using only a lowly Minimoog and a multitrack tape recorder, emulating virtually every instrument of the concert orchestra with a handful of audio waveforms and a couple of envelopes. Sampling allows us to work with considerably more than the four or five basic waveforms that analogue synthesisers have limited us to in the past.
When using sampled voices to imitate other instruments, your raw sound material can be tailored at source by mimicking the instrument/effect you want to hear. Treating your vocal approximation with old VCF and VCA envelopes will go a long way in furthering the illusion. (Well worth brushing up on your analogue synthesis theory if you're not thoroughly familiar with all the different qualities that can be imposed upon a signal, by altering the manner in which we hear its loudness and brightness.)
General Attack Characteristics of Instrument Families
|Attack speed|Instrument families|
|Immediate|Percussion, plucked/struck strings.|
|Moderate|Brass, wind, bowed strings, vocals.|
|Slow|Wind, bowed strings, vocals.|
|Attack pitch|Instrument families|
|In tune|Keyboard, wind, percussion.|
|Flat|Vocals, bowed strings, brass.|
|Sharp|Percussion, plucked/struck strings, vocals, brass.|
Feature by Tom McLaughlin