Morpheus (Part 1)
Introducing a new direction in synthesis from Emu
Article from Sound On Sound, October 1993
E-mu's new module heralds a new direction in synthesis. Will it change the face of music?
Emu Systems are about to release an innovative new synthesizer module — the Morpheus. At its heart is an intriguing method of synthesizing sophisticated filters which allow you to make complex changes to sounds. Martin Russ takes a look at the synthesis technique this month, and its practical realisation in the next part of this series.
Big-budget movies and TV adverts are full of them — objects smoothly changing from one thing to another: horses to cars; one famous face to another; and much more. This is achieved by using powerful computers to calculate the intermediate stages between two different pictures, and has become known as Morphing.
In audio terms, changing one thing into another is more familiar than you might think. Cross-fades are just one example of audio processing that can be thought of as the sound equivalent of a very simple morphing process. Cross-fades are one of the standard tools of the synthesist, especially in these days of wall-to-wall Sample + Synthesis (S+S) instruments. You take two sounds, give each a separate envelope, and voila: a sound which fades back and forth between the two sounds, hopefully producing a single composite sound. This is typically used to combine the attack and sustain from two instruments into a more interesting 'hybrid' sound. It also shows the major elements of any morphing technique: Fixed Points and Interpolation.
Fixed Points are the reference points of the morphing process. A point is defined as something with no size — infinitely small. A picture or an audio sample is a snapshot of something for just one instant and thus frozen in time, the most obvious fixed points being the beginning and end of a sound — usually silence! Less obvious fixed points can be parts within a sound — just after its attack portion, for example, or two points during the sustain part of a sound which are similar enough to be used as loop points.
Interpolation is the process by which you calculate the in-between values according to certain rules; it produces the smooth transition between those frozen moments in time. For pictures, the rules can be complex, because certain things will be required to stay fixed or move in a specific way. So to change a face, you are likely to want the eyes, nose and mouth to move about quite slowly, if at all. In contrast, the hair can zip about all over the place — like changing someone who is bald into someone with hair. With sounds, the rules are often simpler — for example, a cross-fade is usually defined so that the volume stays more or less constant. If the volume doesn't stay constant, then the 'dip' can be intrusive and destroy the effect of the cross-fade. The catch is that with a cross-fade, you only get a mix of the two sounds; with a picture that is morphed, you get elements from both pictures in a new 'in-between' picture or series of pictures.
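The constant-volume rule the article describes is what audio engineers call an equal-power crossfade. The article predates any code, but the idea can be sketched in a few lines of Python (function and variable names here are illustrative, not from any real product):

```python
import math

def equal_power_crossfade(a, b, n):
    """Mix two equal-length sample lists, fading a out and b in.

    Equal-power (cosine/sine) gains keep perceived loudness roughly
    constant, avoiding the mid-fade 'dip' of a plain linear crossfade.
    """
    out = []
    for i in range(n):
        t = i / (n - 1)                       # 0.0 .. 1.0 across the fade
        gain_a = math.cos(t * math.pi / 2)    # 1 -> 0
        gain_b = math.sin(t * math.pi / 2)    # 0 -> 1
        out.append(gain_a * a[i] + gain_b * b[i])
    return out
```

Because gain_a² + gain_b² = 1 at every point, the combined power never dips, which is exactly the 'rule' a musical crossfade imposes on its interpolation.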
Working out what the changes between the fixed points should be is something that computers are very good at — it is choosing the fixed points that requires human skills. Cartooning is an excellent analogy here: the cartoonist draws only the important stages in a character's movement, whilst a 'tweener' produces the in-between drawings that give the final, smooth animation. These days, 'tweeners' can be humans or computers — and the computers don't complain when they are asked to do thousands of complicated pictures, which is why they tend to be used for backgrounds and swirling effects.
You can see quite a few examples of this sort of thing in Disney's excellent Beauty and the Beast cartoon. Some of the computer 'assisted' cartooning is obvious — like the Ballroom scene, where the two main characters are cartooned on top of a superbly rendered, computer generated ballroom, complete with huge chandelier. The viewpoint of the audience swirls around the dancing couple in much the same way that a Steadicam might if this were a real movie — but this is a cartoon, so the rapid and continuous changes of perspective are handled by the computer.
Just as pictures can combine elements from more than two sources, so morphed sounds are not limited to just a couple of component parts. Many of the best S+S sounds are made up of layers of several sounds, which blend together to make a 'finished' sounding final product. This mixing process is normally done with envelope generators, which do an 'automated mix' between the basic sounds. This is the basis of Vector Synthesis, and it has the advantage that it lets you quickly interact with the sound as it is playing. All these techniques have the disadvantage that they really just move from one sound to another — one sound fades out as another fades in — but the underlying sounds stay static as this happens. Video morphing actually changes the pictures between the fixed points, so that the resulting image incorporates bits of each picture. Doing this effectively with sounds means that you need to do much more than just change their volume.
Changing sounds is not something that most S+S instrument manufacturers want you to think too hard about. The very distinctiveness of the PCM samples that they provide in on-board ROM is also the weak spot, because no amount of mixing with other sounds can effectively disguise them once you know what they sound like. What's needed is a way of pulling the sounds apart and then putting them back together — as in picture morphing. The problem is that the tools offered by most S+S synthesizers just aren't up to the task, and solving the problem involves some fascinating concepts.
What makes a sound special? How is it that you can categorise sounds into groups like brassy, metallic, silky, spiky, dark, resonant and so on — and why would most people agree with you? Contrary to what computer-based sample editors might make you think, the low level 'shape' of a sound is not what really matters — nor is the frequency content, also known as the spectrum. What matters is how the sound changes with time — how the spectrum evolves. Brassy sounds have a complex set of harmonics which build up at the start of the note, and die after the initial burst of 'air' — something which just happens to be emulated very nicely by a harmonically rich waveform like sawtooth, passed through a voltage controlled filter whose cutoff frequency is swept by an envelope generator. There are those critics of analogue synthesizers who say that 'all they produce are lots of variations on synth-brass sounds!' In terms of pictures, I could show you lots of shapes and photos that could all be labelled with the word 'smile'. But if the same shapes or photos are animated, then only a particular set of movements will suggest a smile. If the changes to the face don't match with the underlying model that you have of a smile, then it can be interpreted as a 'wry smile' or a 'smirk' or 'disbelief', or half a dozen other facial expressions. Sounds work in much the same way — you know what a 'tinkly' sound 'sounds like', and the quality of 'tinkliness' is very distinctive.
We have a paradox here: useful, realistic sample sounds have a 'fingerprint' that makes them useful to suggest a specific type of instrument or mood, but part of that distinctiveness lies in the way in which the sound changes. Since we already have the samples with their inherent 'character', we need to try and 'filter' out just some parts of the sound in a dynamic way — like turning a smirk into a smile! The sort of filtering facilities that you get in most S+S instruments really aren't up to this, since they usually only have low-pass filters whose cutoff can be changed using an envelope. This immediately restricts the possible changes to ones where high harmonics can be added or removed, and always in a predictable order as you change the cutoff frequency. Not only that, but the resonance (if any) of the filter is usually not controlled by an envelope, and so is static, which also fixes the 'sound' of the filter. Voltage controlled filters are far from perfect in this respect!
"The low level 'shape' of a sound is not what really matters — nor is the frequency content, also known as the spectrum. What matters is how the sound changes with time — how the spectrum evolves."
Since we have a sample whose spectrum is capable of continuous dynamic change, and whose 'character' is set by this evolution of harmonics, it follows that in order to make worthwhile changes to the sample, we need a filtering system which is capable of the same degree of complexity. This has been a tall order until very recently; you could try and do the job with lots of simple filters, but the problems involved in controlling lots of individual filters are enormous. If you take this approach to its simplest form, you have one filter for each individual harmonic in the sample, which actually boils down to Additive Synthesis, since you end up mixing together lots of sine waves. If trying to define envelopes for one filter is hard, imagine having to cope with 20 or more!
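The 'one filter per harmonic' limit case the paragraph mentions really is additive synthesis: each harmonic becomes a sine wave with its own level envelope, and the output is their sum. A minimal Python sketch (the exponential-decay envelope is my simplifying assumption; real additive systems use full multi-stage envelopes per partial):

```python
import math

def additive_tone(partials, sample_rate, duration):
    """Sum sine-wave partials, each given as a (freq, amp, decay) triple.

    Every partial carries its own level envelope (here a simple
    exponential decay) — the 'one filter per harmonic' limit case.
    """
    n = int(sample_rate * duration)
    out = [0.0] * n
    for freq, amp, decay in partials:
        for i in range(n):
            t = i / sample_rate
            env = amp * math.exp(-decay * t)   # per-partial envelope
            out[i] += env * math.sin(2 * math.pi * freq * t)
    return out
```

Even this toy version makes the control problem obvious: a sound with 20 harmonics needs 20 separate envelopes, which is exactly the burden the article says makes the naive approach unworkable.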
One solution to this problem applies the morphing concept again. Since it works well with pictures and with mixing sounds, why can't you do it with filters? In fact, you can — by defining filters with specific characteristics as the fixed points, and then interpolating between them, which means that you can have complex changes of filter shape without the need for over-complicated controls. Let's take a simple example. Imagine that we want to change from a low-pass filter to a high-pass filter. As we morph from the low-pass towards the high-pass, the higher frequencies will gradually appear and the low frequencies will disappear, until we end up with just the high frequencies. Such a transformation would be quite difficult to program on most analogue or digital synthesizers, but we are not restricted to such simple two-dimensional changes.
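The low-pass-to-high-pass morph described above amounts to interpolating between two filter shapes, point by point across the frequency axis. A sketch in Python, using idealised per-band gains rather than real filter coefficients (the five-band responses are invented for illustration):

```python
def morph_response(low_pass, high_pass, position):
    """Interpolate point-by-point between two filter magnitude responses.

    position 0.0 gives the low-pass shape, 1.0 the high-pass shape, and
    values in between yield the intermediate 'morphed' filters.
    """
    return [(1 - position) * lp + position * hp
            for lp, hp in zip(low_pass, high_pass)]

# Idealised responses over five frequency bands, low to high:
lp = [1.0, 1.0, 0.5, 0.0, 0.0]   # low-pass: low bands pass
hp = [0.0, 0.0, 0.5, 1.0, 1.0]   # high-pass: high bands pass

halfway = morph_response(lp, hp, 0.5)   # flat response, every band at 0.5
```

Note the halfway point is flat, just as Figure 2 describes: the low frequencies have half faded out while the high frequencies have half faded in.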
Suppose we had a third filter which was a notch that cut out just a band of frequencies, and that we could interpolate any filter we like from these three. Imagine we start out by interpolating a starting filter from just two of them. One approach might be to use velocity to move towards one filter, and use the keyboard note number to move towards the other. We can then use an envelope to morph the filter to change into the third filter. So we might start out using a low-pass filter with a little bit of notch and morph to a high-pass filter and so fade out the low frequencies as the sound ends. Any other combination of the filters would be possible, and we haven't even considered changing the cutoff frequencies! Two or three filters combined with the ability to morph from one to another seems like a very powerful tool which ought to enable us to track the harmonic changes in complex evolving samples.
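The three-filter example can be sketched as a weighted blend, with each weight driven by a different performance controller. This is my own illustration of the principle, not E-mu's implementation; the controller assignments follow the article's example:

```python
def blend_three(filters, w_velocity, w_key, w_env):
    """Blend three filter responses using three controller-driven weights.

    Weights are normalised to sum to 1: e.g. velocity pulls toward
    filter 0, key number toward filter 1, an envelope toward filter 2.
    """
    total = w_velocity + w_key + w_env
    weights = [w_velocity / total, w_key / total, w_env / total]
    n = len(filters[0])
    return [sum(w * f[i] for w, f in zip(weights, filters))
            for i in range(n)]

# Idealised per-band responses, four bands from low to high:
low_pass  = [1.0, 1.0, 0.0, 0.0]
high_pass = [0.0, 0.0, 1.0, 1.0]
notch     = [1.0, 0.0, 0.0, 1.0]

# 'A low-pass filter with a little bit of notch', as in the text:
start = blend_three([low_pass, high_pass, notch], 3.0, 0.0, 1.0)
```

Sweeping the envelope weight up while the others fall then morphs this starting filter toward the high-pass, fading out the low frequencies as the sound ends.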
By moving the morphing from the raw sounds into the processing section of the synthesizer, Emu have really produced a new form of synthesis, where the complexity of the sound source is matched by the subsequent modification stage. This is how simple analogue synthesizers work — with simple waveshapes and a basic filter to modify the harmonic content. S+S instruments have a 'mismatch', where the samples are much too complex for the filtering section to process effectively in any way that produces useful results beyond basic filter sweeps and slight tonal changes. By having a filtering system which can change in ways that can follow the evolution of harmonics in a sample, the result is a new and very powerful way of making sounds.
Morphing between pictures on the television is one thing. Mixing between two sounds is believable. But filters that change from one type to another, in real time and in a controllable way, can be a difficult thing to come to terms with! Once armed with such a synthesis tool, the possibilities are enormous. By analysing how sounds change, it should be possible to get some pointers to the important features that give real-world samples their unique characters. By producing filters that enable us to reproduce the same type of changes of harmonic structure, we should be able to use those filters to process sounds and 'impose' that character onto them. This would let you morph one instrument into another, where the intermediate steps are not just a difference in volume between two sounds, but a whole set of new instruments which gradually blend from one to the other.
At this point, an interesting word comes to mind: Resynthesis. This has long been the equivalent of 'the holy grail' for sound synthesis. It involves analysing a sound to establish its basic parameters, then changing those parameter values, and reproducing a new modified sound. The stumbling block has always been the complexity and number of parameters needed to define a sound — and then trying to control them all! But by using just a few filters, and then interpolating between them, it should be possible to get very complicated filtering with only a few parameters. Unfortunately, this only solves one problem: the other problem of resynthesis has always been how to extract the parameters in the first place. Sophisticated filters may be a step nearer to resynthesis, but morphing is a much better description for what should be possible — the ability to make samples take on some of the characteristics of other samples.
The filters used in early analogue synthesizers were low-pass, but with a Q, or resonance, control that turned them into a peaky version which was good at emphasising particular frequencies. The resonance was usually left fixed whilst the cutoff frequency was changed with a control voltage. Digital Signal Processing chips (DSPs) now enable filters to be programmed entirely digitally — and until now, most of these filters have been designed to sound like the resonant low-pass filters of the past. Emu's Vintage Keys emulates the 2- and 4-pole filters of analogue synthesizers for just this reason. The difference is that digital filters allow you to have many more poles, which means that there are many more options for the type of filter characteristic. Low-pass, high-pass, notch and band-pass are all possible with a few poles, but using 10 or more makes the possibilities a lot more interesting.
I have so far mentioned only three filters, and if you imagine these as being on the three axes (x, y and z), you might be able to imagine moving around between the three filters, interpolating a new filter type depending on your position. By adding more filters and placing them at the corners of a cube, we get a three dimensional cube where we can choose between any of the eight filters and change between them just by moving along the same three axes (x, y and z). Such a filter offers such complex control over sounds passing through it that designing sets of eight filters (or 'cubes') may well be left to experts; most users may well only select a cube and then fiddle with its parameters, in much the same way as ROM samples are selected and then tweaked in S+S synthesizers. Cubes will probably emulate types of filter, types of sound, or their effect on a sound: so a resonant low-pass filter is an obvious cube, whilst many variants of brassy, or thin, or phased/flanged sounds (and more) are possible. The possibilities are huge, and finally provide the ability to easily manipulate the spectrum instead of the raw data of samples.
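Interpolating between eight filters at the corners of a cube is, mathematically, trilinear interpolation: three position values (x, y, z) pick a point inside the cube, and each corner contributes in proportion to its closeness. A sketch, again using idealised per-band responses rather than real filter coefficients:

```python
def cube_interpolate(corners, x, y, z):
    """Trilinear interpolation between eight filter responses.

    corners[i][j][k] is the response at cube corner (i, j, k), each a
    list of per-band gains; x, y, z in 0..1 locate a point inside the
    cube. Each corner's weight is the product of its closeness along
    the three axes, so the eight weights always sum to 1.
    """
    n = len(corners[0][0][0])
    out = []
    for band in range(n):
        value = 0.0
        for i in (0, 1):
            for j in (0, 1):
                for k in (0, 1):
                    weight = ((x if i else 1 - x) *
                              (y if j else 1 - y) *
                              (z if k else 1 - z))
                    value += weight * corners[i][j][k][band]
        out.append(value)
    return out
```

At a corner the function returns that corner's filter exactly; everywhere else it yields a new in-between filter — which is why a single 'cube' behaves like a continuum of filter types under just three controls.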
This effectively gives us a 'third generation' synthesizer. First generation ones were analogue, and had simple waveshapes modified by low-pass filters. Second generation synths have complex waveshapes in the form of samples, but still have low-pass filtering. The third generation of synthesizer combines samples and filter cubes. Cubes offer a quick and easy way to completely change from one filter type to another — much as you now select samples. The difference lies in the depth and detail of control that you now have over the sound — this technique really does allow you to pull sounds apart and then put them back together again in a different way, and the changes can happen in real-time!
In the final part of this two-part series, I will be taking an advance look at a forthcoming piece of real-world hardware that incorporates just such a 'morphing' filter cube, the Emu Morpheus, which is due to be released later this year.
Figure 1. The cube has eight filters at the corners of a three-axis cube. In a fully flexible system you could start anywhere and move around using velocity, wheels, keyboard tracking, etc, to change position. In practice, starting on one face and moving/morphing back and forth inside the cube using one or two controllers is probably much easier to implement in real-world hardware. Emu's Morpheus has the face-to-face movement, but also allows movement from one whole cube to another — so instead of two parameters for setting the starting filter, you can use three!
Figure 2. Low-Pass to High-Pass. The interpolated filter response goes flat and then high-pass. The sound changes from dark to thin as the filters morph from one to the other. You could then morph back to low-pass — with an LFO perhaps, or an envelope.
Figure 3. Low-Pass to Band-Pass. The filter response goes peaky (as if the Q was being increased) and sounds more and more resonant as the lower frequencies disappear.
Figure 4. Multiple Notches to More Notches. Notches in the frequency response produce 'phasing/flanging' type effects when you move them, but by morphing from one set to another with different spacing, the quality of the phasing/flanging will change dynamically.
Figure 5. Band-pass to Band-pass. Two band-pass filters with different shapes enable individual portions of the frequency spectrum to be controlled separately. This will sound like a radical change in the tone or timbre.
Figure 6. If the diagrams in this article look unfamiliar, think of them as the shape made by the sliders on a graphic equaliser. A low-pass filter lets only low frequencies through, and so the higher sliders on the right hand side will be down, whilst the rest will be up. Most sample editing tends to concentrate on the shape of the sample waveform, but the really useful information is in the sound's harmonics — and the best way to see what is happening to harmonics is to use a diagram where the horizontal axis shows frequency, low on the left and high on the right, and the vertical axis shows volume or level.
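The frequency-versus-level view the caption describes is exactly what a Fourier transform produces. As a modern illustration (not anything from the article's era), a plain DFT in Python turns a list of samples into one 'graphic EQ slider' per frequency bin:

```python
import cmath

def magnitude_spectrum(samples):
    """Plain DFT magnitudes: one 'graphic EQ slider' per frequency bin.

    Bin 0 is the lowest frequency (DC); higher bins move left to right
    toward high frequencies, exactly like the article's diagrams.
    """
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t in range(n))) / n
            for f in range(n // 2 + 1)]
```

Plotting these magnitudes left to right gives the slider-shape diagrams used in Figures 1 to 6; watching them change frame by frame shows the evolving spectrum that the article argues is the real 'character' of a sound.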
Read the next part in this series:
Morphology (Part 2)
(SOS Dec 93)
Review by Martin Russ