Thursday, 15 December 2011
Demonstrating aliasing.
The importance of avoiding aliasing is best demonstrated with an example:
Aliased blitsong by frostburn
Sounds weird doesn't it. I rendered the song with a low sampling rate (8000 Hz) to accentuate the effect. Now let's try the same song with the same sampling rate but this time without the offending frequencies.
Non aliased blitsong by frostburn
Much better. Sure it's missing in brightness due to the low sampling rate but the "metallic", broken sound is gone and we're left with only pure tones that can be faithfully reproduced. Here's the same piece once more but now with CD quality audio and a different instrument.
Blitsong with saw1 by frostburn
The sound is quite thin but that's plain bandlimited oscillators to you. They'll get more body once we pass them through a few filters in a future blog post about subtractive synthesis.
Status of the Python Library
As can be heard from the above sound clips the sound synthesis library exists and can render output. I just don't feel that it's ready for a release yet. I want to make sure that the approach I'm taking works and suits my (overambitious) needs. Besides, there are already plenty of audio synthesis libraries out there. I'm creating my own just for the challenge of it.
The general idea is to use the generators in Python to, well, generate sound. They are reasonably fast but not fast enough for serious synth work. Right now I'm looking for a way to speed things up. Processing the audio streams in chunks using numpy looks promising but such an approach precludes certain feedback effects so it wont work for everything.
With the library in limbo I'll have to make these blog posts and the accompanying audio clips without referring to the actual implementations.
Wednesday, 7 December 2011
The math of bandlimited oscillators
The previous post looked at the idealized impulse train worked our way down to a useful digital implementation. Let's try it the other way around. We'll start with the discrete impulse train that has one sample with amplitude 1 and (n-1) samples with amplitude 0.
[1, 0, 0, 0, ... , 0], repeated ad infinitum.
The frequency content can be calculated by taking the discrete Fourier transform. It is trivial because each of the partials we're summing over in the transformation starts with 1 + 0i regardless of frequency.
fft([1, 0, ... , 0]) = [1, 1, ... , 1].
So the discrete impulse train contains all the complex frequencies in equal proportion.
( ... + exp(-i x) + exp(0) + exp(i x) + exp(i 2x) + ... )/n
Notice how every complex exponential has a pair except for exp(0). If n is even the Nyquist frequency will also be missing a pair. So writing the series in terms of real functions we have:
( 1 + 2 cos(x) + 2 cos(2x) + ... + 2 cos((n/2-1)x) + cos((n/2)x) )/n, n even
( 1 + 2 cos(x) + 2 cos(2x) + ... + 2 cos(((n-1)/2)x) )/n, n odd.
sin(π t)/(π t) =: sinc(t)
as simple and irrigorous as that. For the impulse train we simply divide out every n:th zero with sin(π t/n) to get:
sin(π t)/(n sin(π t/n)) =: windowedsinc(t,n), n odd.
The catch is that this trickery only works when n is odd. But this is the function we'll want to use because the odd case had a more uniform spectrum without any nyquist nuisances. For reference the even case simplifies to:
sin(π t)cot(π t/(2n))/n.
It has a double analytical zero between the impulses. Not something you can easily just deduce by matching zeros with samples. But what happens when you try windowedsinc with an even n?
Oh, so it flips every other impulse around. Well that's cool. We get the difference of two impulse trains 180° apart for free in terms of computational effort. This has the effect of killing every even harmonic leaving us something that sounds like a thin square wave. Actually it is the derivative of the bandlimited square wave, but more on that later.
The computational gain is also marginal because they can always be reduced to a sum of the regular blits.
In the previous post we used the geometric series. Let's look at it in more detail:
1 + z + z2 + ... + z(n-1) = (1-zn)/(1-z).
Let's plug in e2πit for z and see what happens.
I like the imaginary part better. It's more symmetric. Anyway let's replace z with az, where a is a real number say 0.75.
Hey that imaginary part sort of looks like the saw wave! Well it sounds like it too but much softer. The harmonics fall off more quickly like an instead of 1/n like in the saw wave. To get something resembling the square wave we'd need to eliminate the even powers of z. However when dealing with complex harmonics there is an easier way. We double the frequency by replacing az with (az)2 and then multiply with az to shift the powers to the odd indices.
Again the visual similarity to the square wave is apparent in the imaginary part. The bandlimited gifts of the geometric series do not end here. We can differentiate the series to get:
1 + 2z + 3z2 + ... + (n-1)z(n-2) = (z + (z(n-1)-n)zn)/(z(1-z)2).
A bit messier than before but still much better than actually carrying out the summation on the left hand side. Again plugging in az for z gives a way to control the softness of the sound and for a<1 the spectrum fill have a hump that can be used in combinations with other oscillators to give more color to the sound.
The final trick that can be used to get more sounds out of the geometric series is multiplying it by eiωt to shift the harmonic frequencies a non-integer amount up or down to get strange aperiodic sounds.
Unfortunately the geometric series cannot be integrated to get cheap formulas for the bandlimited saw and square wave. The integral is:
z + z2/2 + z3/3 + ... + zn/n = -Beta(z; 1+n, 0) - log(z-1),
Where Beta is the incomplete Beta-function. And the right hand side is not defined when z=1. The calculation of Beta is considerably more involved than simple multiplication and division but with the right libraries it might still be computationally cheaper than carrying out the sum naively.
Monday, 5 December 2011
All aboard the bandlimited impulse train!
Sound synthesis used to be a little hobby of mine. A while ago I dug up some of my old algorithms and decided to give this passion a new go. I never finished my grand plans of creating the perfect sound synthesis library and I didn't properly release the stuff that I had worked on. I've since realized that perfection is the enemy of completion.
Learning from past frustrations I've decided to dig up the lost audio gems and make little blog posts about them explaining the motivation and inner workings of the digital noise makers. I will divide the posts in three sections. For the non-technical reader I'll provide The Meat - a down to earth description of the algorithm explaining the motivation behind it. For the math enthusiast we'll have The Math - a glimpse into the platonic structure of synthesized sound. And finally for our programming friends I'll prepared The Method - a working implementation of the digital buzzer in question.
The natural starting point for a sound synthesis is the humble sample also called an impulse. A single impulse sounds like a simple click or a snap. Put enough of them in a row and you will have a sound. The simplest way to do it is simply to repeat an impulse with regular intervals creating an impulse train. The audible result is a very thin sounding buzz. As we're aiming for a digital implementation of the impulse train we'll need to work around a few issues that come with sampling platonic objects with finite sampling rate like 44100 Hz in the case of CD audio.
The Meat
Just by itself the impulse train may sound thin and annoying but it is a great starting point for subtractive synthesis. Creating sound by filtering it requires a rich starting waveform. The impulse train has equal amounts of all harmonics that exist in a periodic sound and is thus a model specimen to be sculpted by removing parts of those harmonics.
There isn't much to the impulse train if you can work with integer intervals output 1 once, 0 for a few times, repeat and you'll have your sound wave. The problem is that you can only reach a limited number of frequencies with this approach. Working with 44100 Hz sampling rate a stream of 101010... would be a 22025 Hz buzz and a stream consisting of 100100100... would buzz with a frequency of 14700 Hz. In general the impulse train with frequency f = 44100/n will be:
1, i mod n = 0
0, otherwise
where i ranges over the samples indexed by integers. (The technical readers have correctly pointed out that this function will not produce the "correct" impulse train, but it's almost right and it isn't aliased anyway, that is, it's a bandlimited oscillator without extra partials in the wrong places). However there is no way make a buzz of say 10000 Hz this way because it doesn't divide the sampling rate exactly. How can you put 3.41 zeros between the ones? There is no way to do such a thing. You could try putting 3 zeros sometimes and occasionally 4 zeros to get the average number of zeros between the ones to approximately 3.41 in the long run but the result sounds either just awful or very noisy depending on the way you do it. More finesse is required. There is a way to make a waveform that gives the clean stream of ones and zeros for integer periods (ok the math section formulas don't exactly produce the ones and zeros version but it's "wrong" anyway) and still works without noise or aliasing artifacts with fractional periods. To find out what this magical waveform is we need to use math.
The Math
Now that's pretty neat and easy to calculate. There's one little thing though. The expression is not defined when x=0 but using basic analysis we can see that the limit value of the expression is 0 when x approaches 0. It's also straightforward to see that the limit of the derivative at x=0 is n(n+1)/2.
The Method
So using a little bit of math we ended up with ( sin(nx/2) sin( (1+n)x/2 ) )/sin(x/2) for our bandlimited impulse train. It's not the same as the impulse train we started with but it sounds the same and has a few bonus properties that makes it more suitable for sound synthesis, namely no DC component and zero initial amplitude. So now we need to turn the formula in to a usable function that we can call with our digital synthesizer. The first problem we face is division by zero. It can be solved by approximating the function with a power series at the troublesome points. Sparing the details we can see that x n(n+1)/2 is a good approximation to our function near x=0. We can actually do better and take care of the periodicity too by using sin(x/2)n(n+1) when sin(x/2) is near zero. Now we can define a little python function:
Now we face another problem. The number of harmonics n is supposed to be an integer. What if I want to ramp the frequency of up and down. Do the harmonics just pop in and out when the base frequency allows for more terms in the series? That is one solution but the resulting clicking sounds terrible. It's better to smoothly fade out a harmonic as we approach the sampling limit. Finally there is normalization. Summing up a lot of sinusoids can get pretty loud pretty quick so let's just divide the whole thing by the number of harmonics. Adding these features we get:
Finally we'd like to make a something that outputs a bandlimited impulse train when we feed it a stream of frequency values. Remember that we can have up to (sampling rate)/(2f) harmonics for a blit with a base frequency of f. The fractional part of the previous expression will take care of fading the harmonics in and out as needed.
That should get you buzzing! In the next post we'll look more closely at the blit and other bandlimited waveforms like it.