Sampled Harmonies: 2011

Thursday, 15 December 2011

Demonstrating aliasing.

In the last post we delved in to the math of the non-aliasing oscillators. Now it's time to make some noise with them.

The importance of avoiding aliasing is best demonstrated with an example:

Aliased blitsong by frostburn

Sounds weird doesn't it. I rendered the song with a low sampling rate (8000 Hz) to accentuate the effect. Now let's try the same song with the same sampling rate but this time without the offending frequencies.

Non aliased blitsong by frostburn

Much better. Sure it's missing in brightness due to the low sampling rate but the "metallic", broken sound is gone and we're left with only pure tones that can be faithfully reproduced. Here's the same piece once more but now with CD quality audio and a different instrument.

Blitsong with saw1 by frostburn
The sound is quite thin but that's plain bandlimited oscillators to you. They'll get more body once we pass them through a few filters in a future blog post about subtractive synthesis.

Status of the Python Library
As can be heard from the above sound clips the sound synthesis library exists and can render output. I just don't feel that it's ready for a release yet. I want to make sure that the approach I'm taking works and suits my (overambitious) needs. Besides, there are already plenty of audio synthesis libraries out there. I'm creating my own just for the challenge of it.
The general idea is to use the generators in Python to, well, generate sound. They are reasonably fast but not fast enough for serious synth work. Right now I'm looking for a way to speed things up. Processing the audio streams in chunks using numpy looks promising but such an approach precludes certain feedback effects so it wont work for everything.
With the library in limbo I'll have to make these blog posts and the accompanying audio clips without referring to the actual implementations.

Wednesday, 7 December 2011

The math of bandlimited oscillators

In the previous post we constructed a bandlimited (non-aliasing) oscillator modeled after the impulse train. In this post we'll look at more aliasing free oscillators from the mathematical point of view.

The previous post looked at the idealized impulse train worked our way down to a useful digital implementation. Let's try it the other way around. We'll start with the discrete impulse train that has one sample with amplitude 1 and (n-1) samples with amplitude 0.
[1, 0, 0, 0, ... , 0], repeated ad infinitum.
The frequency content can be calculated by taking the discrete Fourier transform. It is trivial because each of the partials we're summing over in the transformation starts with 1 + 0i regardless of frequency.
fft([1, 0, ... , 0]) = [1, 1, ... , 1].
So the discrete impulse train contains all the complex frequencies in equal proportion.
( ... + exp(-i x) + exp(0) + exp(i x) + exp(i 2x) + ... )/n
Notice how every complex exponential has a pair except for exp(0). If n is even the Nyquist frequency will also be missing a pair. So writing the series in terms of real functions we have:
( 1 + 2 cos(x) + 2 cos(2x) + ... + 2 cos((n/2-1)x) + cos((n/2)x) )/n, n even
( 1 + 2 cos(x) + 2 cos(2x) + ... + 2 cos(((n-1)/2)x) )/n, n odd.

There is a little trick to get simple expressions for the continuous counterparts of both the single impulse [..., 0, 0, 1, 0, 0, ....] and the impulse train. The unique non-zero analytic function that has zeros exactly at the sampling points giving [..., 0, 0, 0, ....] is sin(π t). To get the continuous version of the single impulse we just divide out one of the zeros to get
sin(π t)/(π t) =: sinc(t)

as simple and irrigorous as that. For the impulse train we simply divide out every n:th zero with sin(π t/n) to get:
sin(π t)/(n sin(π t/n)) =: windowedsinc(t,n), n odd.
The catch is that this trickery only works when n is odd. But this is the function we'll want to use because the odd case had a more uniform spectrum without any nyquist nuisances. For reference the even case simplifies to:
sin(π t)cot(π t/(2n))/n.
It has a double analytical zero between the impulses. Not something you can easily just deduce by matching zeros with samples. But what happens when you try windowedsinc with an even n?

Oh, so it flips every other impulse around. Well that's cool. We get the difference of two impulse trains 180° apart for free in terms of computational effort. This has the effect of killing every even harmonic leaving us something that sounds like a thin square wave. Actually it is the derivative of the bandlimited square wave, but more on that later.

You can also make bandlimited oscillators by dividing out more zeros per (semi-)period like getting rid of the zeros at indices 0, 1 and 5 mod 10, but they may be more trouble than they're worth. The point of these formulas is to make it easy to calculate a known sum of harmonics without actually doing the summation. Getting the harmonic content of these beasts is somewhat involved.

The computational gain is also marginal because they can always be reduced to a sum of the regular blits.
In the previous post we used the geometric series. Let's look at it in more detail:
1 + z + z² + ... + z^(n-1) = (1-zⁿ)/(1-z).
Let's plug in e^2πit for z and see what happens.

I like the imaginary part better. It's more symmetric. Anyway let's replace z with az, where a is a real number say 0.75.

Hey that imaginary part sort of looks like the saw wave! Well it sounds like it too but much softer. The harmonics fall off more quickly like aⁿ instead of 1/n like in the saw wave. To get something resembling the square wave we'd need to eliminate the even powers of z. However when dealing with complex harmonics there is an easier way. We double the frequency by replacing az with (az)² and then multiply with az to shift the powers to the odd indices.

Again the visual similarity to the square wave is apparent in the imaginary part. The bandlimited gifts of the geometric series do not end here. We can differentiate the series to get:
1 + 2z + 3z² + ... + (n-1)z^(n-2) = (z + (z(n-1)-n)zⁿ)/(z(1-z)²).
A bit messier than before but still much better than actually carrying out the summation on the left hand side. Again plugging in az for z gives a way to control the softness of the sound and for a<1 the spectrum fill have a hump that can be used in combinations with other oscillators to give more color to the sound.

The final trick that can be used to get more sounds out of the geometric series is multiplying it by e^iωt to shift the harmonic frequencies a non-integer amount up or down to get strange aperiodic sounds.
Unfortunately the geometric series cannot be integrated to get cheap formulas for the bandlimited saw and square wave. The integral is:
z + z²/2 + z³/3 + ... + zⁿ/n = -Beta(z; 1+n, 0) - log(z-1),
Where Beta is the incomplete Beta-function. And the right hand side is not defined when z=1. The calculation of Beta is considerably more involved than simple multiplication and division but with the right libraries it might still be computationally cheaper than carrying out the sum naively.

Monday, 5 December 2011

All aboard the bandlimited impulse train!

Sound synthesis used to be a little hobby of mine. A while ago I dug up some of my old algorithms and decided to give this passion a new go. I never finished my grand plans of creating the perfect sound synthesis library and I didn't properly release the stuff that I had worked on. I've since realized that perfection is the enemy of completion.

Learning from past frustrations I've decided to dig up the lost audio gems and make little blog posts about them explaining the motivation and inner workings of the digital noise makers. I will divide the posts in three sections. For the non-technical reader I'll provide The Meat - a down to earth description of the algorithm explaining the motivation behind it. For the math enthusiast we'll have The Math - a glimpse into the platonic structure of synthesized sound. And finally for our programming friends I'll prepared The Method - a working implementation of the digital buzzer in question.

The natural starting point for a sound synthesis is the humble sample also called an impulse. A single impulse sounds like a simple click or a snap. Put enough of them in a row and you will have a sound. The simplest way to do it is simply to repeat an impulse with regular intervals creating an impulse train. The audible result is a very thin sounding buzz. As we're aiming for a digital implementation of the impulse train we'll need to work around a few issues that come with sampling platonic objects with finite sampling rate like 44100 Hz in the case of CD audio.

The Meat
Just by itself the impulse train may sound thin and annoying but it is a great starting point for subtractive synthesis. Creating sound by filtering it requires a rich starting waveform. The impulse train has equal amounts of all harmonics that exist in a periodic sound and is thus a model specimen to be sculpted by removing parts of those harmonics.

There isn't much to the impulse train if you can work with integer intervals output 1 once, 0 for a few times, repeat and you'll have your sound wave. The problem is that you can only reach a limited number of frequencies with this approach. Working with 44100 Hz sampling rate a stream of 101010... would be a 22025 Hz buzz and a stream consisting of 100100100... would buzz with a frequency of 14700 Hz. In general the impulse train with frequency f = 44100/n will be:
1, i mod n = 0
0, otherwise
where i ranges over the samples indexed by integers. (The technical readers have correctly pointed out that this function will not produce the "correct" impulse train, but it's almost right and it isn't aliased anyway, that is, it's a bandlimited oscillator without extra partials in the wrong places). However there is no way make a buzz of say 10000 Hz this way because it doesn't divide the sampling rate exactly. How can you put 3.41 zeros between the ones? There is no way to do such a thing. You could try putting 3 zeros sometimes and occasionally 4 zeros to get the average number of zeros between the ones to approximately 3.41 in the long run but the result sounds either just awful or very noisy depending on the way you do it. More finesse is required. There is a way to make a waveform that gives the clean stream of ones and zeros for integer periods (ok the math section formulas don't exactly produce the ones and zeros version but it's "wrong" anyway) and still works without noise or aliasing artifacts with fractional periods. To find out what this magical waveform is we need to use math.

The Math

In the platonic realm of perfect fidelity the impulse is not a sample of some stream of numbers or even a real function. The unit impulse is actually the Dirac delta function. An infinitely sharp spike is truly the grandfather of all clicks. The impulse train then is of course the Dirac comb. The Fourier series of this beast with period T is (1 + 2cos(x) + 2cos(2x) + 2cos(3x) + ...)/T , where x = 2 pi t / T. This means that the impulse train contains all the integer multiples of it's base frequency in equal weights. The sound is perfectly uniform spectrally. That in turn means that by modifying the weights of the cosines, also known as harmonics, with filters we can in principle create every sound that has a period of T. Now how do you sample a function that's not even a function to get from the real line to the digital domain of samples so you can play the thing through your speakers? Precisely speaking you don't. Your speakers really don't do infinity and even if they did you must sample precisely at the multiples of T to get something that isn't a constant zero.

The solution is to remember that sinusoids like cos(x) can only be faithfully sampled when their frequency is less than half the sampling rate (also known as the nyquist frequency. It all really just boils down to the sampling theorem). If you sample a high frequency sinusoid you still get a sinusoid but with the wrong frequency because of aliasing. So to sample our idealized impulse train we simply take it's Fourier series and terminate it before it reaches the sampling barrier. This procedure is know as bandlimiting. We also eliminate the nasty infinities while we're at it. To make matters simpler we'll drop the common factor of 2/T.

Se we end up with 1/2 + cos(x) + cos(2x) + ... + cos( (n-1)x), where x = 2 pi t / T, t is time measured in seconds, T is the period of our impulse train and n is an integer such that 2n/T is larger than the sampling rate. That's a hell of a lot of sinusoids! The formula is impractical for real time sound synthesis as it stands. There is also the nuisance of the DC-component 1/2 and the fact that the sinusoids are cosines and not sines. Cosines produce a nasty click if you just turn them on at time t=0. What we really want is something that sounds the same, doesn't have a DC-component and is computationally cheap to synthesize. The ear is almost deaf to static phase shifts so we can simply replace the cosines with sines and end up with sin(x) + sin(2x) + ... + sin( (n-1)x ). It will sound like the real impulse train to a human ear. We still need to do something about the great amount of summation going on to end up with anything practical. The next step requires a little magic with complex numbers. Remember that exp(i x) = cos(x) + i sin(x) and that exp(n x) = exp(x)^n. We can simply replace sin(x) with exp(i x) in our series as long as we take the imaginary part of the resulting expression after we're done. So

sin(x) + sin(2x) + ... + sin( (n-1)x )

= Im( exp(i x) + exp(2i x) + ... + exp((n-1)i x) )

= Im(exp(i x) + exp(i x)^2 + ... + exp(i x)^(n-1) )

= Im(1 + z + z^2 + z^3 + ... + z^(n-1)),

where z = exp(i x) and we've suggestively thrown in a 0 = Im(1). Hey, that's the geometric series! We can replace the summation with (1 - z^n)/(1-z) and end up with

= Im( (1-z^n)/(1-z) ).

With a little more manipulation and using a few trigonometric identities we can express the result using real functions as:

( sin(nx/2) sin( (1+n)x/2 ) )/sin(x/2)
Now that's pretty neat and easy to calculate. There's one little thing though. The expression is not defined when x=0 but using basic analysis we can see that the limit value of the expression is 0 when x approaches 0. It's also straightforward to see that the limit of the derivative at x=0 is n(n+1)/2.

The Method
So using a little bit of math we ended up with ( sin(nx/2) sin( (1+n)x/2 ) )/sin(x/2) for our bandlimited impulse train. It's not the same as the impulse train we started with but it sounds the same and has a few bonus properties that makes it more suitable for sound synthesis, namely no DC component and zero initial amplitude. So now we need to turn the formula in to a usable function that we can call with our digital synthesizer. The first problem we face is division by zero. It can be solved by approximating the function with a power series at the troublesome points. Sparing the details we can see that x n(n+1)/2 is a good approximation to our function near x=0. We can actually do better and take care of the periodicity too by using sin(x/2)n(n+1) when sin(x/2) is near zero. Now we can define a little python function:

def sin_series(x,n): """sin(x) + sin(2*x) + ... + sin((n-1)*x)""" x *= 0.5 s = sin(x) if abs(s)<1e-12: return n*(n-1)*s return sin(n*x)*sin(x*(1-n))/s
Now we face another problem. The number of harmonics n is supposed to be an integer. What if I want to ramp the frequency of up and down. Do the harmonics just pop in and out when the base frequency allows for more terms in the series? That is one solution but the resulting clicking sounds terrible. It's better to smoothly fade out a harmonic as we approach the sampling limit. Finally there is normalization. Summing up a lot of sinusoids can get pretty loud pretty quick so let's just divide the whole thing by the number of harmonics. Adding these features we get:
def blit(phase,harmonics): int_h = floor(harmonics) mu = harmonics-int_h int_h += 1 #sin_series only goes up to (n-1) t = 2*pi*phase return ( sin_series(t,int_h) + mu*sin(int_h*t) )/harmonics
Finally we'd like to make a something that outputs a bandlimited impulse train when we feed it a stream of frequency values. Remember that we can have up to (sampling rate)/(2f) harmonics for a blit with a base frequency of f. The fractional part of the previous expression will take care of fading the harmonics in and out as needed.
srate=44100.0 dt=1.0/srate def blit_gen(frequency): frequency=iter(frequency) phase = 0.0 while True: f = next(frequency) yield blit(phase,0.5*srate/f) phase += dt*f
That should get you buzzing! In the next post we'll look more closely at the blit and other bandlimited waveforms like it.