Monday, 5 December 2011

All aboard the bandlimited impulse train!

Sound synthesis used to be a little hobby of mine. A while ago I dug up some of my old algorithms and decided to give this passion a new go. I never finished my grand plans of creating the perfect sound synthesis library and I didn't properly release the stuff that I had worked on. I've since realized that perfection is the enemy of completion.

Learning from past frustrations I've decided to dig up the lost audio gems and make little blog posts about them explaining the motivation and inner workings of the digital noise makers. I will divide the posts in three sections. For the non-technical reader I'll provide The Meat - a down to earth description of the algorithm explaining the motivation behind it. For the math enthusiast we'll have The Math - a glimpse into the platonic structure of synthesized sound. And finally for our programming friends I'll prepared The Method - a working implementation of the digital buzzer in question.

The natural starting point for a sound synthesis is the humble sample also called an impulse. A single impulse sounds like a simple click or a snap. Put enough of them in a row and you will have a sound. The simplest way to do it is simply to repeat an impulse with regular intervals creating an impulse train. The audible result is a very thin sounding buzz. As we're aiming for a digital implementation of the impulse train we'll need to work around a few issues that come with sampling platonic objects with finite sampling rate like 44100 Hz in the case of CD audio.

The Meat
Just by itself the impulse train may sound thin and annoying but it is a great starting point for subtractive synthesis. Creating sound by filtering it requires a rich starting waveform. The impulse train has equal amounts of all harmonics that exist in a periodic sound and is thus a model specimen to be sculpted by removing parts of those harmonics.

There isn't much to the impulse train if you can work with integer intervals output 1 once, 0 for a few times, repeat and you'll have your sound wave. The problem is that you can only reach a limited number of frequencies with this approach. Working with 44100 Hz sampling rate a stream of 101010... would be a 22025 Hz buzz and a stream consisting of 100100100... would buzz with a frequency of 14700 Hz. In general the impulse train with frequency f = 44100/n will be:
1, i mod n = 0
0, otherwise
where i ranges over the samples indexed by integers. (The technical readers have correctly pointed out that this function will not produce the "correct" impulse train, but it's almost right and it isn't aliased anyway, that is, it's a bandlimited oscillator without extra partials in the wrong places). However there is no way make a buzz of say 10000 Hz this way because it doesn't divide the sampling rate exactly. How can you put 3.41 zeros between the ones? There is no way to do such a thing. You could try putting 3 zeros sometimes and occasionally 4 zeros to get the average number of zeros between the ones to approximately 3.41 in the long run but the result sounds either just awful or very noisy depending on the way you do it. More finesse is required. There is a way to make a waveform that gives the clean stream of ones and zeros for integer periods (ok the math section formulas don't exactly produce the ones and zeros version but it's "wrong" anyway) and still works without noise or aliasing artifacts with fractional periods. To find out what this magical waveform is we need to use math.


The Math

In the platonic realm of perfect fidelity the impulse is not a sample of some stream of numbers or even a real function. The unit impulse is actually the Dirac delta function. An infinitely sharp spike is truly the grandfather of all clicks. The impulse train then is of course the Dirac comb. The Fourier series of this beast with period T is (1 + 2cos(x) + 2cos(2x) + 2cos(3x) + ...)/T , where x = 2 pi t / T. This means that the impulse train contains all the integer multiples of it's base frequency in equal weights. The sound is perfectly uniform spectrally. That in turn means that by modifying the weights of the cosines, also known as harmonics, with filters we can in principle create every sound that has a period of T. Now how do you sample a function that's not even a function to get from the real line to the digital domain of samples so you can play the thing through your speakers? Precisely speaking you don't. Your speakers really don't do infinity and even if they did you must sample precisely at the multiples of T to get something that isn't a constant zero.
The solution is to remember that sinusoids like cos(x) can only be faithfully sampled when their frequency is less than half the sampling rate (also known as the nyquist frequency. It all really just boils down to the sampling theorem). If you sample a high frequency sinusoid you still get a sinusoid but with the wrong frequency because of aliasing. So to sample our idealized impulse train we simply take it's Fourier series and terminate it before it reaches the sampling barrier. This procedure is know as bandlimiting. We also eliminate the nasty infinities while we're at it. To make matters simpler we'll drop the common factor of 2/T.
Se we end up with 1/2 + cos(x) + cos(2x) + ... + cos( (n-1)x), where x = 2 pi t / T, t is time measured in seconds, T is the period of our impulse train and n is an integer such that 2n/T is larger than the sampling rate. That's a hell of a lot of sinusoids! The formula is impractical for real time sound synthesis as it stands. There is also the nuisance of the DC-component 1/2 and the fact that the sinusoids are cosines and not sines. Cosines produce a nasty click if you just turn them on at time t=0. What we really want is something that sounds the same, doesn't have a DC-component and is computationally cheap to synthesize. The ear is almost deaf to static phase shifts so we can simply replace the cosines with sines and end up with sin(x) + sin(2x) + ... + sin( (n-1)x ). It will sound like the real impulse train to a human ear. We still need to do something about the great amount of summation going on to end up with anything practical. The next step requires a little magic with complex numbers. Remember that exp(i x) = cos(x) + i sin(x) and that exp(n x) = exp(x)^n. We can simply replace sin(x) with exp(i x) in our series as long as we take the imaginary part of the resulting expression after we're done. So 
sin(x) + sin(2x) + ... + sin( (n-1)x ) 
= Im( exp(i x) + exp(2i x) + ... + exp((n-1)i x) ) 
= Im(exp(i x) + exp(i x)^2 + ... + exp(i x)^(n-1) ) 
= Im(1 + z + z^2 + z^3 + ... + z^(n-1)),
where z = exp(i x) and we've suggestively thrown in a 0 = Im(1). Hey, that's the geometric series! We can replace the summation with (1 - z^n)/(1-z) and end up with
= Im( (1-z^n)/(1-z) ).
With a little more manipulation and using a few trigonometric identities we can express the result using real functions as:
( sin(nx/2) sin( (1+n)x/2 ) )/sin(x/2)
Now that's pretty neat and easy to calculate. There's one little thing though. The expression is not defined when x=0 but using basic analysis we can see that the limit value of the expression is 0 when x approaches 0. It's also straightforward to see that the limit of the derivative at x=0 is n(n+1)/2.

The Method
So using a little bit of math we ended up with ( sin(nx/2) sin( (1+n)x/2 ) )/sin(x/2) for our bandlimited impulse train. It's not the same as the impulse train we started with but it sounds the same and has a few bonus properties that makes it more suitable for sound synthesis, namely no DC component and zero initial amplitude. So now we need to turn the formula in to a usable function that we can call with our digital synthesizer. The first problem we face is division by zero. It can be solved by approximating the function with a power series at the troublesome points. Sparing the details we can see that x n(n+1)/2 is a  good approximation to our function near x=0. We can actually do better and take care of the periodicity too by using sin(x/2)n(n+1) when sin(x/2) is near zero. Now we can define a little python function:

def sin_series(x,n): """sin(x) + sin(2*x) + ... + sin((n-1)*x)""" x *= 0.5 s = sin(x) if abs(s)<1e-12: return n*(n-1)*s return sin(n*x)*sin(x*(1-n))/s
Now we face another problem. The number of harmonics n is supposed to be an integer. What if I want to ramp the frequency of up and down. Do the harmonics just pop in and out when the base frequency allows for more terms in the series? That is one solution but the resulting clicking sounds terrible. It's better to smoothly fade out a harmonic as we approach the sampling limit. Finally there is normalization. Summing up a lot of sinusoids can get pretty loud pretty quick so let's just divide the whole thing by the number of harmonics. Adding these features we get:
def blit(phase,harmonics): int_h = floor(harmonics) mu = harmonics-int_h int_h += 1 #sin_series only goes up to (n-1) t = 2*pi*phase return ( sin_series(t,int_h) + mu*sin(int_h*t) )/harmonics
Finally we'd like to make a something that outputs a bandlimited impulse train when we feed it a stream of frequency values. Remember that we can have up to (sampling rate)/(2f) harmonics for a blit with a base frequency of f. The fractional part of the previous expression will take care of fading the harmonics in and out as needed.
srate=44100.0 dt=1.0/srate def blit_gen(frequency): frequency=iter(frequency) phase = 0.0 while True: f = next(frequency) yield blit(phase,0.5*srate/f) phase += dt*f
That should get you buzzing! In the next post we'll look more closely at the blit and other bandlimited waveforms like it.

No comments:

Post a Comment