Thursday, 15 January 2015

Fast algorithms for polynomials and exponentials

Introduction

In this blog post we're going to derive fast algorithms for evaluating the successive values of polynomials and exponentials.

Polynomials

Let's start simple. How do you evaluate successive values of P(n) = an + b, a first order polynomial? You initialize an accumulator to b and keep adding a to it.

y0(n+1) = y0(n) + a, y0(0) = b
Notice how we don't have to multiply by a anywhere to get the values.
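To make this concrete, here's a minimal Python sketch (the function name and example coefficients are mine, purely for illustration):

def linear_values(a, b, count):
    """Yield P(n) = a*n + b for n = 0, 1, 2, ... using one addition per value."""
    y0 = b
    for _ in range(count):
        yield y0
        y0 += a

# list(linear_values(3, 5, 5)) -> [5, 8, 11, 14, 17]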

Let's try P(n) = n², the simplest second order polynomial. We needed one accumulator for the first order case, so maybe we need two accumulators this time.

y1(n+1) = y1(n) + 1
y0(n+1) = y0(n) + y1(n+1)
y1(0) = y0(0) = 0
The bottom accumulator y0 will have the values 0, 1, 3, 6, 10, 15... That's definitely not linear, but it's not the sequence 0, 1, 4, 9, 16, 25... that we're after either. It turns out that the general term of the sequence is n(n + 1)/2, or, multiplying by two, n² + n. Multiplication is easy (just replace the 1 with a 2 above), but how do we get rid of the n term? The equations are linear and actually include the first order iteration. Adding a value to the initial value of y0 shifts the sequence by a constant and adding a value to the initial value of y1 adds a linear function of n to the final sequence. In this case we want to add P(n) = -n. The final iteration for n² becomes:

y1(n+1) = y1(n) + 2
y0(n+1) = y0(n) + y1(n+1)
y1(0) = -1, y0(0) = 0
The iteration is linear so we can now add all the lower order terms for a general second order polynomial:
y1(n+1) = y1(n) + 2a
y0(n+1) = y0(n) + y1(n+1)
y1(0) = b - a, y0(0) = c, for P(n) = an² + bn + c
So if you pre-calculate the 2a term you only need two additions to get the next value of a polynomial that naïvely requires three multiplications to evaluate (two multiplications if you reduce it to Horner form). Pretty cool, huh?
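Here's a sketch of the general second order iteration in Python, using the initial values derived above:

def quadratic_values(a, b, c, count):
    """Yield P(n) = a*n**2 + b*n + c using two additions per value."""
    y0 = c        # P(0)
    y1 = b - a    # chosen so the n**2 + n iteration loses its n term
    d = 2 * a     # pre-calculated second difference
    for _ in range(count):
        yield y0
        y1 += d
        y0 += y1

# list(quadratic_values(1, 0, 0, 6)) -> [0, 1, 4, 9, 16, 25]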

Moving on. Let's tackle P(n) = n³. Generalizing the naïve iteration:

y2(n+1) = y2(n) + 1
y1(n+1) = y1(n) + y2(n+1)
y0(n+1) = y0(n) + y1(n+1)
y2(0) = y1(0) = y0(0) = 0
We get 0, 1, 4, 10, 20, 35... or n(n + 1)(n + 2)/6 in general. Multiplying by six we get an n³ term, but now we get both quadratic and linear nuisance terms. Luckily we already know how to generate both kinds of terms so we just need to subtract them:

y2(n+1) = y2(n) + 6
y1(n+1) = y1(n) + y2(n+1)
y0(n+1) = y0(n) + y1(n+1)
y2(0) = -6, y1(0) = 1, y0(0) = 0
Or for a general third order polynomial:

y2(n+1) = y2(n) + 6a
y1(n+1) = y1(n) + y2(n+1)
y0(n+1) = y0(n) + y1(n+1)
y2(0) = 2b - 6a, y1(0) = a - b + c, y0(0) = d, for P(n) = an³ + bn² + cn + d
Starting to look a bit complex, but it works. Very useful if you wish to evaluate cubic Bézier curves or interpolate other functions using cubic Hermite splines.
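A sketch of the third order iteration in Python (initial values as derived above; the name is mine):

def cubic_values(a, b, c, d, count):
    """Yield P(n) = a*n**3 + b*n**2 + c*n + d using three additions per value."""
    y0 = d
    y1 = a - b + c
    y2 = 2 * b - 6 * a
    d3 = 6 * a    # pre-calculated third difference
    for _ in range(count):
        yield y0
        y2 += d3
        y1 += y2
        y0 += y1

# list(cubic_values(1, 0, 0, 0, 5)) -> [0, 1, 8, 27, 64]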

I'm going to leave the higher order polynomials as an exercise. You can use the strategy introduced here: Write out the naïve iteration, derive the general term, expand it and subtract the lower order nuisance terms that you already know how to compute.

Computing polynomials at non-integer values

Above we derived the general iteration for polynomials of the form P(n), where n is an integer. What if you want to evaluate P(t) for successive instants of time? It's very simple. Just write t as t = nΔt, where Δt is the inverse of the sampling rate, and you'll get a new polynomial as a function of n. The same goes for starting the iteration at an arbitrary instant of time t0. Just write t = nΔt + t0, expand to get the coefficients of the powers of n and plug them into the formula for P(n).
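For the second order case the substitution works out like this (a sketch; dt and t0 stand for Δt and t0):

def resample_quadratic(c2, c1, c0, dt, t0):
    """Expand P(t) = c2*t**2 + c1*t + c0 at t = n*dt + t0
    into the coefficients of a polynomial in the integer n."""
    a = c2 * dt * dt
    b = 2 * c2 * dt * t0 + c1 * dt
    c = (c2 * t0 + c1) * t0 + c0
    return a, b, c  # feed these to the quadratic iteration above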

Exponentials

Nothing in the above iterations says that you have to use addition to get the next values. The exponential map is a homomorphism from the additive group of the reals to the multiplicative group, which means you can use 1 in place of 0, multiplication in place of addition and raising to a power in place of scalar multiplication.

Let's write out the iteration for the exponential function to get a taste:

y0(n+1) = y0(n) · e^a, y0(0) = e^b, giving the successive values of e^(an + b)
And the gaussian for good measure:

y1(n+1) = y1(n) · e^(-2)
y0(n+1) = y0(n) · y1(n+1)
y1(0) = e, y0(0) = 1, giving the successive values of e^(-n²)
Or the general gaussian in terms of t = nΔt:


You get the idea. In general it's possible to evaluate e^P(n) with only m multiplications per value, where m is the order of the polynomial P(n).
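As a sketch, here's the gaussian e^(-n²) evaluated with two multiplications per value:

from math import exp

def gaussian_values(count):
    """Yield exp(-n**2) for n = 0, 1, 2, ... using two multiplications per value."""
    y0 = 1.0        # exp(-0**2)
    y1 = exp(1.0)   # cancels the unwanted linear term in the exponent
    r = exp(-2.0)   # constant ratio, computed once
    for _ in range(count):
        yield y0
        y1 *= r
        y0 *= y1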

Saturday, 3 January 2015

Algorithms for Additive Synthesis

Introduction

Additive synthesis is a sound synthesis technique that creates timbre by adding sine waves together.
In this blog post we're going to look at two special cases of additive synthesis and implement efficient algorithms for them.
First we're going to look into modal synthesis with simple exponentially decaying partials and then advance into harmonic synthesis with freely controllable frequency. Instead of having to evaluate trigonometric functions for each of the partials, both of these techniques will only require a couple of multiplications and additions.

Modal Synthesis

With fixed frequencies and decay rates the modal synthesis model looks like this:
y(t) = Σ ak e^(-dk t) sin(2π fk t + ϕk), summed over the partials k
This approach can be used to create the sound of drums or idiophones such as the marimba or bells where there are relatively few partials that are related inharmonically.

Let's look at a single partial:

x(t) = a e^(-d t) sin(2π f t)
Using a few trigonometric identities we can manipulate the equation into the following recurrence:

x(n) = α1 x(n-1) + α2 x(n-2)
α1 = 2 e^(-d/fs) cos(2π f/fs)
α2 = -e^(-2d/fs)
Where fs is the sampling rate. The constants α1 and α2 can be calculated once and the recurrence states that to get the next sample of the partial you simply add up two previously generated samples multiplied by the constants, a huge saving over having to evaluate the sin and exp functions for every sample.
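A minimal sketch of a single partial as a Python generator (my own naming; the two history samples are seeded from the closed form):

from math import exp, cos, sin, pi

def decaying_partial(frequency, decay, amplitude, num_samples, srate=44100.0):
    """Generate amplitude * exp(-decay*n/srate) * sin(2*pi*frequency*n/srate)
    with two multiplications and one addition per sample."""
    r = exp(-decay / srate)
    theta = 2 * pi * frequency / srate
    alpha1 = 2 * r * cos(theta)
    alpha2 = -r * r
    x2 = -amplitude * sin(theta) / r   # x[-1]
    x1 = 0.0                           # x[0]
    for _ in range(num_samples):
        yield x1
        x1, x2 = alpha1 * x1 + alpha2 * x2, x1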

Harmonic Synthesis

The modal synthesis above suffers from fixed frequencies and a general lack of control over the amplitude envelopes. If we restrict our attention to a harmonic series where each of the partial frequencies is an integer multiple of the base frequency we get the following equation:
y(t) = Σ ak(t) sin(2π k f(t) t + ϕk), summed over the harmonics k = 1...K
Where the fundamental frequency f and each of the amplitude envelopes ak depend on time t.
Let's look at a single instant of time:
y = Im( b1 z + b2 z^2 + ... + bK z^K ), where z = e^(2πift)
Where we have absorbed the ak and ϕk terms into a single complex term bk. The end result is a simple sum of the powers of the complex exponential which can be evaluated with Horner's method:



where I have thrown in a timbral sharpness parameter s for free. The method requires only K-1 complex multiplications in total.
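A sketch of the Horner evaluation (harmonic_sum is a hypothetical name and the sharpness parameter is left out; b is the list of complex coefficients and theta the phase of the fundamental):

import cmath

def harmonic_sum(b, theta):
    """Evaluate b[0]*z + b[1]*z**2 + ... + b[K-1]*z**K with z = exp(i*theta)
    using Horner's method."""
    z = cmath.exp(1j * theta)
    acc = b[-1]
    for bk in reversed(b[:-1]):
        acc = acc * z + bk
    return (acc * z).imag  # or .real, depending on the sine/cosine convention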

While one complex multiplication per harmonic isn't bad, it still requires four real multiplications to evaluate. We can do better by restricting our attention further and dropping the ϕk terms:
y = a0 + a1 cos(θ) + a2 cos(2θ) + ... + aK cos(Kθ), where θ is the phase of the fundamental
This series can be evaluated with the Chebyshev recurrence:

cos(kθ) = 2 cos(θ) cos((k-1)θ) - cos((k-2)θ)
And then applying the Clenshaw algorithm to this recurrence:

bk = ak + 2v bk+1 - bk+2, where v = cos(θ) and bK+1 = bK+2 = 0
y = a0 + v b1 - b2
From this recurrence we can see that we only need one real multiplication per harmonic (assuming the 2v term is computed only once).
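In Python the Clenshaw evaluation might look like this (a sketch under the conventions above):

from math import cos

def cosine_series(a, theta):
    """Evaluate a[0] + a[1]*cos(theta) + ... + a[K]*cos(K*theta)
    with one real multiplication per harmonic."""
    v = cos(theta)
    two_v = 2.0 * v    # computed only once
    b1 = 0.0
    b2 = 0.0
    for ak in reversed(a[1:]):
        b1, b2 = ak + two_v * b1 - b2, b1
    return a[0] + v * b1 - b2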

What about the case of sines in the series?
y = a1 sin(θ) + a2 sin(2θ) + ... + aK sin(Kθ)
This time we use the Chebyshev recurrence of the second kind:

sin((k+1)θ) = 2 cos(θ) sin(kθ) - sin((k-1)θ)
Giving us the final Clenshaw recurrence:

bk = ak + 2v bk+1 - bk+2, where v = cos(θ) and bK+1 = bK+2 = 0
y = b1 sin(θ)
Again there is only one real multiplication per harmonic, but now we had to evaluate both the sin and cos functions once.
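The corresponding sketch for the sine series:

from math import sin, cos

def sine_series(a, theta):
    """Evaluate a[0]*sin(theta) + a[1]*sin(2*theta) + ... + a[K-1]*sin(K*theta)."""
    two_v = 2.0 * cos(theta)   # computed only once
    b1 = 0.0
    b2 = 0.0
    for ak in reversed(a):
        b1, b2 = ak + two_v * b1 - b2, b1
    return b1 * sin(theta)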

Let's do one more. This time with only odd harmonics:

y = a1 sin(θ) + a2 sin(3θ) + ... + aK sin((2K-1)θ)
Now the Clenshaw recurrence reads:
bk = ak + 2w bk+1 - bk+2, where w = cos(2θ) and bK+1 = bK+2 = 0
y = (b1 + b2) sin(θ)
Implementation

As usual, here's a Python gist with implementations of the algorithms introduced here:

Friday, 26 December 2014

Formant waveforms

Introduction

In speech sciences a formant is defined as a concentration of acoustic energy around a particular frequency. A common way to achieve such a concentration is to apply a band-pass filter to a sawtooth wave. However we're not going to do that here, because it just so happens that a periodic waveform with a gaussian energy peak in its spectrum has a rapidly converging series representation in both the time and frequency domains. By carefully picking the representations apart we can create an algorithm that is relatively cheap to evaluate computationally.

Theory

The construction begins with the observation that the continuous Fourier transform of a gaussian function is also a gaussian. This gives us two complementary ways of creating a periodic waveform out of it. In the time domain we convolve the gaussian with an impulse train and in the frequency domain we multiply an impulse train with a gaussian envelope.
The frequency domain representation gives rise to the Jacobi theta function:
θ(t, q) = 1 + 2 Σ q^(n²) cos(2πnt), summed over n = 1, 2, 3...
The time domain representation gives:
where C is a normalization constant.
The first representation converges rapidly when q<0.2 and the second when q>0.2 giving us a criterion for picking the one to use. For sound synthesis purposes picking the first five terms (from n=-2 to n=2 for the second representation) is more than enough.
The first representation is simply an additive synthesis formula that benefits from the Chebyshev recurrence:
cos(2πnt) = 2 cos(2πt) cos(2π(n-1)t) - cos(2π(n-2)t)
where the cos function has to be evaluated only once to get the rest of the harmonics.
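As a sketch, assuming the theta series convention written above (five terms, cos evaluated once, and q^(n²) updated multiplicatively in the spirit of the fast exponential iterations):

from math import cos, pi

def theta_series(t, q, terms=5):
    """Truncated 1 + 2*sum q**(n*n)*cos(2*pi*n*t) for n = 1..terms-1."""
    c = cos(2 * pi * t)
    two_c = 2.0 * c
    prev, curr = 1.0, c   # cos(0) and cos(2*pi*t)
    qn = 1.0              # q**(n*n)
    step = q              # q**(2*n + 1)
    result = 1.0
    for _ in range(terms - 1):
        qn *= step
        step *= q * q
        result += 2.0 * qn * curr
        prev, curr = curr, two_c * curr - prev
    return result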

The second representation is more complex and seems to require at least 3 calls to the exp function in addition to the one log call.

Approximation

The rough algorithm laid out above isn't bad, but for sound synthesis purposes we can get away with a simple approximation:


where C is another normalization constant.

Simple formants

Using the above approximation we can finally get simple formant waveforms that have the energy centered around a given harmonic:


The maximum amplitude has been normalized to one and for this approximation to work the ratio parameter r has to be an integer. There is also a numerical limit to how high the width parameter w can be. For large widths (w > 700) this approximation has to be used:

Advanced formants

To get around the restriction that the ratio has to be an integer we again turn to the series representations and apply some theory from Fourier analysis. For the frequency domain version we simply shift the coefficients away from the original values while taking into account the effect on negative frequencies (leaving out normalization):


With the loss of symmetry the optimal range of harmonics to evaluate changes to about ten centered around r. Luckily the Chebyshev recurrence still applies even to negative frequencies but now we need up to three evaluations of cos to get it going from an arbitrary n: cos(x), cos((n-1)x) and cos(nx).

The time domain series changes by multiplying the kernel by cos(2πrt):


For optimized evaluation the cosine terms can be absorbed into the exponential as an imaginary phase, keeping only the real part of the final result. The optimization details get messy but I was able to squeeze the number of needed function calls down to two complex and two real exp calls.

Sound samples

So let's make some noise. Here's a width sweep from zero to ten of the simple formant at 220 Hz centered around the fourth harmonic:


You will notice a click at the beginning of the sample. That's a direct consequence of using cosines in the series representation. Unfortunately I was unable to find a passable time domain approximation for the version that uses sines in the series. The problem is called finding the Hilbert transform of a signal and it's notoriously hard.
Here's a sample of the advanced formant sweeping the ratio from zero to ten while keeping the width constant:


Applications

The advanced formant waveform is directly applicable to speech synthesis for creating ultra pure energy concentrations for maximum creepiness.
The simple formant finds use in FM-synthesis where having little energy on the lower harmonics of the modulating waveform gives more stability to the sound, resulting in fewer beating partials and a smoother timbral falloff with decreasing index. Here's an example where we're modulating a 220 Hz carrier with a 110 Hz wave. The first part has a sine wave modulating a sine wave and the second part has a simple formant centered around the second harmonic modulating a sine wave:

Implementation

Here's a gist with Python implementations of the normalized theta, simple formant and the theta formant waveforms:

Tuesday, 16 December 2014

Introducing the Lissajous waveforms

Introduction

I recently discovered a family of periodic functions based on the Lissajous curves that have many desirable properties when it comes to sound synthesis.
First of all the functions are smooth. This means that the harmonics fall off exponentially, leading to a pleasant round sound and fewer aliasing artifacts. Secondly they come with two parameters that control the timbre of the sound, making it sharper in various ways.

Lissajous waveforms

A Lissajous curve is defined by the parametric equation:
x = A sin(a t + δ)
y = B sin(b t)
For example taking A = B = 1, a = 2, b = 3, δ = π/2 we end up with the following figure:
This is a two-dimensional curve so we need some way of reducing it to a function of one variable. Here we choose to take the angle measured from the x-axis counter-clockwise:
In other words we're applying the atan2(y, x) function to the parametric equation. The result looks something like this:
Oops! There are at least three things wrong here. The obvious flaw is that the graph is clearly not smooth. This is caused by atan2 being a multi-valued function that is reduced to a single value. In other words the first problem is angle wrapping. To unwrap the angle in this case we simply add 2π to the result if the parameter t is in the range from 0 to π and the angle is negative and subtract 2π if t is in the range from π to 2π and the angle is positive.
The second problem is that the period is 2π. This is easy to fix by scaling t.
The third problem is normalization. It turns out that after unwrapping the angle the maximum value of the function is 5π/4 at t = π/2. Simply dividing the function by this value solves the problem.
After fixing everything we end up with this waveform:
A nice and smooth waveform that sounds like a soft triangle wave:
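A sketch of the whole recipe in Python (lissajous23 is my own name; the unwrapping condition is specific to a = 2, b = 3, δ = π/2):

from math import sin, pi, atan2

def lissajous23(t):
    """The a = 2, b = 3 Lissajous waveform with one period per unit of t."""
    phase = 2 * pi * (t % 1.0)        # fix the period to one
    x = sin(2 * phase + pi / 2)
    y = sin(3 * phase)
    angle = atan2(y, x)
    # Unwrap the angle.
    if phase < pi and angle < 0:
        angle += 2 * pi
    elif phase > pi and angle > 0:
        angle -= 2 * pi
    return angle / (5 * pi / 4)       # normalize the maximum to one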

Adding timbre control

For fixed a and b the Lissajous curve comes with three interesting parameters A, B and δ, but because we're only interested in the angle traced by the figure we can ignore its size and replace A and B with a sharpness parameter S.
A = 1 + S
B = 1 - S
By varying S away from zero we can make the waveform more or less sharp:
S = 0.75
The positive and negative values of S change the waveform in different ways:
S = -0.75
Sweeping the sharpness from 1 to -1 sounds like this:

Bias control

The value of δ = π/2 is special in that it makes the a=2, b=3 waveform as smooth as possible. Other values are however possible. Take δ = 2 for example. The resulting Lissajous curve looks like this:
And the resulting Lissajous waveform looks like this:
In effect we have biased the waveform to lean more to the right. There is a limit to how much the δ parameter can be changed. If the Lissajous curve crosses zero or winds around zero on the wrong side the angle unwrapping breaks down. The exact range that δ can take depends on a and b. From δ we can derive a bias parameter that varies in the range from -1 to 1, with 0 representing the smoothest achievable waveform.
Sweeping the bias from 1 to -1 sounds like this:


The problem of normalization

As you may see from the graphs the maximum value of the waveform differs from one when S differs from zero. It turns out that the maximum can be expressed in terms of elementary functions, but save for the a=1, b=1 case the expression is prohibitively huge. I'm therefore leaving full normalization out from the implementation.

The odd-odd case

We've dealt with an even-odd case above with a=2, b=3. Having one of these parameters even has the fortunate side-effect that the curve never makes a full rotation around zero without making one back. This in turn limits the angle to a finite range. In the case that both a and b are odd the situation is different and the angle keeps increasing (or decreasing) without limit. Take a=1, b=3 for example:
The graph of the angle looks like this:

We can see that the angle keeps decreasing on average and that the shape repeats twice with every "period".
To fix this we subtract the average angle and divide the frequency by two. In this case the result looks like this:
and sounds like this:

The sharpness and bias parameters work in the same way as they do in the even-odd case. Here's a sharpness sweep:

Implementation

That wraps up the theory of the Lissajous waveforms. For larger a and b the specifics of angle unwrapping and normalization get increasingly complex and sometimes it is necessary to find the maximum value numerically instead of reasoning it out from a graph.

This Gist contains Python implementations of all the unique Lissajous waveforms with a ≤ b ≤ 6:


Wednesday, 26 November 2014

Stable variable filter design

Introduction

At the heart of most filters is a resonator characterized by its resonant frequency, Q factor and gain. In digital sound processing it is often desirable to change these parameters over time to give a dynamic character to the synthesized sound. For example the defining wobble bass of modern dubstep can be achieved by quickly varying a low-pass filter over a bass note, making it sound dark or bright in the rhythm of the song.

The problem

There is unfortunately a limit on how quickly you can vary the filter parameters of a resonator based on a traditional two pole design. A straightforward implementation preserves the slope of the impulse response when the filter parameters change. While this may look nice and smooth when plotted, it causes an undesirable audible change in loudness that often sounds like clicks or pops.
Impulse response of a time variable two pole resonator.

The solution

The reason two pole filters are used in the first place is that it's not possible to create a resonator with a single pole using only real coefficients. However there is no magic involved in using complex numbers and creating a complex one pole resonator is in fact fairly straightforward.

Complex multiplication

You may recall that complex numbers are defined by their real and imaginary parts or alternatively by their magnitude (distance from zero) and phase (angle measured from the real line). You may also recall that complex multiplication of two numbers is achieved by multiplying the magnitudes together and adding up the phases. This means that starting from one and repeatedly multiplying it by a complex number with magnitude less than one and non-zero phase creates a spiral towards zero. Taking the real part of this gives us an exponentially decaying sine wave.
The number one successively multiplied by a complex number.

Filter design

Now we can simply take the above observation and note that complex multiplication is a linear operation to get a filter out of it. We just need to pick our multiplying complex number so that its phase advances by frequency / (sampling rate) of a full circle with each step, making the real part oscillate at the desired frequency.
Effectively this creates a complex one pole filter that preserves the magnitude of the signal when we change our resonance parameters. The audible effect is much more pleasant than with the two pole design. Only one issue remains and that is the fact that the impulse response of this filter starts at one and decays from there. A simple fix is to multiply the signal by the imaginary unit to get a filter where the impulse response starts at zero. Another feature is that it's impossible to achieve Q-factors below one half.
Impulse response of a time variable one pole resonator.

Code

The filter design sketched above is realized by the following Python generator:

import cmath
import math
from math import pi

def resonator(source, b1, frequency, decay, srate=44100.0):
    # Rotating phase part of the multiplying complex number.
    a1 = -cmath.exp(-2 * pi * frequency / srate * 1j)
    # Decaying magnitude part.
    a1 *= math.exp(-decay * frequency / srate)
    y1 = 0j
    for sample in source:
        y0 = 1j * b1 * sample - a1 * y1
        yield y0.real
        y1 = y0
Here I have divided the multiplying complex number a1 into the rotating phase part and the decaying magnitude part. (The gain constant is indeed called b1 and not b0 because the impulse response is delayed by one sample compared to a traditional two pole design.)
The design outlined above is unnormalized and only valid near DC. To arrive at the final form we need to take the bilinear transform of the two-pole version and factor that into its one-pole form. The resulting designs for low-pass, high-pass, band-pass, band-reject (notch) and all-pass filters are available at: https://gist.github.com/frostburn/318e278dcdb050d71872

Thursday, 5 January 2012

The Basic Waveforms

Now that we've dealt with the impulse train it's time to look at other periodic waveforms that often form the basis of more advanced sound synthesis. They're also great by themselves if you're into chiptunes like I am.
I tend to think of sound in terms of the spectrum (the continuous Fourier transform) whenever it makes sense so I regard the impulse train as the second simplest* waveform after the pure sine wave. Most of the practical waveforms lie between these two extremes and the simplest way to go between them is integration.
Starting with the impulse train:
Blit220 by frostburn

if we simply integrate we will get something like this:
which isn't a periodic waveform. It's actually the floor function. What we need to do is to DC-block our waveforms before integration to keep the integral from "blowing up". So we subtract 1 from the impulse train, integrate and end up with:
We could take this as the definition of the sawtooth wave but I like my waveforms normalized, DC-blocked, having a value of zero at t=0 and with a positive derivative at the origin so my definition of the saw wave is:
saw(t) = 2(t - floor(t + 0.5))
Saw220 by frostburn

Subtracting two saws 180° apart gives us a square wave:
Square220 by frostburn

Integrating this we get a triangle wave:
Triangle220 by frostburn

We could also subtract two saw waves with different phase shifts to get different kinds of rectangular waves. In terms of normalization there are two ways that make sense. The simpler one is
pulse(t, duty) = bool(t - floor(t) < duty),
a wave that is 1 for the duty cycle and 0 otherwise:
Pulse08 220 by frostburn

The second normalization is DC-blocked and maintains the loudness of the signal, that is the energy, that is the integral of the square of the signal over time:
square(t, duty) = sqrt( (1 - duty)/duty ) if t - floor(t) < duty
-sqrt( duty/(1 - duty) ) otherwise
Unfortunately the loudness normalized version blows up when duty approaches zero or unity. In some cases it may make more sense to just use the difference of two saw waves to get a DC-blocked well behaving square wave.
The width modulated triangle wave is an easier case because it maintains loudness and peak amplitude at the same time:
triangle(t, width) = 2x/width if |x| < width/2
-2y/(1 - width) otherwise,
where x = t - floor(t + 0.5), 
y = t - floor(t) - 0.5.
Triangle08 220 by frostburn
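Here's a sketch of these definitions in Python (names are mine):

from math import floor, sqrt

def saw(t):
    """Sawtooth: zero at t = 0, positive slope, period one."""
    return 2 * (t - floor(t + 0.5))

def pulse(t, duty):
    """Rectangular wave that is 1 for the duty cycle and 0 otherwise."""
    return float(t - floor(t) < duty)

def square(t, duty):
    """DC-blocked, loudness normalized rectangular wave."""
    if t - floor(t) < duty:
        return sqrt((1 - duty) / duty)
    return -sqrt(duty / (1 - duty))

def triangle(t, width):
    """Width modulated triangle wave; reduces to saw at width = 1."""
    x = t - floor(t + 0.5)
    if abs(x) <= width / 2:
        return 2 * x / width
    y = t - floor(t) - 0.5
    return -2 * y / (1 - width)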

As an interesting detail this definition of triangle reduces to saw at width=1. Now if the square wave's integral deserves to be called basic then so does the integral of the saw wave, the parabolic wave:
par(t) = 1/2 - 6(x - floor(x + 0.5))²,
where x = t - 1/sqrt(12) is shifted so that par(0)=0 and I've already normalized the integral to have unity peak amplitude.
Par220 by frostburn

Integrating and normalizing once more we get the cubic oscillator:
cub(t) = sqrt(27) x(1 - 4x²),
where x = t - floor(t + 0.5).
Cub220 by frostburn

We can go on like this forever and once we're done we have the sine wave:
sine(t) = sin(2π t)
Sine220 by frostburn


While these waveforms are simple and pretty straightforward to implement and use they suffer from aliasing. Aside from the impulse train I rendered the sound clips at 352800 Hz sampling rate and resampled them down to 44100 Hz in Audacity. The careful listener may still be able to pick up a few sampling artifacts even at 8x oversampling.
In the next post we'll learn a little trick to reduce aliasing while also avoiding the Gibbs phenomenon.


*) One might even say that the impulse train is the simplest waveform because, in terms of the spectrum, its Fourier transform is another impulse train.