Friday, 26 December 2014

Formant waveforms

Introduction

In speech sciences a formant is defined as a concentration of acoustic energy around a particular frequency. A common way to achieve such a concentration is to apply a band-pass filter to a sawtooth wave. However we're not going to do that here because it just so happens that a periodic waveform with a gaussian energy peak in the spectrum has a rapidly converging series representation both in the time and frequency domains. By carefully picking the representations apart we can create an algorithm that is relatively cheap to evaluate computationally.

Theory

The constructions begins with the observation that the continuous Fourier transform of a gaussian function is also a gaussian. This gives us two complementary ways of creating a periodic waveform out of it. In the time domain we convolve the gaussian with an impulse train and in the frequency domain we multiply an impulse train with a gaussian envelope.
The frequency domain representation gives rise to the Jacobi theta function:

The time domain representation gives:
where C is a normalization constant.
The first representation converges rapidly when q<0.2 and the second when q>0.2 giving us a criterion for picking the one to use. For sound synthesis purposes picking the first five terms (from n=-2 to n=2 for the second representation) is more than enough.
The first representation is simply an additive synthesis formula that benefits from the Chebyshev recurrence:


where the cos function has to be evaluated only once to get the rest of the harmonics.

The second representation is more complex and seems to require at least 3 calls to the exp function in addition to the one log call.

Approximation

The rough algorithm laid out above isn't bad, but for sound synthesis purposes we can get away with a simple approximation:


where C is another normalization constant.

Simple formants

Using the above approximation we can finally get simple formant waveforms that have the energy centered around a given harmonic:


The maximum amplitude has been normalized to one and for this approximation to work the ratio parameter r has to be an integer. There is also a numerical limit to how high the width parameter w can be. For large widths this approximation has to be used:
w > 700

Advanced formants

To get around the restriction that the ratio has to be an integer we again turn to the series representations and apply some theory from Fourier analysis. For the frequency domain version we simply shift the coefficients away from the original values while taking into account the effect on negative frequencies (leaving out normalization):


With the loss of symmetry the optimal range of harmonics to evaluate changes to about ten centered around r. Luckily the Chebyshev recurrence still applies even to negative frequencies but now we need up to three evaluations of cos to get it going from an arbitrary n: cos(x), cos((n-1)x) and cos(nx).

The time domain series changes by multiplying the kernel by cos(2πrt):


For optimized evaluation the cosine terms can be absorbed into the exponential as an imaginary phase and then only keeping the real part of the final result. The optimization details get messy but I was able to squeeze the number of needed function calls to two complex and two real exp calls.

Sound samples

So let's make some noise. Here's a width sweep from zero to ten of the simple formant at 220 Hz centered around the fourth harmonic:


You will notice a click at the beginning of the sample. That's a direct consequence of using cosines in the series representation. Unfortunately I was unable to find a passable time domain approximations for the version that uses sines in the series. The problem is called finding the Hilber transform of a signal and it's notoriously hard.
Here's a a sample of the advanced formant sweeping the ratio from zero to ten while keeping the width constant:


Applications

The advanced formant waveform is directly applicable to speech synthesis for creating ultra pure energy concentrations for maximum creepiness.
The simple formant finds use in FM-synthesis where having little energy on the lower harmonics of the modulating waveform gives more stability to the sound resulting in less beating partials and a smoother timbral falloff with decreasing index. Here's an example where we're modulating a 220 Hz carrier with a 110 Hz wave. The first part has a sine wave modulating a sine wave and the second part has a simple formant centered around the second harmonic modulating a sine wave:

Implementation

Here's a gist with Python implementations of the normalized theta, simple formant and the theta formant waveforms:

Tuesday, 16 December 2014

Introducing the Lissajous waveforms

Introduction

I recently discovered a family of periodic functions based on the Lissajous curves that have many desirable properties when it comes to sound synthesis.
First of all the functions are smooth. This means that last harmonics fall off exponentially leading to a pleasant round sound and less aliasing artifacts. Secondly they come with two parameters that control the timbre of the sound making it sharper in various ways.

Lissajous waveforms

A Lissajous curve is defined by the parametric equation:
x = A sin(a t + δ)
y = B sin(b t)
For example taking A = B = 1, a = 2, b=3, δ = π/2 we end up with the following figure:
This a two-dimensional curve so we need some way of reducing it to a function of one variable. Here we choose to take the angle measured from the x-axis counter-clockwise:
In other words we're aplying the atan2(y, x)-function to the parametric equation. The result looks something like this:
Oops! There are at least three things wrong here. The obvious flaw is that the graph is clearly not smooth. This is caused by atan2 being a multi-valued function that is reduced to a single value. In other words the first problem is angle wrapping. To unwrap the angle in this case we simply add 2π to the result if the parameter t is in the range from 0 to π and the angle is negative and subtract 2π if t is in the range from π to 2π and the angle is positive.
The second problem is that the period is 2π. This is easy to fix by scaling t.
The third problem is normalization. It turns out that after unwrapping the angle the maximum value of the function is 5π/4 at t = π/2 . Simply dividing the function by this value solves the problem.
After fixing everything we end up with this waveform:
A nice and smooth waveform that sounds like a soft triangle wave:

Adding timbre control

For fixed a and b the Lissajous curve comes with three interesting parameters A, B and δ, but because we're only interested in the angle traced by the figure we can ignore it's size and replace A and B with a sharpness parameter S.
A = 1 + S
B = 1 - S
By varying S away from zero we can make the waveform more or less sharp:
S = 0.75
The positive and negative values of S change the waveform in different ways:
S = -0.75
Sweeping the sharpness from 1 to -1 sounds like this:

Bias control

The value of δ = π/2 is special in that it makes the a=2, b=3 waveform as smooth as possible. Other values are however possible. Take δ = 2 for example. The resulting Lissajous curve looks like this:
And the resulting Lissajous waveform looks like this:
In effect we have biased the waveform to lean more to the right. There is is a limit to how much the δ parameter can be changed. If the the Lissajous curve crosses zero or winds around zero on the wrong side the angle unwrapping breaks down. The exact range that δ can take depends on a and b. From δ we can derive a bias parameter that varies in the range from -1 to 1 with 0 representing the smoothest achievable waveform.
Sweeping the bias from 1 to -1 sounds like this:

 

The problem of normalization

As you may see from the graphs the maximum value of the waveform differs from one when S differs from zero. It turns out that the maximum can be expressed in terms of elementary functions, but save for the a=1, b=1 case the expression is prohibitively huge. I'm therefore leaving full normalization out from the implementation.

The odd-odd case

We've dealt with an even-odd case above with a=2, b=3. Having one of these parameters even has the fortunate side-effect that the curve never makes a full rotation around zero without making one back. This in turn limits the angle to a finite range. In the case that both a and b are odd the situation is different and the angle keeps increasing (or decreasing) without limit. Take a=1, b=3 for example:
The graph of the angle looks like this:

We can see that the angle keeps decreasing on average and that the shape repeats twice with every "period".
The fix this we subtract the average angle and divide the frequency by two. In this case the result looks like this:
and sounds like this:

The sharpness as bias parameters work in the same way as they do in the even-odd case. Here's a sharpness sweep:

Implementation

That wraps up the theory of the Lissajous waveforms. For larger a and b the specifics of angle unwrapping and normalization get increasingly complex and sometimes it is necessary to find the maximum value numerically instead of reasoning it out from a graph.

This Gist contains Python implementation of all the unique Lissajous waveforms up to a ≤ b ≤ 6: