Introduction
In speech sciences a formant is defined as a concentration of acoustic energy around a particular frequency. A common way to achieve such a concentration is to apply a band-pass filter to a sawtooth wave. However, we're not going to do that here, because it just so happens that a periodic waveform with a Gaussian energy peak in its spectrum has a rapidly converging series representation in both the time and frequency domains. By carefully picking the representations apart we can create an algorithm that is relatively cheap to evaluate.
Theory
The construction begins with the observation that the continuous Fourier transform of a Gaussian function is also a Gaussian. This gives us two complementary ways of creating a periodic waveform out of it. In the time domain we convolve the Gaussian with an impulse train, and in the frequency domain we multiply an impulse train with a Gaussian envelope.
The frequency domain representation gives rise to the Jacobi theta function:
where C is a normalization constant.
The first representation converges rapidly when q<0.2 and the second when q>0.2, giving us a criterion for picking the one to use. For sound synthesis purposes the first five terms (from n=-2 to n=2 for the second representation) are more than enough.
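As a rough illustration, the sketch below assumes the frequency domain series is the sum over n of q^(n²) cos(2πnt) and derives the time domain series from it by Poisson summation. The function names, the term counts and the omission of the constant C are assumptions made for this example, not the author's code.

```python
import math

def theta_freq(t, q, terms=5):
    """Frequency domain series: 1 + 2 * sum of q^(n^2) * cos(2 pi n t), n = 1..terms-1."""
    total = 1.0
    for n in range(1, terms):
        total += 2.0 * q ** (n * n) * math.cos(2.0 * math.pi * n * t)
    return total

def theta_time(t, q, reach=2):
    """Time domain series: a Gaussian kernel summed over shifts n = -reach..reach."""
    a = -math.log(q)              # the single log call
    t -= math.floor(t + 0.5)      # wrap t into [-0.5, 0.5) so the nearest shifts dominate
    scale = math.sqrt(math.pi / a)
    return scale * sum(math.exp(-math.pi ** 2 * (t - n) ** 2 / a)
                       for n in range(-reach, reach + 1))

def theta(t, q):
    """Pick whichever series converges faster for the given q (assumed 0 < q < 1)."""
    if q < 0.2:
        return theta_freq(t, q)
    return theta_time(t, q)
```

Under these assumptions the truncation errors of the two five-term sums balance near q = exp(-π/2) ≈ 0.208, which agrees with the 0.2 threshold above.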
The first representation is simply an additive synthesis formula that benefits from the Chebyshev recurrence:
where the cos function has to be evaluated only once to get the rest of the harmonics.
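The Chebyshev recurrence in question is cos((n+1)x) = 2 cos(x) cos(nx) - cos((n-1)x). Here is a minimal sketch of how it could drive the additive series, again assuming the form 1 + 2 Σ q^(n²) cos(2πnt); the name theta_chebyshev and the harmonics parameter are made up for this example.

```python
import math

def theta_chebyshev(t, q, harmonics=4):
    """Additive series evaluated with a single call to cos."""
    x = 2.0 * math.pi * t
    c = math.cos(x)               # the only trigonometric call
    two_c = 2.0 * c
    prev, cur = 1.0, c            # cos(0 * x) and cos(1 * x)
    total = 1.0                   # the n = 0 term
    for n in range(1, harmonics + 1):
        total += 2.0 * q ** (n * n) * cur
        # cos((n+1)x) = 2 cos(x) cos(nx) - cos((n-1)x)
        prev, cur = cur, two_c * cur - prev
    return total
```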
The second representation is more complex and seems to require at least 3 calls to the exp function in addition to the one log call.
Approximation
The rough algorithm laid out above isn't bad, but for sound synthesis purposes we can get away with a simple approximation:
where C is another normalization constant.
Simple formants
Using the above approximation we can finally get simple formant waveforms that have the energy centered around a given harmonic:
The maximum amplitude has been normalized to one, and for this approximation to work the ratio parameter r has to be an integer. There is also a numerical limit to how high the width parameter w can be. For large widths (w > 700) this approximation has to be used:
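Since the formulas themselves are not reproduced here, the following is only a sketch under a loud assumption: that the periodic Gaussian is approximated by the von Mises shape exp(w cos(2πt)), a well-known stand-in for it, and that the formant is obtained by multiplying with cos(2πrt). The function name and this exact form are assumptions and may differ from the author's formula.

```python
import math

def simple_formant(t, r, w):
    """Assumed form: cos(2 pi r t) * exp(w * (cos(2 pi t) - 1)).

    r is the integer harmonic the energy is centered around and w is the width
    parameter. This is an illustrative guess, not the author's formula.
    """
    x = 2.0 * math.pi * t
    # The exponent is never positive, so the envelope peaks at exactly one at t = 0.
    return math.cos(r * x) * math.exp(w * (math.cos(x) - 1.0))
```

Writing the exponent as w(cos(2πt) - 1) keeps the exponential at or below one. Dividing exp(w cos(2πt)) by exp(w) instead would overflow in double precision once w passes roughly 709, which may be the kind of limit the text refers to. An integer r keeps the product periodic over one cycle of the envelope.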
Advanced formants
To get around the restriction that the ratio has to be an integer we again turn to the series representations and apply some theory from Fourier analysis. For the frequency domain version we simply shift the coefficients away from the original values while taking into account the effect on negative frequencies (leaving out normalization):
With the loss of symmetry the optimal range of harmonics to evaluate changes to about ten, centered around r. Luckily the Chebyshev recurrence still applies even to negative frequencies, but now we need up to three evaluations of cos to get it going from an arbitrary n: cos(x), cos((n-1)x) and cos(nx).
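A sketch of the shifted series, assuming the coefficients are q^((n-r)²) on cos(2πnt) with n allowed to go negative, and leaving out normalization. The name formant_freq and the span parameter are hypothetical.

```python
import math

def formant_freq(t, r, q, span=5):
    """Sum of q^((n - r)^2) * cos(2 pi n t) over the harmonics nearest to r."""
    x = 2.0 * math.pi * t
    first = round(r) - span               # may be negative; cos is even, so that is fine
    two_c = 2.0 * math.cos(x)             # cos(x)
    prev = math.cos((first - 1) * x)      # cos((n-1)x)
    cur = math.cos(first * x)             # cos(nx)
    total = 0.0
    for n in range(first, first + 2 * span + 1):
        total += q ** ((n - r) ** 2) * cur
        # Same Chebyshev step as before, started from an arbitrary n
        prev, cur = cur, two_c * cur - prev
    return total
```

With span=5 this evaluates eleven harmonics, in line with the "about ten" mentioned above, and r no longer has to be an integer.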
The time domain series changes by multiplying the kernel by cos(2πrt):
For optimized evaluation the cosine terms can be absorbed into the exponential as an imaginary phase, keeping only the real part of the final result. The optimization details get messy, but I was able to squeeze the number of needed function calls down to two complex and two real exp calls.
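The idea of folding the cosine into the exponential can be sketched without the optimizations. The version below spends one complex exp per term rather than the two complex and two real calls mentioned above, and it assumes the kernel exp(-π²t²/a) cos(2πrt) with a = -ln q. The name formant_time and the reach parameter are hypothetical.

```python
import cmath
import math

def formant_time(t, r, q, reach=3):
    """Real part of a sum of Gaussian kernels that carry an imaginary phase."""
    a = -math.log(q)
    t -= math.floor(t + 0.5)      # the sum over all shifts has period one, so wrap t
    total = 0.0
    for k in range(-reach, reach + 1):
        u = t - k
        # exp(-pi^2 u^2 / a) * cos(2 pi r u) is the real part of one complex exponential
        total += cmath.exp(complex(-math.pi ** 2 * u * u / a,
                                   2.0 * math.pi * r * u)).real
    return math.sqrt(math.pi / a) * total
```

With the sqrt(π/a) factor included, Poisson summation makes this agree with the unnormalized frequency domain sum of q^((n-r)²) cos(2πnt) taken over all integers n.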
Sound samples
So let's make some noise. Here's a width sweep from zero to ten of the simple formant at 220 Hz centered around the fourth harmonic:
You will notice a click at the beginning of the sample. That's a direct consequence of using cosines in the series representation. Unfortunately I was unable to find a passable time domain approximation for the version that uses sines in the series. The problem is known as finding the Hilbert transform of a signal, and it's notoriously hard.
Here's a sample of the advanced formant sweeping the ratio from zero to ten while keeping the width constant:
Applications
The advanced formant waveform is directly applicable to speech synthesis for creating ultra pure energy concentrations for maximum creepiness.
The simple formant finds use in FM synthesis, where having little energy on the lower harmonics of the modulating waveform gives more stability to the sound, resulting in fewer beating partials and a smoother timbral falloff with decreasing index. Here's an example where we're modulating a 220 Hz carrier with a 110 Hz wave. The first part has a sine wave modulating a sine wave and the second part has a simple formant centered around the second harmonic modulating a sine wave:
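A comparison like this could be rendered roughly as follows. This is a sketch under several assumptions: the modulation is applied to the carrier phase, the simple formant takes the assumed form cos(2πrt) exp(w(cos(2πt) - 1)), and the index and width values are arbitrary picks. None of the names come from the author's gist.

```python
import math

def simple_formant(phase, r, w):
    # Assumed formant shape (illustrative, not necessarily the author's formula)
    x = 2.0 * math.pi * phase
    return math.cos(r * x) * math.exp(w * (math.cos(x) - 1.0))

def sine_mod(phase):
    return math.sin(2.0 * math.pi * phase)

def formant_mod(phase):
    # Simple formant centered around the second harmonic; the width 1.0 is arbitrary
    return simple_formant(phase, 2, 1.0)

def fm_sample(t, index, modulator, carrier_hz=220.0, mod_hz=110.0):
    """One sample of sin(2 pi fc t + index * m(fm t))."""
    return math.sin(2.0 * math.pi * carrier_hz * t + index * modulator(mod_hz * t))

sample_rate = 44100
# One second of each variant at a fixed, arbitrarily chosen index of 2.0
sine_fm = [fm_sample(i / sample_rate, 2.0, sine_mod) for i in range(sample_rate)]
formant_fm = [fm_sample(i / sample_rate, 2.0, formant_mod) for i in range(sample_rate)]
```

Sweeping the index down over time instead of holding it fixed would be the way to hear the falloff behaviour described above.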
Implementation
Here's a gist with Python implementations of the normalized theta, simple formant and the theta formant waveforms: