[Speex-dev] FFT Resampler

Thorvald Natvig thorvald at natvig.com
Thu May 29 16:18:45 PDT 2008

Ok. I did some quality tests.

First off: never do quality tests with ints. I had serious problems 
interpreting my results until it dawned on me that the signal 
differences were just 0 or 1. So, after a lot of head-scratching, 
these tests compare the results from the _float versions (which is 
how both resamplers work internally anyway).

What I did was this:
Load speex_wb.wav as one large chunk of data.
Pad data with as many zeroes as there are samples.
Convert to long double.
Use one long double FFT for the entire thing.
Insert or chop off zeroes so the new length is 
Use one long double iFFT for the entire thing.
We'll call the FFT and iFFT of this our reference.
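
The reference procedure above can be sketched roughly as follows. This is 
only an illustration, not the code used for the tests: it uses Python 
doubles and a naive DFT instead of a long double FFT, the names dft, idft 
and resample_reference are made up for this sketch, and it assumes the new 
length is the padded length scaled by the rate ratio (the step above does 
not spell that out).

```python
import cmath


def dft(x):
    """Naive O(n^2) forward DFT; slow, but fine for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive O(n^2) inverse DFT, including the 1/n normalisation."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def resample_reference(samples, rate_in, rate_out):
    """Resample by resizing the spectrum; rates are integers in Hz."""
    # Pad with as many zeroes as there are samples.
    padded = list(samples) + [0.0] * len(samples)
    n_in = len(padded)
    # ASSUMPTION: the new length is the padded length times the rate ratio.
    n_out = n_in * rate_out // rate_in
    spectrum = dft(padded)
    # Insert or chop off zeroes in the middle of the spectrum, keeping the
    # lowest frequencies. The Nyquist bin itself is dropped for simplicity.
    keep = min(n_in, n_out) // 2
    out_spec = [0j] * n_out
    for k in range(keep):
        out_spec[k] = spectrum[k]                 # DC and positive frequencies
    for k in range(1, keep):
        out_spec[n_out - k] = spectrum[n_in - k]  # negative frequencies
    # Rescale so the amplitude survives the change in transform length.
    scale = n_out / n_in
    return [v.real * scale for v in idft(out_spec)]
```

For example, resample_reference(data, 16000, 44100) returns the padded 
signal at the new rate, so the second half of the output is still 
(approximately) the padding.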

Then, for each resampler below, I've reported the maximum numerical 
difference in the time domain (comparing ref[i] with sig[i]) as well as 
SNR. Since my knowledge of SNR for this is a bit sketchy, it's computed 
as follows:

Pad resampled signal with as many zeroes as there are samples.
Convert to long double.
Use one long double FFT for the entire thing.

Then, for both reference and resampled, let power[i] = sqrt(real[i]^2 + 
imag[i]^2) (strictly speaking the magnitude of bin i). We only care 
about the lower half of this power (remember we padded with zeroes).
Then, let SNR = sum[all i] abs(ref_power[i] / resamp_power[i] - 1.0)
I.e., SNR = 0 is a perfect signal. Anything else means the signal deviates.
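
A minimal sketch of the two measures described above, again with Python 
doubles and a naive DFT rather than a long double FFT. The function names 
(magnitude_spectrum, deviation, max_diff) are invented for this 
illustration, and the optional lo/hi bin range is an assumption about how 
one might restrict the sum to part of the spectrum.

```python
import cmath


def magnitude_spectrum(samples):
    """Zero-pad to twice the length and return the lower half of |DFT|."""
    padded = list(samples) + [0.0] * len(samples)
    n = len(padded)
    return [abs(sum(padded[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]


def deviation(ref, sig, lo=0, hi=None):
    """The 'SNR' figure from the post: 0 means the spectra match exactly.

    A bin whose magnitude is exactly zero in sig raises ZeroDivisionError;
    the post does not say how that case was handled.
    """
    ref_mag = magnitude_spectrum(ref)
    sig_mag = magnitude_spectrum(sig)
    if hi is None:
        hi = len(ref_mag)
    return sum(abs(ref_mag[i] / sig_mag[i] - 1.0) for i in range(lo, hi))


def max_diff(ref, sig):
    """Largest absolute sample difference in the time domain."""
    return max(abs(r - s) for r, s in zip(ref, sig))
```

Note that the sum is not normalised by the number of bins, so values from 
signals of different lengths are not directly comparable.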

There are 3 SNR values posted below. The first value is the 0->4 kHz 
range (which for 48 kHz output means the lower 1/6th of the power 
spectrum). The second is the 0->8 kHz range (full original signal), and 
the last is the full range.
The reason I split it is that the filter-based resampler has a cutoff 
filter, so it zeroes out frequencies near the Nyquist. So the SNR over 
the 0->8 kHz range is unfair to it.
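
To make the band split concrete, here is a small sketch of how a frequency 
limit maps onto the lower half of the spectrum. The helper name band_bins 
and the use of rounding are assumptions for illustration; the post only 
gives the 1/6th figure for 48 kHz output.

```python
def band_bins(n_bins, rate_out, f_hi):
    """Number of lower-half spectrum bins that cover 0..f_hi Hz.

    n_bins is the size of the lower half of the spectrum, which spans
    0 Hz up to the output Nyquist frequency (rate_out / 2).
    """
    nyquist = rate_out / 2
    return min(n_bins, round(n_bins * f_hi / nyquist))
```

With 48 kHz output the Nyquist is 24 kHz, so band_bins(6000, 48000, 4000) 
covers 1000 of 6000 bins (the lower 1/6th) and band_bins(6000, 48000, 8000) 
covers 2000 (the lower 1/3rd).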

Anyway, on to the results.

First, a 16=>16 resampling.
Filt Q10: Diff 0.883327, SNR 3.12531e-07 / 0.472589
FFT 320: Diff 0.00292969, SNR 2.57974e-07 / 4.77473e-05
Both resamplers will recreate the original samples. The filter-based 
one does limit the upper part of the signal, though.

Both resamplers deal fine with 16=>48, so let's skip directly to 16=>44.1:
Filt Q0: 2.57e-03 3.15e-01 7.51e-01
Filt Q1: 2.12e-04 4.29e-01 7.93e-01
Filt Q2: 1.33e-04 2.92e-01 7.43e-01
Filt Q3: 2.20e-05 9.20e-01 9.71e-01
Filt Q4: 1.96e-05
Filt Q5: 9.61e-06

FFT+0: 3.83e-02 1.91e-01 7.06e-01 (And you can clearly hear this)
FFT+16: 8.10e-03 6.18e-02 6.60e-01 (violates the resampler requirements 
and shifts frequencies slightly)
FFT+160: 1.14e-05 3.75e-03 6.39e-01 (shortest allowed overlap)

So, FFT+160 is somewhere between Q4 and Q5, and it's 6 times faster than Q4.
Testing with twice the block and overlap length:
FFT 640/320: 1.13e-05 3.49e-03 6.38e-01
erm. Hm. Need more testing on that one, I think.

Moving to 16=>48, let's examine different block and overlap lengths:
160+16: 1.20e-05 3.55e-02 6.78e-01
160+80: 1.20e-05 4.02e-03 6.68e-01
320+32: 1.19e-05 7.97e-03 6.69e-01
320+160: 1.19e-05 3.82e-03 6.68e-01
320+320: 1.19e-05 3.60e-03 6.68e-01

While this is not nearly enough data, it seems longer overlap reduces 
artifacts in the higher frequencies.

Finally, I do a 48=>44.1 test. (FFT base is now 960).
Filt Q9: 1.58e-05
FFT+160: 5.24e-07
FFT+320: 5.11e-07
FFT+480: 5.20e-07
Not bad, and for 48=>44.1, FFT+160 runs three times faster than Q0 (and 
~50 times faster than Q9 ;))

So if you can survive a bit of latency, this should give you decent 
results very quickly. (Do remember that latency though, it's not 
There's still work to be done looking at aliasing; I'll need to use a 
signal with frequencies closer to the Nyquist to do that.
