[Speex-dev] FFT Resampler
Thorvald Natvig
thorvald at natvig.com
Thu May 29 16:18:45 PDT 2008
Ok. I did some quality tests.
First off: never do quality tests with ints. I had serious problems
interpreting my results until it dawned on me that the signal
differences were just 0 or 1. So, after a lot of head scratching,
these tests compare the results from the _float versions (which is
how both resamplers work internally anyway).
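To illustrate why int output hides the differences, here is a minimal
sketch. The two signals are made up for the example (they are not the
data from these tests): one is a sine, the other is the same sine with
an error of at most 0.3 added. After rounding to ints, the per-sample
difference can only be 0 or 1, whatever the shape of the error.

```python
import math

# Hypothetical signals for illustration only, not the speex_wb.wav data.
ref = [1000.0 * math.sin(0.1 * t) for t in range(200)]
sig = [r + 0.3 * math.sin(1.7 * t) for t, r in enumerate(ref)]

# Comparing floats shows the actual size of the error.
float_diff = max(abs(r - s) for r, s in zip(ref, sig))

# Comparing after rounding to ints collapses every error to 0 or 1.
int_diffs = {abs(round(r) - round(s)) for r, s in zip(ref, sig)}

print("max float difference:", float_diff)
print("distinct int differences:", sorted(int_diffs))
```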
What I did was this:
Load speex_wb.wav as one large chunk of data.
Pad data with as many zeroes as there are samples.
Convert to long double.
Use one long double FFT for the entire thing.
Insert or chop off zeroes so the new length is
(input_length)*(sample_target)/(sample_source)
Use one long double iFFT for the entire thing.
We'll call the FFT and iFFT of this our reference.
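The steps above can be sketched roughly as follows. This is not the
code I used: it is plain Python with a naive O(n^2) DFT and ordinary
doubles instead of long double, so it is only practical for tiny
inputs. The names dft, resize_spectrum and fft_reference_resample are
made up for the example, and the handling of the bin sitting exactly on
the Nyquist frequency (split when upsampling, folded when
downsampling) is an assumption; the steps above only say "insert or
chop off zeroes".

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT; the inverse divides by n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * ((k * t) % n) / n)
        out.append(acc / n if inverse else acc)
    return out


def resize_spectrum(spec, n_out):
    """Insert zeroes into, or chop bins out of, the middle of a spectrum."""
    n_in = len(spec)
    keep = min(n_in, n_out)
    half = (keep - 1) // 2              # highest bin strictly below the smaller Nyquist
    out = [0j] * n_out
    for k in range(half + 1):           # DC and positive frequencies
        out[k] = spec[k]
    for k in range(1, half + 1):        # negative frequencies
        out[n_out - k] = spec[n_in - k]
    if keep % 2 == 0:                   # bin exactly on the smaller Nyquist (assumed handling)
        m = keep // 2
        if n_out > n_in:                # upsampling: split between +m and -m
            out[m] = spec[m] / 2
            out[n_out - m] = spec[m] / 2
        elif n_out < n_in:              # downsampling: fold +m and -m together
            out[m] = spec[m] + spec[n_in - m]
        else:                           # same length: copy unchanged
            out[m] = spec[m]
    return out


def fft_reference_resample(samples, rate_in, rate_out):
    n = len(samples)
    padded = [float(s) for s in samples] + [0.0] * n   # pad with as many zeroes as samples
    n_in = len(padded)
    n_out = n_in * rate_out // rate_in                 # new length
    spec = resize_spectrum(dft(padded), n_out)
    scale = n_out / n_in                               # keep amplitudes unchanged
    return [(v * scale).real for v in dft(spec, inverse=True)]


x = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.0, -2.0]
y = fft_reference_resample(x, 16000, 32000)
print(len(y))
```

The returned list covers the padded signal, so its second half is the
resampled padding. For an integer upsampling ratio, every ratio-th
output sample lands on an original input sample.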
Then, for each resampler below, I've reported the maximum numerical
difference in the time domain (comparing ref[i] with sig[i]) as well as
the SNR. Since my knowledge of SNR is a bit sketchy, here is how I
computed it:
Pad resampled signal with as many zeroes as there are samples.
Convert to long double.
Use one long double FFT for the entire thing.
Then, for both reference and resampled, let power[i] = sqrt(real[i]^2 +
imag[i]^2). We only care about the lower half of this power (remember we
padded with zeroes).
Then, let SNR = sum[all i] abs(ref_power[i] / resamp_power[i] - 1.0)
I.e., SNR = 0 means a perfect match; anything else means the signal
deviates from the reference.
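A minimal sketch of the two measurements as described above, again in
plain Python with a naive DFT and doubles rather than long double. The
function names are made up for the example, and skipping bins where
the resampled magnitude is exactly zero is an assumption added to
avoid dividing by zero; the formula above does not say what to do
there.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Zero-pad to twice the length, DFT, and return the lower half of |X[k]|."""
    n = len(samples)
    padded = list(samples) + [0.0] * n
    size = len(padded)
    mags = []
    for k in range(size // 2):
        acc = 0j
        for t in range(size):
            acc += padded[t] * cmath.exp(-2j * math.pi * ((k * t) % size) / size)
        mags.append(abs(acc))           # sqrt(real^2 + imag^2)
    return mags


def max_abs_diff(ref, sig):
    """Largest time-domain difference between ref[i] and sig[i]."""
    return max(abs(r - s) for r, s in zip(ref, sig))


def snr_like_deviation(ref_mag, sig_mag):
    """sum over i of abs(ref_power[i] / resamp_power[i] - 1.0); 0 is a perfect match."""
    total = 0.0
    for r, s in zip(ref_mag, sig_mag):
        if s == 0.0:                    # assumed guard, not part of the original formula
            continue
        total += abs(r / s - 1.0)
    return total


ref = [1.0] + [0.0] * 7                 # toy impulse, not real test data
sig = [2.0] + [0.0] * 7                 # same impulse at twice the amplitude
ref_mag = magnitude_spectrum(ref)
sig_mag = magnitude_spectrum(sig)
print(max_abs_diff(ref, sig), snr_like_deviation(ref_mag, sig_mag))
```

To get the per-band values, pass slices of the two magnitude lists
instead of the whole lists.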
There are 3 SNR values posted below. The first value is the 0->4 kHz
range (which for 48 kHz output means the lower 1/6th of the power
spectrum). The second is the 0->8 kHz range (the full original signal),
and the last is the full range.
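The "lower 1/6th" figure is just the band edge divided by the output
Nyquist frequency (4 kHz / 24 kHz). As a small sketch, with band_bins
being a made-up helper name and n_half_bins the number of bins in the
lower half of the spectrum:

```python
def band_bins(n_half_bins, rate_out, f_hi):
    """Number of lower-half bins covering 0..f_hi Hz at the given output rate."""
    nyquist_bins = n_half_bins * 2 * f_hi // rate_out   # f_hi / (rate_out / 2) of the bins
    return min(n_half_bins, nyquist_bins)


print(band_bins(6000, 48000, 4000))   # 0->4 kHz at 48 kHz output
print(band_bins(6000, 48000, 8000))   # 0->8 kHz at 48 kHz output
print(band_bins(6000, 16000, 8000))   # 0->8 kHz at 16 kHz output is the full range
```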
The reason I split it is that the filter-based resampler has a cutoff
filter, so it zeroes out frequencies near the Nyquist. That makes the
SNR unfair to it over the 0->8 kHz range.
Anyway, on to the results.
First, a 16=>16 resampling.
Filt Q10: Diff 0.883327, SNR 3.12531e-07 / 0.472589
FFT 320: Diff 0.00292969, SNR 2.57974e-07 / 4.77473e-05
Both resamplers will recreate the original samples. The filter-based
one does limit the upper part of the spectrum.
Both resamplers deal fine with 16=>48, so let's skip directly to 16=>44.1:
Filt Q0: 2.57e-03 3.15e-01 7.51e-01
Filt Q1: 2.12e-04 4.29e-01 7.93e-01
Filt Q2: 1.33e-04 2.92e-01 7.43e-01
Filt Q3: 2.20e-05 9.20e-01 9.71e-01
Filt Q4: 1.96e-05
Filt Q5: 9.61e-06
FFT+0: 3.83e-02 1.91e-01 7.06e-01 (And you can clearly hear this)
FFT+16: 8.10e-03 6.18e-02 6.60e-01 (violates the resampler requirements
and shifts frequencies slightly)
FFT+160: 1.14e-05 3.75e-03 6.39e-01 (shortest allowed overlap)
So FFT+160 falls somewhere between Q4 and Q5, and it's 6 times faster
than Q4.
Testing with twice the block and overlap length:
FFT 640/320: 1.13e-05 3.49e-03 6.38e-01
erm. Hm. Need more testing on that one, I think.
Moving to 16=>48, let's examine different block and overlap lengths:
160+16: 1.20e-05 3.55e-02 6.78e-01
160+80: 1.20e-05 4.02e-03 6.68e-01
320+32: 1.19e-05 7.97e-03 6.69e-01
320+160: 1.19e-05 3.82e-03 6.68e-01
320+320: 1.19e-05 3.60e-03 6.68e-01
While this is not nearly enough data, it seems longer overlap reduces
artifacts in the higher frequencies.
Finally, I do a 48=>44.1 test. (FFT base is now 960).
Filt Q9: 1.58e-05
FFT+160: 5.24e-07
FFT+320: 5.11e-07
FFT+480: 5.20e-07
Not bad, and for 48=>44.1, FFT+160 runs three times faster than Q0 (and
~50 times faster than Q9 ;))
So if you can survive a bit of latency, this should give you decent
results very quickly. (Do remember that latency though, it's not
insignificant).
There's still work to be done looking at aliasing; I'll need to use a
signal with frequencies closer to the Nyquist to do that.