[Speex-dev] MKL Patch

Tue May 27 15:52:39 PDT 2008

I did some benchmarking. On a P4 Xeon 3.4 GHz (64-bit), GCC 4.3,
-O3 -fprofile-use -ffast-math -ftree-vectorize, running the preprocessor
with 320-sample frames (16 kHz):

KISS:  89 us/frame
Small: 88 us/frame
FFTW3: 76 us/frame
MKL:   75 us/frame

According to callgrind, with MKL 18.4% of CPU time is spent in the FFTs,
vs. 37.9% with Small, so the numbers quoted earlier must have been for a
different architecture, compiler or set of optimization flags.

So, in reality, we're saving only about 20% of the preprocessor time.
Looking at the output, you could shave another 50% off what remains by
SSE-optimizing the entire code (and I'm not doing that ;))