[speex-dev] libspeex/SSE Intrinsics with GCC 3.3.x

Steve Kann stevek at stevek.com
Fri Apr 2 12:05:27 PST 2004

Having not looked at the sse stuff in the encoder/decoder proper yet, I 
did some benchmarking with some gcc flags, based on what I wrote earlier.

I think that presently, if you compile with SSE, your binary already 
requires SSE, right?  Therefore, you're requiring a P3, an Athlon, or 
later.

In that case, these additional flags will probably help a lot.  
Here's some benchmark results, obtained by using the given flags, and 
doing timed test encodings of 3361 seconds of audio with a bandwidth 
limit of 8kbps and complexity 1. These were run on an Athlon-XP, but my 
experience is that the flags here help all of P3, P4, and Athlon.  
-fsingle-precision-constant, in particular, really helps Pentium 4s.
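For reference, the "x realtime" figures are just seconds of audio divided by the seconds taken to process it. A minimal sketch of that arithmetic follows; the helper names realtime_factor and time_call are made up for illustration, and clock() measures CPU time rather than wall-clock time, which may not match how the numbers in this post were taken.

```c
#include <time.h>

/* Realtime factor: seconds of audio processed per second of elapsed time.
 * Hypothetical helper, not part of libspeex. */
double realtime_factor(double audio_seconds, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;               /* avoid dividing by zero on tiny runs */
    return audio_seconds / elapsed_seconds;
}

/* Time one call of a caller-supplied routine.
 * Uses clock(), so the result is CPU time in seconds. */
double time_call(void (*work)(void *), void *arg)
{
    clock_t start = clock();
    work(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, encoding the 3361 seconds of audio in 100 seconds would give realtime_factor(3361.0, 100.0), or about 33.6x.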

Without SSE:

Encode/Decode = 31.9x / 146x realtime

-O3 -ffast-math -funroll-all-loops -march=pentium3 -fprefetch-loop-arrays -fsingle-precision-constant
Encode/Decode = 53.4x / 361x realtime

With SSE:

-O3 -msse
Encode/Decode = 65x / 386x realtime

-O3 -msse -ffast-math -funroll-all-loops -march=pentium3 -fprefetch-loop-arrays -fsingle-precision-constant
Encode/Decode = 69x / 460x realtime

So, you can still get 4% (encode) and 20% (decode) speed improvements with 
the extra optimizations.  If you're already requiring sse, you might as 
well see if your gcc supports these, and use them.  If you're not 
requiring sse, and you're using gcc, you might want to at least try the 
other flags, except use -mcpu=pentium3 (or -march=pentium -mcpu=pentium3 
or so), as that might still help.

The -fsingle-precision-constant stuff can be changed in the code itself 
(and I think Jean-Marc is going to try this).  Also, C99 and some 
platforms have single-precision versions of the transcendentals (logf, 
expf, etc.) that you might want to detect and use from autoconf.  These 
led to a 1.3% improvement in the preprocessor, and might also help the 
encoder/decoder if it uses them (you can just define 
HAVE_SINGLEPREC_XXX and then do #define exp(a) expf(a), etc.).
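
Here is a rough sketch of both ideas in C. The macro names HAVE_SINGLEPREC_EXP and HAVE_SINGLEPREC_LOG stand in for the HAVE_SINGLEPREC_XXX placeholder above and are assumed to come from configure; my_exp, my_log and scale_half are made-up names for illustration. The sketch uses wrapper macros rather than redefining exp() itself, since redefining a standard library name can clash with <math.h>.

```c
#include <math.h>

/* Assumed to be defined by configure when expf()/logf() are available.
 * Without them we fall back to the double versions and cast the result. */
#ifdef HAVE_SINGLEPREC_EXP
#define my_exp(a) expf(a)
#else
#define my_exp(a) ((float)exp(a))
#endif

#ifdef HAVE_SINGLEPREC_LOG
#define my_log(a) logf(a)
#else
#define my_log(a) ((float)log(a))
#endif

/* The f suffix makes 0.5f a float constant, so the multiply stays in
 * single precision.  A plain 0.5 is a double, which promotes x to double
 * and converts the product back to float on return. */
float scale_half(float x)
{
    return x * 0.5f;
}
```

Writing the f suffix on constants by hand has the same effect as -fsingle-precision-constant for those constants, without depending on a compiler flag.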

With all these changes, speex at 8kbps and complexity 1 is now only 
about twice as expensive as GSM for encoding (on my machine, GSM does 
146x/280x).  Using the default settings for speexenc is still much 
slower for encoding (28x/409x), and using a comparable bitrate and low 
complexity (--bitrate 13000 --comp 1) I get 50x/436x.

