[speex-dev] libspeex/SSE Intrinsics with GCC 3.3.x
Steve Kann
stevek at stevek.com
Fri Apr 2 12:05:27 PST 2004
I haven't looked at the SSE code in the encoder/decoder proper yet, but I
did some benchmarking with the gcc flags I mentioned earlier.
As I understand it, if you compile with SSE you are already requiring SSE
for your binary, right? That means you're requiring a Pentium III or an
Athlon XP (or later).
In that case, these additional flags will probably help a lot.
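Here is why an SSE build already requires an SSE-capable CPU. GCC predefines __SSE__ when -msse (or an -march that implies it, such as -march=pentium3) is in effect. Code written with the xmmintrin.h intrinsics is then compiled to SSE instructions unconditionally, with no runtime check. The sketch below uses a hypothetical inner_prod_f() helper. It is not the actual libspeex code, only an assumed example of an intrinsics loop with a plain C fallback.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper (not the libspeex API): dot product of two float
 * vectors. With -msse (or an -march implying SSE) GCC defines __SSE__,
 * and the intrinsics below become SSE instructions with no runtime CPU
 * check, so the binary needs an SSE-capable processor. */
float inner_prod_f(const float *x, const float *y, size_t len)
{
    float sum = 0.0f;
    size_t i = 0;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();   /* four running partial sums */
    float lanes[4];

    for (; i + 4 <= len; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(x + i),
                                         _mm_loadu_ps(y + i)));
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar loop: the leftover 0-3 elements with SSE, everything without. */
    for (; i < len; i++)
        sum += x[i] * y[i];
    return sum;
}
```

When built with -msse, the loop handles four floats per iteration. Without it, only the scalar loop remains. The result is the same apart from differences in floating-point rounding caused by the changed summation order.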
Here are some benchmark results, obtained by using the given flags and
timing test encodings of 3361 seconds of audio with a bitrate limit of
8kbps and complexity 1. These were run on an Athlon XP, but in my
experience the flags help on all of the P3, P4 and Athlon.
-fsingle-precision-constant in particular really helps on Pentium 4s.

Without SSE:
  -O2
    Encode/Decode = 31.9x / 146x realtime
  -O3 -ffast-math -funroll-all-loops -march=pentium3 -fprefetch-loop-arrays -fsingle-precision-constant
    Encode/Decode = 53.4x / 361x realtime

With SSE:
  -O3 -msse
    Encode/Decode = 65x / 386x realtime
  -O3 -msse -ffast-math -funroll-all-loops -march=pentium3 -fprefetch-loop-arrays -fsingle-precision-constant
    Encode/Decode = 69x / 460x realtime

So you can still get roughly 6% (encode) and 19% (decode) speed
improvements from the extra optimizations. If you're already requiring SSE,
you might as well check whether your gcc supports these flags, and use them.
If you're not requiring SSE but are using gcc, you might want to at least
try the other flags, using -mcpu=pentium3 instead of -march=pentium3 (or
-march=pentium -mcpu=pentium3 or so), as that might still help without
requiring SSE.
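The relative gains can be worked out from the two "With SSE" rows (65x to 69x for encoding, 386x to 460x for decoding), treating each realtime factor as a throughput. Below is a small sketch. The helper names speedup_pct() and print_sse_flag_gains() are hypothetical.

```c
#include <stdio.h>

/* Percentage speedup of `tuned` over `baseline`, both given as
 * multiples of realtime (e.g. 65.0 for "65x realtime"). */
static double speedup_pct(double baseline, double tuned)
{
    return (tuned / baseline - 1.0) * 100.0;
}

/* -O3 -msse vs. -O3 -msse plus the extra flags (Athlon XP figures). */
void print_sse_flag_gains(void)
{
    printf("encode: %.1f%%\n", speedup_pct(65.0, 69.0));
    printf("decode: %.1f%%\n", speedup_pct(386.0, 460.0));
}
```

Calling print_sse_flag_gains() prints "encode: 6.2%" and "decode: 19.2%".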
The effect of -fsingle-precision-constant can also be achieved in the code
itself by writing float constants with an f suffix (and I think Jean-Marc is
going to try this). Also, C99 and some platforms have single-precision
versions of the transcendental functions (logf, expf, etc.) that you might
want to detect and use from autoconf. These led to a 1.3% improvement in the
preprocessor, and might also help the encoder/decoder if it uses them (you
can just define HAVE_SINGLEPREC_XXX and then do #define exp(a) expf(a), etc.).

With all these changes, speex at 8kbps and complexity 1 is now only about
twice as expensive as GSM for encoding (on my machine, GSM does
146x/280x). Encoding with the default speexenc settings is still much
slower, at 28x/409x. At a bitrate comparable to GSM with low complexity
(--bitrate 13000 --comp 1), I get 50x/436x.
-SteveK