[opus] Optimizing on AMD Geode (MMX, no SSE)

Matteo Fortini matteo.fortini at gmail.com
Wed Jan 7 08:01:26 PST 2015

I'm trying to improve Opus performance on an AMD Geode CPU, which has no
SSE support (it has MMX and AMD's own 3DNow! extensions instead).
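You can check at run time which SIMD extensions a given x86 CPU actually implements by using the CPUID instruction. The sketch below assumes GCC or Clang (for `<cpuid.h>` and `__get_cpuid()`). `struct simd_caps` and `detect_simd()` are hypothetical names. The bit positions are the standard CPUID feature flags.

```c
#include <assert.h>
#include <stdbool.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  /* GCC/Clang helper providing __get_cpuid() */
#endif

/* Hypothetical summary of the SIMD features relevant to this thread */
struct simd_caps {
    bool mmx, sse, sse2, sse41, amd3dnow;
};

/* Query CPUID; on non-x86 targets every field stays false. */
static struct simd_caps detect_simd(void)
{
    struct simd_caps c = { false, false, false, false, false };
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        c.mmx   = (edx >> 23) & 1u;  /* leaf 1, EDX bit 23 */
        c.sse   = (edx >> 25) & 1u;  /* leaf 1, EDX bit 25 */
        c.sse2  = (edx >> 26) & 1u;  /* leaf 1, EDX bit 26 */
        c.sse41 = (ecx >> 19) & 1u;  /* leaf 1, ECX bit 19 */
    }
    /* AMD extended leaf; __get_cpuid() returns 0 if the leaf is unsupported */
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        c.amd3dnow = (edx >> 31) & 1u;  /* EDX bit 31 */
#endif
    return c;
}
```

On a Geode you would expect `mmx` and `amd3dnow` to be set and the SSE fields to be clear, which matches the subject line. Any code path guarded by a clear flag would hit an illegal instruction.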

Without optimizations I can only encode 16-bit audio at 16 kHz with
complexity up to 2-3 without underruns.

I tried compiling with the SSE2/SSE4.1 optimizations, but all I got was a
crash with SIGILL, since the CPU does not implement those instructions.
So I looked into the optimized code and found that the dot product was a
good starting point. I added an MMX implementation of it and gained a
bit of performance.
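Here is a sketch of what an MMX dot product along these lines might look like. It assumes a fixed-point build, where samples are 16-bit and accumulators are 32-bit (Opus's `opus_val16`/`opus_val32` with `FIXED_POINT`). `dot_prod_mmx()` is a hypothetical name, and this is not the code from this post. The core instruction is `pmaddwd` (`_mm_madd_pi16`). It multiplies four 16-bit pairs and sums adjacent products into two 32-bit lanes. A scalar loop handles the tail, and it also handles the whole vector when the code is compiled without MMX.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical MMX dot product of two 16-bit vectors with a 32-bit result */
static int32_t dot_prod_mmx(const int16_t *x, const int16_t *y, int len)
{
    int32_t sum = 0;
    int i = 0;
    assert(len >= 0);
#if defined(__MMX__)
    __m64 acc = _mm_setzero_si64();
    for (; i + 4 <= len; i += 4) {
        __m64 vx, vy;
        /* memcpy avoids alignment and strict-aliasing problems */
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        /* pmaddwd: four 16x16 products, adjacent pairs summed into two 32-bit lanes */
        acc = _mm_add_pi32(acc, _mm_madd_pi16(vx, vy));
    }
    /* add the two 32-bit lanes together */
    sum = _mm_cvtsi64_si32(acc) + _mm_cvtsi64_si32(_mm_srli_si64(acc, 32));
    _mm_empty(); /* emms: leave MMX state before any x87 floating-point code runs */
#endif
    /* scalar tail (or the whole vector when MMX is not available) */
    for (; i < len; i++)
        sum += (int32_t)x[i] * y[i];
    return sum;
}
```

MMX registers alias the x87 register stack, so `_mm_empty()` has to run before any x87 float code. For short vectors, that call is part of the per-call overhead worth measuring. With GCC, `-march=geode` should enable MMX and 3DNow! code generation without SSE, so `__MMX__` gets defined.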

Then I looked at the xcorr function in its simplest form, which just
loops over the lags and computes a dot product for each one, and replaced
that dot product with a call to the MMX version. This way I can go up to
complexity 3-4 without underruns.
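The substitution described above amounts to something like the following sketch. A correlation loop evaluates one dot product per lag. The dot product is passed in as a function pointer, so you can swap a scalar reference for an MMX routine. `pitch_xcorr_simple()`, `dot_prod_fn` and `dot_prod_c()` are hypothetical names. The argument order is only loosely modeled on Opus's `celt_pitch_xcorr()`.

```c
#include <assert.h>
#include <stdint.h>

/* Signature of a dot-product routine; an MMX version can be plugged in here */
typedef int32_t (*dot_prod_fn)(const int16_t *x, const int16_t *y, int len);

/* Portable reference dot product (stand-in for the MMX routine) */
static int32_t dot_prod_c(const int16_t *x, const int16_t *y, int len)
{
    int32_t sum = 0;
    for (int i = 0; i < len; i++)
        sum += (int32_t)x[i] * y[i];
    return sum;
}

/* Hypothetical simple cross-correlation: xcorr[lag] = sum_j x[j] * y[lag + j].
 * y must hold at least len + max_pitch - 1 samples. */
static void pitch_xcorr_simple(const int16_t *x, const int16_t *y,
                               int32_t *xcorr, int len, int max_pitch,
                               dot_prod_fn dot)
{
    assert(len > 0 && max_pitch > 0);
    for (int lag = 0; lag < max_pitch; lag++)
        xcorr[lag] = dot(x, y + lag, len);
}
```

Calling the dot product once per lag reloads `x` for every lag. Computing several lags per pass lets each loaded `x` sample be reused. That is the idea behind the four-lag `xcorr_kernel()` in Opus's generic C code.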

Since this is still far from optimal, I am looking for other places that
would benefit most from parallelization.

Can you point some out? I was thinking about the FIR/IIR filter
implementations, but I'm afraid the overhead of using MMX would offset
the gain, since the filters are probably fairly short.
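For FIR filters, one way to keep the MMX overhead low is to vectorize each output's inner product over the taps. A filter whose order is a multiple of 4 then needs no scalar tail, and `emms` is issued once per block rather than once per sample. An IIR section is harder: each output depends on previous outputs, so only its feed-forward part vectorizes this way. The sketch below makes three assumptions. Coefficients are in Q format and stored in reverse order (`b_rev[k] = b[ord-1-k]`). The input buffer holds `ord-1` history samples before `x[0]`. `fir_mmx()` and its parameters are hypothetical, not an existing Opus function. Whether it beats the scalar loop for short filters still has to be measured.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical FIR: y[i] = sum_{j=0}^{ord-1} b[j] * x[i-j], coefficients in Q(q).
 * b_rev holds the coefficients reversed; x[-(ord-1)] .. x[n-1] must be readable. */
static void fir_mmx(const int16_t *x, const int16_t *b_rev, int16_t *y,
                    int n, int ord, int q)
{
    assert(n >= 0 && ord > 0 && q >= 1 && q <= 30);
    for (int i = 0; i < n; i++) {
        const int16_t *win = x + i - (ord - 1); /* oldest sample first */
        int32_t acc = 0;
        int k = 0;
#if defined(__MMX__)
        __m64 vacc = _mm_setzero_si64();
        for (; k + 4 <= ord; k += 4) {          /* four taps per pmaddwd */
            __m64 vx, vb;
            memcpy(&vx, win + k, sizeof vx);
            memcpy(&vb, b_rev + k, sizeof vb);
            vacc = _mm_add_pi32(vacc, _mm_madd_pi16(vx, vb));
        }
        acc = _mm_cvtsi64_si32(vacc) + _mm_cvtsi64_si32(_mm_srli_si64(vacc, 32));
#endif
        for (; k < ord; k++)                    /* leftover taps */
            acc += (int32_t)win[k] * b_rev[k];
        /* round, shift back from Q(q), saturate to 16 bits */
        int32_t out = (acc + (1 << (q - 1))) >> q;
        if (out > 32767) out = 32767;
        if (out < -32768) out = -32768;
        y[i] = (int16_t)out;
    }
#if defined(__MMX__)
    _mm_empty(); /* one emms per block, not per sample */
#endif
}
```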

Of course I can share the MMX code, even though it's not yet cleanly
integrated into the source.

Thank you in advance,
