[opus] [PATCH] 02-Add CELT filter optimizations

Timothy B. Terriberry tterribe at xiph.org
Tue May 21 13:54:30 PDT 2013

Aurélien Zanelli wrote:
> Please ignore my previous mail and patch, there is a new version :).
> Patch changes are:
> - Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It
> increase performance when using optimized macros (ex: ARMv5E). A
> possible side effect of loop unroll is that i don't check for odd length
> here.
> - Add NEON version of FIR filter and autocorr
> - Add a section in autoconf in order to check NEON support

As Peter Robinson pointed out, we need runtime CPU detection for NEON. 
Even if we know at compile time that we're targeting ARMv7, some chips 
have NEON and some don't, and Debian, Android apps, Firefox, etc., all 
need a single build that runs on both.
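
For illustration, here is a rough sketch of what runtime detection could
look like on Linux, assuming the kernel lists "neon" and "edsp" on the
"Features" line of /proc/cpuinfo. This is not the libtheora code; every
ex_/EX_ name and the flag values are hypothetical. If the file can't be
read, it reports no features, so the caller falls back to plain C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only. */
#define EX_CPU_ARM_EDSP (1u << 0)
#define EX_CPU_ARM_NEON (1u << 1)

/* Returns 1 if `name` appears as a whole word in the feature list. */
int ex_has_feature(const char *features, const char *name)
{
    size_t n = strlen(name);
    const char *p = features;
    assert(n > 0);
    while ((p = strstr(p, name)) != NULL) {
        /* Require a separator (or string boundary) on both sides. */
        int starts = (p == features) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends) return 1;
        p += n;
    }
    return 0;
}

/* Extracts flags from one cpuinfo-style line; 0 for unrelated lines. */
unsigned ex_flags_from_cpuinfo_line(const char *line)
{
    const char *colon;
    unsigned flags = 0;
    if (strncmp(line, "Features", 8) != 0) return 0;
    colon = strchr(line, ':');
    if (colon == NULL) return 0;
    if (ex_has_feature(colon + 1, "edsp")) flags |= EX_CPU_ARM_EDSP;
    if (ex_has_feature(colon + 1, "neon")) flags |= EX_CPU_ARM_NEON;
    return flags;
}

/* Scans the given file line by line; returns 0 if it can't be opened. */
unsigned ex_detect_cpu_flags(const char *path)
{
    char buf[512];
    unsigned flags = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL) return 0;
    while (fgets(buf, sizeof buf, f) != NULL)
        flags |= ex_flags_from_cpuinfo_line(buf);
    fclose(f);
    return flags;
}
```

A caller would run ex_detect_cpu_flags("/proc/cpuinfo") once at init and
cache the result, rather than re-reading the file per call.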

We did some design discussion in #opus this morning. The short-term plan 
is to port over the libtheora ARM CPU detection code. Instead of having 
function tables in the state structs, however, the plan is to use an 
index into a read-only list of functions, so that, e.g., 
ptr_funcs[st->arch&ARCH_MASK] can select one without the risk that a 
buffer overflow corrupting st leads to arbitrary code execution.
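
A minimal sketch of that dispatch idea, under stated assumptions: the
table size, the meaning of each index, and all ex_/EX_ names are
hypothetical, and the "optimized" entry is just a portable stand-in
(unrolled by 2, with the odd-length tail handled) rather than real NEON
code. The point is only that the state holds a small integer, and the
mask keeps any value of it inside the const table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the table has EX_ARCH_MASK+1 entries (a power of two). */
#define EX_ARCH_MASK 3

typedef long (*ex_inner_prod_fn)(const short *x, const short *y, int n);

/* Plain C reference version. */
long ex_inner_prod_c(const short *x, const short *y, int n)
{
    long sum = 0;
    int i;
    for (i = 0; i < n; i++) sum += (long)x[i] * y[i];
    return sum;
}

/* Stand-in for an optimized version: unrolled by 2, odd tail handled. */
long ex_inner_prod_unrolled(const short *x, const short *y, int n)
{
    long sum = 0;
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        sum += (long)x[i] * y[i];
        sum += (long)x[i + 1] * y[i + 1];
    }
    if (i < n) sum += (long)x[i] * y[i];
    return sum;
}

/* Read-only table: it lives in const data, not in the state struct. */
static const ex_inner_prod_fn EX_INNER_PROD_IMPL[EX_ARCH_MASK + 1] = {
    ex_inner_prod_c,        /* 0: no extensions detected */
    ex_inner_prod_c,        /* 1: hypothetical, e.g. EDSP */
    ex_inner_prod_c,        /* 2: hypothetical, e.g. ARMv6 media */
    ex_inner_prod_unrolled  /* 3: hypothetical, e.g. NEON */
};

/* The state stores only an index, never a function pointer. */
typedef struct {
    int arch;
} ex_state;

long ex_inner_prod(const ex_state *st, const short *x, const short *y,
                   int n)
{
    assert(sizeof(EX_INNER_PROD_IMPL) / sizeof(EX_INNER_PROD_IMPL[0])
           == EX_ARCH_MASK + 1);
    /* Masking keeps even a corrupted st->arch inside the table. */
    return EX_INNER_PROD_IMPL[st->arch & EX_ARCH_MASK](x, y, n);
}
```

With this layout, overwriting st->arch can at worst select a different
valid implementation; it cannot redirect the call to an arbitrary
address.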

If you want to start implementing that, let me know; otherwise, I'll 
take a crack at it.

Also, when replacing whole functions, I think we should use separate 
RVCT-syntax assembly files instead of inline asm, for portability. We 
can translate to gas-syntax with a simple Perl script (libtheora and 
libvpx use this strategy).
