[opus] [PATCH] 02-Add CELT filter optimizations
Timothy B. Terriberry
tterribe at xiph.org
Tue May 21 13:54:30 PDT 2013
Aurélien Zanelli wrote:
> Please ignore my previous mail and patch, there is a new version :).
>
> Patch changes are:
> - Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. This
> increases performance when using optimized macros (e.g., ARMv5E). A
> possible side effect of the loop unrolling is that I don't check for odd
> lengths here.
> - Add NEON version of FIR filter and autocorr
> - Add a section in autoconf in order to check NEON support
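
To make the odd-length concern concrete, here is a minimal sketch of a 2x-unrolled multiply-accumulate loop that still handles an odd trailing element. The MAC16_16 definition below is a plain-C stand-in, and dot_unroll2 is a hypothetical name; neither is taken from the patch.

```c
#include <stdint.h>

/* Plain-C stand-in: c + a*b with 16-bit operands and a 32-bit accumulator.
   Assumption: an optimized build would map this to a platform-specific
   multiply-accumulate instruction instead. */
#define MAC16_16(c, a, b) ((c) + (int32_t)(a) * (int32_t)(b))

/* Hypothetical helper (not from the patch): dot product unrolled by 2. */
static int32_t dot_unroll2(const int16_t *x, const int16_t *y, int len)
{
    int32_t sum = 0;
    int i;
    /* Main loop: two multiply-accumulates per iteration. */
    for (i = 0; i + 1 < len; i += 2) {
        sum = MAC16_16(sum, x[i], y[i]);
        sum = MAC16_16(sum, x[i + 1], y[i + 1]);
    }
    /* Tail: pick up the last element when len is odd. */
    if (i < len)
        sum = MAC16_16(sum, x[i], y[i]);
    return sum;
}
```

If every caller can guarantee an even length, the tail could be dropped, but then that guarantee should be stated where the function is declared.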

As Peter Robinson pointed out, we need runtime CPU detection for NEON.
Even if we know at compile time that we're targeting ARMv7, some chips
have NEON and some don't, and Debian, Android apps, Firefox, etc., all
need a single build that runs on both.
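
As a rough illustration of what runtime detection can look like on Linux, the sketch below scans a /proc/cpuinfo-style file for a "Features" line that lists "neon". The function names are hypothetical, and other platforms would need a different mechanism; treat this as one possible approach, not the final code.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` is a "Features" line listing "neon" as a whole
   token, 0 otherwise. Hypothetical helper for illustration. */
static int features_line_has_neon(const char *line)
{
    const char *p;
    if (strncmp(line, "Features", 8) != 0)
        return 0;
    p = strchr(line, ':');
    if (p == NULL)
        return 0;
    /* Check every occurrence so that a longer token containing "neon"
       is not mistaken for the flag itself. */
    for (p = strstr(p, "neon"); p != NULL; p = strstr(p + 4, "neon")) {
        char before = p[-1];
        char after = p[4];
        if ((before == ' ' || before == '\t' || before == ':') &&
            (after == ' ' || after == '\n' || after == '\0'))
            return 1;
    }
    return 0;
}

/* Scans the file at `path` (normally "/proc/cpuinfo") line by line.
   Falls back to "no NEON" if the file cannot be opened. Limitation: a
   Features line longer than the buffer is read in pieces, and only the
   first piece is recognized. */
static int detect_neon(const char *path)
{
    char buf[1024];
    int found = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    while (fgets(buf, sizeof(buf), f) != NULL) {
        if (features_line_has_neon(buf)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

The result would be computed once at initialization and cached, so the per-call cost is just a table lookup or a branch.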

We did some design discussion in #opus this morning. The short-term plan
is to port over the libtheora ARM CPU detection code. Instead of having
function tables in the state structs, however, the plan is to use an
index into a read-only list of functions, so that, e.g.,
ptr_funcs[st->arch&ARCH_MASK] selects one. That way, a buffer overflow
that corrupts st can at worst pick a different valid entry, instead of
overwriting a function pointer and leading to arbitrary code execution.
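
A minimal sketch of that dispatch scheme follows. Only the st->arch&ARCH_MASK indexing comes from the discussion; the struct layout, the value of ARCH_MASK, the table contents, and the function names are all assumptions made for illustration.

```c
#include <stddef.h>

/* Assumption: at most 4 architecture variants, so the table has
   ARCH_MASK + 1 entries and any masked index is in range. */
#define ARCH_MASK 3

/* Hypothetical state struct: it stores only an index, never a pointer. */
typedef struct {
    int arch;
} DecState;

typedef int (*sum_func)(const short *x, int len);

/* Portable reference implementation. */
static int sum_c(const short *x, int len)
{
    int i, s = 0;
    for (i = 0; i < len; i++)
        s += x[i];
    return s;
}

/* Placeholder for an optimized version; it just reuses the C code here. */
static int sum_neon_stub(const short *x, int len)
{
    return sum_c(x, len);
}

/* Read-only table: `const` lets the toolchain place it in read-only
   memory, out of reach of an overflow in writable state. */
static const sum_func SUM_IMPL[ARCH_MASK + 1] = {
    sum_c, sum_c, sum_c, sum_neon_stub
};

/* Masking keeps the index inside the table even if st->arch is garbage. */
static int sum_dispatch(const DecState *st, const short *x, int len)
{
    return SUM_IMPL[st->arch & ARCH_MASK](x, len);
}
```

Every slot in the table has to hold a valid function, including slots for architecture values that are not currently assigned, since a corrupted index can land on any of them.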

If you want to start implementing that, let me know, otherwise I'll take
a crack at it.

Also, when replacing whole functions, I think we should use separate
RVCT-syntax assembly files instead of inline asm, for portability. We
can translate to gas-syntax with a simple Perl script (libtheora and
libvpx use this strategy).