[opus] [Aarch64 00/11] Patches to enable Aarch64 (arm64) optimizations, rebased to current master.

Tue Nov 10 11:32:35 PST 2015

> On Nov 6, 2015, at 9:05 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
> 
> These have been tested for correctness under qemu (including running
> the test vectors), but not yet performance tested on a live aarch64
> CPU (which will probably be an iPhone).  I should be able to do this
> Monday or Tuesday.

I’ve now done this, on an iPhone 5S (building with clang from Xcode 7.1).

In fixed-point mode, relative to the current HEAD of master, aarch64 gets a 10-12% encode boost and a 6-7% decode boost in my tests without Ne10. With Ne10, it’s an 11-13% encode boost and a 14-15% decode boost. (The current HEAD of master doesn’t use Ne10 on aarch64 at all.)

There’s also about a 5-6% boost to aarch64 floating-point mode, since some of the optimizations apply to both fixed and float code.

Fixed-point mode is still substantially faster than floating-point mode (about 20% faster for encode and about 10% faster for decode).

These patches also speed armv7 up substantially, since a number of the Neon intrinsics apply to armv7 as well.
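As a rough illustration of why intrinsics carry over between the two architectures, here is a minimal sketch built around a hypothetical helper, `add_i32`, which is not taken from the patch set. The `arm_neon.h` intrinsics it uses (`vld1q_s32`, `vaddq_s32`, `vst1q_s32`) are available on both armv7 Neon and aarch64, so one source path can serve both targets; on non-Neon builds only the scalar loop runs.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the Opus sources): dst[i] = a[i] + b[i].
 * Inputs are assumed small enough that the additions do not overflow. */
void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* The same intrinsics compile for both armv7 (32-bit Neon) and aarch64:
     * process four 32-bit lanes per iteration. */
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);
        int32x4_t vb = vld1q_s32(b + i);
        vst1q_s32(dst + i, vaddq_s32(va, vb));
    }
#endif
    /* Scalar tail on Neon targets, and the whole loop elsewhere. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Guarding on the feature macro rather than on a specific architecture is one way to let a single intrinsic path cover both armv7 and aarch64 builds.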

If you have any questions, feel free to ask me or ping me on #opus.