[opus] [Aarch64 00/11] Patches to enable Aarch64 (arm64) optimizations, rebased to current master.
jonathan at vidyo.com
Tue Nov 10 11:32:35 PST 2015
> On Nov 6, 2015, at 9:05 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
> These have been tested for correctness under qemu (including running
> the test vectors), but not yet performance tested on a live aarch64
> CPU (which will probably be an iPhone). I should be able to do this
> Monday or Tuesday.
I’ve now done this, on an iPhone 5S (building with clang from Xcode 7.1).
In fixed-point mode, relative to current HEAD of master, in my tests aarch64 gets a 10-12% encode boost and a 6-7% decode boost without Ne10. With Ne10, it’s an 11-13% encode boost and a 14-15% decode boost. (Current HEAD of master doesn’t use Ne10 on aarch64 at all.)
There’s also about a 5-6% boost to aarch64 floating-point mode, since some of the optimizations apply to both fixed and float code.
Fixed-point mode is still substantially faster than floating-point (about 20% faster for encode, about 10% faster for decode).
These patches also speed armv7 up substantially, since a number of the Neon intrinsics apply to armv7 as well.
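To illustrate why one set of Neon intrinsics can serve both armv7 and aarch64, here is a minimal sketch of a 16-bit dot product with 32-bit accumulation. It is not code from the patch series: the function name dot_prod_s16 and the HAVE_NEON macro are hypothetical, chosen only for this example. It assumes the compiler defines __ARM_NEON (or __ARM_NEON__) when Neon is available, and it falls back to plain C otherwise. The horizontal add deliberately uses vpadd_s32 rather than the aarch64-only vaddvq_s32, so the same source builds for both targets.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical macro, for this sketch only */
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: returns sum of x[i]*y[i] for i in [0, n).
 * Inputs are 16-bit, products are accumulated in 32 bits. */
static int32_t dot_prod_s16(const int16_t *x, const int16_t *y, int n)
{
    int32_t sum = 0;
    int i = 0;
#if HAVE_NEON
    /* Four 32-bit accumulators, processed four samples at a time. */
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 4 <= n; i += 4) {
        /* Widening multiply-accumulate: 16x16 -> 32 bits per lane. */
        acc = vmlal_s16(acc, vld1_s16(x + i), vld1_s16(y + i));
    }
    /* Horizontal add using intrinsics that exist on armv7 and aarch64. */
    int32x2_t half = vadd_s32(vget_low_s32(acc), vget_high_s32(acc));
    half = vpadd_s32(half, half);
    sum = vget_lane_s32(half, 0);
#endif
    /* Scalar tail; also the whole loop when Neon is not available. */
    for (; i < n; i++) {
        sum += (int32_t)x[i] * (int32_t)y[i];
    }
    return sum;
}
```

Calling dot_prod_s16(x, y, n) gives the same result on either path, so a caller does not need to know which one was compiled in.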
Any questions, feel free to ask me or ping me on #opus.