[opus] [Aarch64 00/11] Patches to enable Aarch64

Tue Nov 10 12:45:10 PST 2015

Since you're already set up for benchmarks, could you also benchmark the 
difference between using and not using the ARM64 inline assembly? I 
believe the original justification for the assembly on ARMv7 was that 
processor's panoply of multiply instructions and their long cycle times. 
It seems to me that an ARM64 processor is much more like an x86 one, 
where a simple-minded C multiply gives results that are just as good. 
Inline assembly tends to hobble the compiler's optimizer, and in ARM64's 
case it may actually be counterproductive.
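
For concreteness, here is a minimal sketch of the kind of plain C multiply 
I have in mind. The function names are made up for illustration and are 
not the macros Opus actually uses; the sketch also assumes the compiler 
implements right shift of a negative value as an arithmetic shift, which 
GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an actual Opus macro: multiply a 16-bit value
   by a 32-bit value and keep the product shifted right by 16 bits.
   Written as one widening multiply in plain C, so the compiler remains
   free to pick the instruction and schedule it with surrounding code.
   Assumes >> on a negative int64_t is an arithmetic shift. */
static int32_t mult16_32_q16_c(int16_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* Small self-check for the helper above (also hypothetical). */
static void mult16_32_q16_selfcheck(void)
{
    assert(mult16_32_q16_c(16384, 65536) == 16384); /* 2^14 * 2^16 >> 16 */
    assert(mult16_32_q16_c(1, 65535) == 0);         /* product below 2^16 */
    assert(mult16_32_q16_c(0, 123456) == 0);
}
```

Timing the codec built with a C version along these lines against the 
build that uses the inline assembly would show whether the assembly 
still pays for itself on ARM64.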

The NEON code, of course, is valuable on all ARM processors.

On 11/10/2015 1:00 PM, opus-request at xiph.org wrote:
> Message: 1
> Date: Tue, 10 Nov 2015 19:32:35 +0000
> From: Jonathan Lennox <jonathan at vidyo.com>
> Subject: Re: [opus] [Aarch64 00/11] Patches to enable Aarch64	(arm64)
> 	optimizations, rebased to current master.
> To: "opus at xiph.org" <opus at xiph.org>
> Message-ID: <A0373653-FF01-472A-AC31-A68348384BF2 at vidyo.com>
> Content-Type: text/plain; charset="utf-8"
>
>
>> On Nov 6, 2015, at 9:05 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
>>
>> These have been tested for correctness under qemu (including running
>> the test vectors), but not yet performance tested on a live aarch64
>> CPU (which will probably be an iPhone).  I should be able to do this
>> Monday or Tuesday.
> I've now done this, on an iPhone 5S.  (Building with clang from Xcode 7.1)
>
> In fixed-point mode, relative to current HEAD of master, in my tests aarch64 gets a 10-12% encode boost and a 6-7% decode boost without Ne10.  With Ne10, it's an 11-13% encode boost and a 14-15% decode boost. (Current HEAD of master doesn't use Ne10 on aarch64 at all.)
>
> There's also about a 5-6% boost to aarch64 floating-point mode, since some of the optimizations apply to both fixed and float code.
>
> Fixed-point mode is still substantially faster than floating-point (about 20% faster for encode, about 10% faster for decode).
>
> These patches also speed armv7 up substantially, since a number of the Neon intrinsics apply to armv7 as well.
>
> Any questions, feel free to ask me or ping me on #opus.
>