[opus] [Aarch64 00/11] Patches to enable Aarch64

Tue Nov 10 13:37:49 PST 2015

> On Nov 10, 2015, at 3:45 PM, John Ridges <jridges at masque.com> wrote:
> 
> Since you're already set up for benchmarks, I would ask if you could 
> benchmark the difference between using and not using the ARM64 inline 
> assembly. I believe the original justification on ARMv7 for the assembly 
> was the processor's panoply of multiply instructions and their long 
> cycle times. It seems to me that the ARM64 processor is much more like 
> an x86 one, where using a simpleminded C multiply gives just as good of 
> results. Inline assembly tends to hobble the compiler's optimizer, and 
> in ARM64's case, may actually be counterproductive.
> 
> The NEON code of course is valuable on all the ARM processors.

No, configuring my patchset with --disable-asm (which disables both my CELT and SILK inline assembly, patches 06/11 and 07/11) slows down encoding by 2-3% and decoding by 5-6% on fixed-point arm64 (without Ne10).

Note that my submission has many *fewer* inline assembly snippets for ARM64 than the ARMv7 code does.  The guy here at Vidyo who actually did this optimization work (Johnny Lee, whose work I’m just massaging into submittable form) found that many of the multiplies were indeed better as C, especially with (what’s now) the OPUS_FAST_INT64 test.
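To make the C-versus-assembly point concrete, here is a minimal sketch of a 16x32 fixed-point multiply that keeps the upper bits of the product (in the style of Opus's MULT16_32_Q16), written two ways in plain C. The first is a split form built only from 32-bit multiplies, for targets without a fast 64-bit multiply. The second is a single widening 64-bit multiply, the kind of path a test like OPUS_FAST_INT64 can select on arm64. The function names and the FAST_INT64_SKETCH switch are hypothetical illustrations, not code from this patchset or from libopus. Because both are plain C, the compiler stays free to inline, schedule and constant-fold them, which an opaque inline-assembly block does not allow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch standing in for a test like OPUS_FAST_INT64:
 * on common 64-bit targets, assume a widening multiply is cheap. */
#if defined(__aarch64__) || defined(__x86_64__) || defined(_WIN64)
#define FAST_INT64_SKETCH 1
#else
#define FAST_INT64_SKETCH 0
#endif

/* (a * b) >> 16 with one 64-bit multiply. On arm64 a compiler will
 * typically emit SMULL plus a shift. Assumes >> on negative values is an
 * arithmetic shift, as it is with GCC and Clang. */
static inline int32_t mult16_32_q16_int64(int16_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 16);
}

/* Same result from 32-bit multiplies only: write b = bh * 65536 + bl with
 * a signed high half bh and an unsigned low half bl in 0..65535. Then
 * (a*b) >> 16 == a*bh + ((a*bl) >> 16) exactly, and no step overflows. */
static inline int32_t mult16_32_q16_split(int16_t a, int32_t b)
{
    int32_t bh = b >> 16;     /* signed high 16 bits */
    int32_t bl = b & 0xffff;  /* unsigned low 16 bits */
    return (int32_t)a * bh + (((int32_t)a * bl) >> 16);
}

/* Pick one form at compile time, the way a FAST_INT64-style test would. */
static inline int32_t mult16_32_q16(int16_t a, int32_t b)
{
#if FAST_INT64_SKETCH
    return mult16_32_q16_int64(a, b);
#else
    return mult16_32_q16_split(a, b);
#endif
}

/* Sanity check: both forms must agree, including at the extremes. */
static void mult16_32_q16_selfcheck(void)
{
    static const int16_t as[] = { 0, 1, -1, 12345, -32768, 32767 };
    static const int32_t bs[] = { 0, 1, -1, 65535, -65536, 123456789,
                                  INT32_MIN, INT32_MAX };
    for (unsigned i = 0; i < sizeof as / sizeof as[0]; i++)
        for (unsigned j = 0; j < sizeof bs / sizeof bs[0]; j++)
            assert(mult16_32_q16_split(as[i], bs[j]) ==
                   mult16_32_q16_int64(as[i], bs[j]));
}
```

On a 64-bit target the dispatcher takes the single-multiply path; elsewhere it falls back to the split form. The sketch only shows that the two are equivalent. Whether either one beats hand-written assembly on a particular core is what a benchmark like the --disable-asm comparison has to decide.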