[opus] [PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON

Timothy B. Terriberry tterribe at xiph.org
Wed Sep 28 01:42:49 UTC 2016


Linfeng Zhang wrote:
> +#ifdef SMALL_FOOTPRINT
> +   for (i=0;i<N-7;i+=8)
> +   {
> [snip over 80 lines of complicated NEON intrinsics code]
> +   }
> +#else

So, one of the points of SMALL_FOOTPRINT is to reduce the code size on 
targets where this matters (even if it means running slower), but this 
is an awful lot of code.

I think it makes much more sense to expose the existing xcorr_kernel asm 
and use that. I wrote a simple patch demonstrating this (attached... it 
applies on top of your full series, so it'd be a little work to rebase 
it into place here). It adds one 16-byte table and 16 instructions, and 
even gives speed-ups on non-NEON CPUs by reusing the existing EDSP asm.
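
For readers without the attachment, the general shape of the idea can be 
sketched roughly as follows. This is not the attached patch: it is a 
minimal, portable illustration in which every name ending in _sketch, the 
Q12 scaling, and the fixed maximum order are assumptions made for the 
example. The FIR loop produces four outputs at a time and leaves all of 
the multiply-accumulate work to a 4-output correlation kernel, which is 
the only part that would need a platform-specific version.

```c
#include <stdint.h>

#define SIG_SHIFT_SKETCH 12  /* assumed Q12 scaling for this example */
#define MAX_ORD_SKETCH   32  /* hypothetical upper bound on the filter order */

/* Plain C stand-in for a 4-output correlation kernel:
   sum[k] += x[j]*y[j+k] for k = 0..3 and j = 0..len-1.
   An optimized (asm or intrinsics) kernel would replace only this function. */
static void xcorr_kernel_sketch(const int16_t *x, const int16_t *y,
                                int32_t sum[4], int len)
{
    int j, k;
    for (j = 0; j < len; j++)
        for (k = 0; k < 4; k++)
            sum[k] += (int32_t)x[j] * y[j + k];
}

/* Round a Q12 accumulator back down to 16 bits. */
static int16_t round16_sketch(int32_t v)
{
    return (int16_t)((v + (1 << (SIG_SHIFT_SKETCH - 1))) >> SIG_SHIFT_SKETCH);
}

/* Computes y[i] = x[i] + sum_k num[k]*x[i-1-k] with num in Q12.
   x must be preceded by ord valid samples of history (x[-ord]..x[-1]).
   Returns 0 on success, -1 if ord is out of range. */
static int fir_via_xcorr_sketch(const int16_t *x, const int16_t *num,
                                int16_t *y, int N, int ord)
{
    int16_t rnum[MAX_ORD_SKETCH];
    int i, j;

    if (ord < 0 || ord > MAX_ORD_SKETCH)
        return -1;

    /* Reverse the taps so the convolution becomes a correlation. */
    for (j = 0; j < ord; j++)
        rnum[j] = num[ord - 1 - j];

    /* Main loop: four outputs per kernel call. */
    for (i = 0; i < N - 3; i += 4) {
        int32_t sum[4];
        for (j = 0; j < 4; j++)
            sum[j] = (int32_t)x[i + j] * (1 << SIG_SHIFT_SKETCH);
        xcorr_kernel_sketch(rnum, x + i - ord, sum, ord);
        for (j = 0; j < 4; j++)
            y[i + j] = round16_sketch(sum[j]);
    }

    /* Scalar tail for the last N % 4 outputs. */
    for (; i < N; i++) {
        int32_t sum = (int32_t)x[i] * (1 << SIG_SHIFT_SKETCH);
        for (j = 0; j < ord; j++)
            sum += (int32_t)rnum[j] * x[i + j - ord];
        y[i] = round16_sketch(sum);
    }
    return 0;
}
```

With this structure the extra code in the FIR function itself stays small, 
and any existing optimized kernel is reused as-is.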

I tested on comp48-stereo.sw, encoded at 64 kbps and decoded with a 15% 
loss rate, on a Novena using opus_demo (using RTCD and changing the 
function pointers to select the version of the code under test). 
Optimizing xcorr_kernel alone gives almost as much speed-up as using 
intrinsics for all of celt_fir (roughly 2.5% versus 3.3% over the plain 
C version):

celt_fir_c, xcorr_kernel_c:
1753 ms (stddev 9) [1730 1740 {1740 1740 1740 1750 1750 1750 1750 1750 
1750 1750 1750 1750 1750 1750 1760 1760 1760 1760 1770 1770} 1780 1860]

celt_fir_c, xcorr_kernel_neon:
1710 ms (stddev 12) [1680 1690 {1690 1690 1700 1700 1700 1700 1710 1710 
1710 1710 1710 1710 1710 1710 1710 1720 1720 1730 1730 1730} 1740 1810]

celt_fir_neon:
1695 ms (stddev 9) [1670 1680 {1680 1680 1680 1690 1690 1690 1690 1690 
1690 1690 1700 1700 1700 1700 1700 1700 1700 1700 1710 1710} 1720 1790]

It might even be enough to use this for the non-SMALL_FOOTPRINT case. 
What do you think?

