[opus] [PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
Timothy B. Terriberry
tterribe at xiph.org
Wed Sep 28 01:42:49 UTC 2016
Linfeng Zhang wrote:
> +#ifdef SMALL_FOOTPRINT
> + for (i=0;i<N-7;i+=8)
> + {
> [snip over 80 lines of complicated NEON intrinsics code]
> + }
> +#else
So, one of the points of SMALL_FOOTPRINT is to reduce the code size on
targets where this matters (even if it means running slower), but this
is an awful lot of code.
I think it makes much more sense to expose the existing xcorr_kernel asm
and use that. I wrote a simple patch demonstrating this (attached... it
applies on top of your full series, so it'd be a little work to rebase
it into place here). It adds one 16-byte table and 16 instructions, and
even gives speed-ups on non-NEON CPUs by reusing the existing EDSP asm.
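A plain-C sketch of how a fixed-point FIR can be built on a 4-way correlation kernel is shown below. It is a minimal sketch under stated assumptions, not the attached patch. `xcorr4_ref`, `fir_via_xcorr`, `round_shift` and `FIR_SHIFT` are hypothetical names. The kernel contract (four running sums, `sum[k] += x[j]*y[j+k]`) is assumed to mirror what xcorr_kernel computes. Reversing the taps turns the FIR into a forward correlation, so each kernel call produces four consecutive outputs. An optimized kernel with the same contract, such as the existing EDSP or NEON asm, would take the place of `xcorr4_ref`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point scale; the input is scaled up by this many bits
 * before the taps are accumulated, then rounded back down. */
#define FIR_SHIFT 12

/* Assumed kernel contract (plain-C reference):
 *   sum[k] += x[0]*y[k] + x[1]*y[k+1] + ... + x[len-1]*y[len-1+k], k = 0..3
 * Reads y[0 .. len+2]. An asm kernel with the same contract could replace it. */
static void xcorr4_ref(const int16_t *x, const int16_t *y, int32_t sum[4], int len)
{
    for (int j = 0; j < len; j++)
        for (int k = 0; k < 4; k++)
            sum[k] += (int32_t)x[j] * y[j + k];
}

/* Round-to-nearest right shift (assumes arithmetic >> on negative values). */
static int16_t round_shift(int32_t a)
{
    return (int16_t)((a + (1 << (FIR_SHIFT - 1))) >> FIR_SHIFT);
}

/* y[i] = round((x[i] << FIR_SHIFT) + sum_{m=0}^{ord-1} num[m]*x[i-m-1]).
 * x must be preceded by ord history samples (x[-ord .. -1]); rnum is
 * caller-provided scratch space for ord reversed taps. */
static void fir_via_xcorr(const int16_t *x, const int16_t *num, int16_t *y,
                          int N, int ord, int16_t *rnum)
{
    int i;
    assert(N >= 0 && ord >= 0);
    /* Reverse the taps: rnum[j]*x[i+j-ord] == num[ord-1-j]*x[i-(ord-j)]. */
    for (int j = 0; j < ord; j++)
        rnum[j] = num[ord - 1 - j];
    /* Main loop: one kernel call yields outputs i .. i+3. */
    for (i = 0; i + 3 < N; i += 4) {
        int32_t sum[4];
        for (int k = 0; k < 4; k++)
            sum[k] = (int32_t)x[i + k] * (1 << FIR_SHIFT);
        xcorr4_ref(rnum, x + i - ord, sum, ord);
        for (int k = 0; k < 4; k++)
            y[i + k] = round_shift(sum[k]);
    }
    /* Tail: the remaining N % 4 outputs, one at a time. */
    for (; i < N; i++) {
        int32_t acc = (int32_t)x[i] * (1 << FIR_SHIFT);
        for (int j = 0; j < ord; j++)
            acc += (int32_t)rnum[j] * x[i + j - ord];
        y[i] = round_shift(acc);
    }
}
```

In this sketch only the kernel is performance-critical. The surrounding loop and tail stay generic C, which is the code-size argument made above.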
In a test using opus_demo on a Novena (comp48-stereo.sw encoded at
64 kbps and decoded with a 15% loss rate; RTCD enabled, with the
function pointers changed to select the version of the code under test),
optimizing only xcorr_kernel gives almost as much speed-up as using
intrinsics for all of celt_fir (43 ms vs. 58 ms, relative to plain C):
celt_fir_c, xcorr_kernel_c:
    1753 ms (stddev 9)
    [1730 1740 {1740 1740 1740 1750 1750 1750 1750 1750
     1750 1750 1750 1750 1750 1750 1760 1760 1760 1760 1770 1770} 1780 1860]
celt_fir_c, xcorr_kernel_neon:
    1710 ms (stddev 12)
    [1680 1690 {1690 1690 1700 1700 1700 1700 1710 1710
     1710 1710 1710 1710 1710 1710 1710 1720 1720 1730 1730 1730} 1740 1810]
celt_fir_neon:
    1695 ms (stddev 9)
    [1670 1680 {1680 1680 1680 1690 1690 1690 1690 1690
     1690 1690 1700 1700 1700 1700 1700 1700 1700 1700 1710 1710} 1720 1790]
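The summary figures appear to be the mean and sample standard deviation of the 20 runs shown in braces. These are the 24 sorted timings with the two fastest and two slowest dropped. This is an inference from the numbers, not something stated in the message. For the first configuration, the braced values sum to 35050 ms, giving 1752.5 ms, with a sample standard deviation of about 8.5 ms. A small sketch of that calculation follows; `trimmed_stats` and `trimmed_mean_var` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double mean;
    double var; /* sample variance (n - 1 denominator); stddev = sqrt(var) */
} trimmed_stats;

/* Drop `trim` samples from each end of an ascending-sorted array and return
 * the mean and sample variance of the samples that remain. */
static trimmed_stats trimmed_mean_var(const int *sorted_ms, size_t n, size_t trim)
{
    trimmed_stats s = {0.0, 0.0};
    assert(n >= 2 * trim + 2); /* need at least two samples left */
    size_t m = n - 2 * trim;
    double sum = 0.0;
    for (size_t i = trim; i < n - trim; i++)
        sum += sorted_ms[i];
    s.mean = sum / (double)m;
    double ss = 0.0;
    for (size_t i = trim; i < n - trim; i++) {
        double d = sorted_ms[i] - s.mean;
        ss += d * d;
    }
    s.var = ss / (double)(m - 1);
    return s;
}
```

Rounding 1752.5 half up gives the 1753 ms reported for celt_fir_c with xcorr_kernel_c. Taking `sqrt(var)` (from `<math.h>`) gives about 8.5, which rounds to the reported stddev of 9.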
Since that leaves a gap of only 15 ms (under 1%) to the full intrinsics
version, reusing xcorr_kernel might even be enough for the
non-SMALL_FOOTPRINT case as well.
What do you think?