[opus] [RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics

Viswanath Puttagunta viswanath.puttagunta at linaro.org
Sun Nov 9 13:34:26 PST 2014


This patch introduces ARM NEON intrinsics to optimize the
kf_bfly4 routine in the celt part of libopus.
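
For readers less familiar with the routine: kf_bfly4 is the radix-4
butterfly of the FFT. The following is only a minimal scalar sketch of a
4-point forward DFT butterfly, not the code in this patch. The names
example_cpx and example_bfly4 are hypothetical, and the sketch leaves out
the twiddle-factor multiplications and strides that a full FFT stage
needs.

```c
/* Hypothetical complex type for illustration; not the libopus type. */
typedef struct {
    float r;
    float i;
} example_cpx;

/*
 * In-place 4-point forward DFT:
 *   X[k] = sum over n of x[n] * exp(-2*pi*i*n*k/4), for k = 0..3.
 * Only additions and subtractions are needed, because the factors
 * are 1, -i, -1 and i.
 */
static void example_bfly4(example_cpx x[4])
{
    /* Sums and differences of the even and odd input pairs. */
    example_cpx s02 = { x[0].r + x[2].r, x[0].i + x[2].i };
    example_cpx d02 = { x[0].r - x[2].r, x[0].i - x[2].i };
    example_cpx s13 = { x[1].r + x[3].r, x[1].i + x[3].i };
    example_cpx d13 = { x[1].r - x[3].r, x[1].i - x[3].i };

    x[0].r = s02.r + s13.r;
    x[0].i = s02.i + s13.i;
    x[2].r = s02.r - s13.r;
    x[2].i = s02.i - s13.i;

    /* X[1] = d02 + (-i)*d13, where (-i)*(a + bi) = b - ai. */
    x[1].r = d02.r + d13.i;
    x[1].i = d02.i - d13.r;

    /* X[3] = d02 + i*d13, where i*(a + bi) = -b + ai. */
    x[3].r = d02.r - d13.i;
    x[3].i = d02.i + d13.r;
}
```

The four output expressions are independent of each other and repeat
for every group of inputs, which is the kind of structure that SIMD
intrinsics can process several lanes at a time.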

Using the NEON-optimized kf_bfly4(_neon) routine improved the
performance of the opus_fft_impl function by about 21.4%. The
end use case, decoding a music Opus Ogg file, saw a
performance improvement of about 4.47%.

This patch has 2 components:
i. Actual neon code to improve kf_bfly4
ii. Infrastructure to include neon intrinsics into this project

I am reasonably confident about part "i" above.
However, I need some direction with "ii".

With this patch, users can explicitly enable NEON intrinsics for
SoCs that have ARMv7 NEON VFP support by passing
--enable-armv7-neon-float to configure.
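
As a rough illustration of what a build-time switch like this usually
amounts to in C, the sketch below picks an implementation with the
preprocessor. It is not taken from the patch: the macro
EXAMPLE_ARM_NEON_FLOAT and the example_sum* functions are hypothetical
names, and the assumption is that the configure option ends up defining
some macro of this kind.

```c
#include <stddef.h>

/* Portable C version, always available. */
static float example_sum_c(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* EXAMPLE_ARM_NEON_FLOAT is a hypothetical macro standing in for
 * whatever the build system defines when the option is given. */
#if defined(EXAMPLE_ARM_NEON_FLOAT) && defined(__ARM_NEON)
#include <arm_neon.h>

/* NEON version: adds four floats per iteration, then reduces. */
static float example_sum_neon(const float *x, size_t n)
{
    float32x4_t acc = vdupq_n_f32(0.0f);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = vaddq_f32(acc, vld1q_f32(x + i));

    /* Fold the four lanes down to one value. */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    float sum = vget_lane_f32(vpadd_f32(half, half), 0);

    /* Handle the remaining 0 to 3 elements. */
    for (; i < n; i++)
        sum += x[i];
    return sum;
}

#define example_sum(x, n) example_sum_neon((x), (n))
#else
#define example_sum(x, n) example_sum_c((x), (n))
#endif
```

Callers only ever use example_sum(), so the choice is made once at
build time and costs nothing at run time. The drawback is that the
resulting binary only runs on CPUs that actually have NEON.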

I enabled this feature with minimal changes to the existing
configure.ac and Makefile.am.

I suspect having runtime function detection and seamless enablement
of neon intrinsics without using --enable-armv7-neon-float will take
more work and collaboration.
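
To make the runtime alternative concrete, here is a minimal sketch of
function-pointer dispatch. It is an illustration under assumptions, not
a proposal for the actual libopus interface: every example_* name is
hypothetical, the "NEON" kernel is a plain C stand-in so the sketch
builds anywhere, and CPU detection is reduced to a flag passed in by
the caller.

```c
#include <stddef.h>

/* Hypothetical architecture levels, used as table indices. */
enum example_arch {
    EXAMPLE_ARCH_C = 0,
    EXAMPLE_ARCH_NEON = 1,
    EXAMPLE_ARCH_COUNT
};

typedef void (*example_scale_fn)(float *data, size_t n, float gain);

/* Portable reference kernel. */
static void example_scale_c(float *data, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= gain;
}

/* Stand-in for a NEON kernel: plain C unrolled by four, so it gives
 * the same results as the reference kernel on any machine. */
static void example_scale_neon_standin(float *data, size_t n, float gain)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        data[i] *= gain;
        data[i + 1] *= gain;
        data[i + 2] *= gain;
        data[i + 3] *= gain;
    }
    for (; i < n; i++)
        data[i] *= gain;
}

/* One table entry per architecture level. */
static const example_scale_fn example_scale_table[EXAMPLE_ARCH_COUNT] = {
    example_scale_c,
    example_scale_neon_standin
};

/* A real version would query the CPU or the OS here; this sketch
 * takes the answer as a parameter instead. */
static enum example_arch example_select_arch(int cpu_has_neon)
{
    return cpu_has_neon ? EXAMPLE_ARCH_NEON : EXAMPLE_ARCH_C;
}

/* Callers go through this wrapper and never name a kernel directly. */
static void example_scale(float *data, size_t n, float gain,
                          enum example_arch arch)
{
    example_scale_table[arch](data, n, gain);
}
```

The architecture would typically be selected once at startup and then
passed to each call, so a single binary can run on CPUs with and
without NEON.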

In the meantime, can we take this patch as a starting point (after due
review, of course)? My idea is that once this patch is accepted, work
can go forward on two fronts *independently*:

i. Optimizing more functions using neon intrinsics
ii. A proper way to enable neon intrinsics in configure.ac, Makefile.am, etc.

More details on how I verified this patch, along with the performance
measurements, are available at [1].

Please let me know your thoughts.

[1]: https://docs.google.com/document/d/1l_VWknKMdR_6nn1zIjaawxP2u7p4F3OAt7jBeuAyqe0/edit?usp=sharing

Viswanath Puttagunta (1):
  arm: kf_bfly4: Introduce ARM neon intrinsics

 Makefile.am              |   16 ++++
 celt/_kiss_fft_guts.h    |   13 +++
 celt/arm/kiss_fft_neon.c |  211 ++++++++++++++++++++++++++++++++++++++++++++++
 celt/arm/kiss_fft_neon.h |   37 ++++++++
 celt/kiss_fft.c          |    2 +-
 celt_headers.mk          |    1 +
 celt_sources.mk          |    3 +
 configure.ac             |   14 +++
 8 files changed, 296 insertions(+), 1 deletion(-)
 create mode 100644 celt/arm/kiss_fft_neon.c
 create mode 100644 celt/arm/kiss_fft_neon.h


