[opus] [RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
viswanath.puttagunta at linaro.org
Sun Nov 9 13:34:26 PST 2014
This patch introduces ARM NEON intrinsics to optimize the
kf_bfly4 routine in the CELT part of libopus.
The NEON-optimized kf_bfly4(_neon) routine improved the
performance of the opus_fft_impl function by about 21.4%.
The end use case, decoding an Ogg Opus music file, saw a
performance improvement of about 4.47%.
This patch has 2 components:
i. Actual neon code to improve kf_bfly4
ii. Infrastructure to include neon intrinsics into this project
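For readers unfamiliar with the routine in part "i": kf_bfly4 is the radix-4 butterfly stage of the FFT. The following is only a minimal scalar sketch of a forward radix-4 butterfly, not the code from this patch. The names cpx_t, cpx_mul, cpx_add, cpx_sub and bfly4_scalar, and the exact parameter list, are assumptions made for illustration. A NEON version would compute the same arithmetic with vector intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type for this sketch (not the libopus type). */
typedef struct { float r, i; } cpx_t;

static cpx_t cpx_mul(cpx_t a, cpx_t b)
{
    cpx_t c = { a.r * b.r - a.i * b.i, a.r * b.i + a.i * b.r };
    return c;
}

static cpx_t cpx_add(cpx_t a, cpx_t b)
{
    cpx_t c = { a.r + b.r, a.i + b.i };
    return c;
}

static cpx_t cpx_sub(cpx_t a, cpx_t b)
{
    cpx_t c = { a.r - b.r, a.i - b.i };
    return c;
}

/*
 * Forward radix-4 butterfly over m groups, in place.
 * f       : 4*m complex values, laid out as four blocks of length m
 * tw      : twiddle table, indexed with stride fstride
 * fstride : twiddle stride for this stage
 */
void bfly4_scalar(cpx_t *f, const cpx_t *tw, size_t fstride, size_t m)
{
    assert(f != NULL && tw != NULL && m > 0);

    for (size_t j = 0; j < m; j++) {
        /* Apply twiddle factors to the three upper inputs. */
        cpx_t a = f[j];
        cpx_t b = cpx_mul(f[j + m],     tw[j * fstride]);
        cpx_t c = cpx_mul(f[j + 2 * m], tw[2 * j * fstride]);
        cpx_t d = cpx_mul(f[j + 3 * m], tw[3 * j * fstride]);

        cpx_t s0 = cpx_add(a, c);   /* a + c */
        cpx_t s1 = cpx_sub(a, c);   /* a - c */
        cpx_t s2 = cpx_add(b, d);   /* b + d */
        cpx_t s3 = cpx_sub(b, d);   /* b - d */

        f[j]         = cpx_add(s0, s2);
        f[j + 2 * m] = cpx_sub(s0, s2);

        /* Outputs 1 and 3 are s1 -/+ i*s3 (forward transform). */
        f[j + m].r     = s1.r + s3.i;
        f[j + m].i     = s1.i - s3.r;
        f[j + 3 * m].r = s1.r - s3.i;
        f[j + 3 * m].i = s1.i + s3.r;
    }
}
```

With m = 1 and a single unit twiddle, this reduces to a plain 4-point DFT, which is a convenient way to compare a vectorized variant against the scalar one.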
I am reasonably confident about part "i" above.
However, I need some direction with "ii".
With this patch, users can explicitly enable NEON intrinsics for
SoCs that have ARMv7 NEON VFP support using --enable-armv7-neon-float.
I enabled this feature with minimal changes to the existing
configure.ac and Makefile.am code base.
I suspect that runtime function detection and seamless enablement
of NEON intrinsics without using --enable-armv7-neon-float will take
more work and collaboration.
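To make the distinction between the two approaches concrete, here is a rough sketch, not taken from this patch. The macro EXAMPLE_ENABLE_NEON stands in for whatever define the configure option would set, and impl_id, kernel_c, kernel_neon, select_kernel and run_kernel are all hypothetical names. With compile-time enablement the NEON kernel only exists in the build when the macro is defined; runtime detection additionally checks a CPU capability flag before choosing it.

```c
#include <assert.h>
#include <stddef.h>

/* Identifies which implementation a kernel belongs to (sketch only). */
typedef enum { IMPL_C = 0, IMPL_NEON = 1 } impl_id;

typedef impl_id (*kernel_fn)(void);

/* Portable C fallback, always built. */
static impl_id kernel_c(void) { return IMPL_C; }

#if defined(EXAMPLE_ENABLE_NEON)
/* Only compiled when the (hypothetical) build flag is set. */
static impl_id kernel_neon(void) { return IMPL_NEON; }
#endif

/*
 * Choose a kernel. cpu_has_neon is assumed to come from some
 * platform-specific capability probe that is outside this sketch.
 */
kernel_fn select_kernel(int cpu_has_neon)
{
#if defined(EXAMPLE_ENABLE_NEON)
    if (cpu_has_neon)
        return kernel_neon;
#else
    (void)cpu_has_neon;   /* NEON kernel not built; flag is unused */
#endif
    return kernel_c;
}

/* Call through the selected function pointer. */
impl_id run_kernel(kernel_fn fn)
{
    assert(fn != NULL);
    return fn();
}
```

A caller would typically run select_kernel once at initialization and keep the returned pointer, so the per-call cost is a single indirect call.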
Can we, in the meantime, take this patch as a starting point (of course
after due review)? My idea is that once this patch gets accepted, work
can go forward on two fronts *independently*:
i. Optimizing more functions using NEON intrinsics
ii. Proper way to enable NEON intrinsics in configure.ac, Makefile.am etc.
More details on how I verified this patch, along with performance
measurements, are available at 
Please let me know your thoughts.
Viswanath Puttagunta (1):
arm: kf_bfly4: Introduce ARM neon intrinsics
Makefile.am | 16 ++++
celt/_kiss_fft_guts.h | 13 +++
celt/arm/kiss_fft_neon.c | 211 ++++++++++++++++++++++++++++++++++++++++++++++
celt/arm/kiss_fft_neon.h | 37 ++++++++
celt/kiss_fft.c | 2 +-
celt_headers.mk | 1 +
celt_sources.mk | 3 +
configure.ac | 14 +++
8 files changed, 296 insertions(+), 1 deletion(-)
create mode 100644 celt/arm/kiss_fft_neon.c
create mode 100644 celt/arm/kiss_fft_neon.h