[opus] [RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics

Viswanath Puttagunta viswanath.puttagunta at linaro.org
Fri Nov 14 12:53:16 PST 2014


Hello,

From Jean-Marc's comment at [1], I see that the opus project is open to
adopting the ARM-focused FFT optimizations in the NE10 project/library at [2].

In light of this, if and when FFT implementations for 60, 120, 240,
and 480 points become available in the NE10 library:

a. Would enabling this optimization via --enable-armv7-neon-float, or a
similar flag, be acceptable to begin with?

or

b. Should the "AS_IF([test x"${enable_asm}" = x"yes"]".. section in
configure.ac be rewritten?

Please advise.

[1]: https://code.google.com/p/webrtc/issues/detail?id=3350&can=1&q=ne10&colspec=ID%20Pri%20Mstone%20ReleaseBlock%20Area%20Status%20Owner%20Summary
[2]: https://github.com/projectNe10/Ne10

Regards,
Vish

On 9 November 2014 15:34, Viswanath Puttagunta
<viswanath.puttagunta at linaro.org> wrote:
>
> Hello,
>
> This patch introduces ARM NEON intrinsics to optimize the
> kf_bfly4 routine in the celt part of libopus.
>
> Using the NEON-optimized kf_bfly4 (_neon) routine improved the
> performance of the opus_fft_impl function by about 21.4%. The
> end use case, decoding a music opus ogg file, saw a performance
> improvement of about 4.47%.
>
> This patch has 2 components:
> i. Actual neon code to improve kf_bfly4
> ii. Infrastructure to include neon intrinsics into this project
>
> I am reasonably confident about part "i" above.
> However, I need some direction with "ii".
>
> With this patch, users can explicitly enable NEON intrinsics for
> SoCs that have ARMv7 NEON VFP support by passing
> --enable-armv7-neon-float to configure.
>
> I enabled this feature with minimal changes to the existing
> configure.ac and Makefile.am.
>
> I suspect having runtime function detection and seamless enablement
> of neon intrinsics without using --enable-armv7-neon-float will take
> more work and collaboration.
>
> In the meantime, can we take this patch as a starting point (of course,
> after due review)? My idea is that once this patch is accepted, work
> can go forward on two fronts *independently*:
>
> i. Optimizing more functions using neon intrinsics
> ii. Proper way to enable neon intrinsics in configure.ac,Makefile.am etc.
>
> More details on how I verified this patch, along with performance
> measurements, are available at [1].
>
> Please let me know your thoughts.
>
> [1]: https://docs.google.com/document/d/1l_VWknKMdR_6nn1zIjaawxP2u7p4F3OAt7jBeuAyqe0/edit?usp=sharing
>
> Viswanath Puttagunta (1):
>   arm: kf_bfly4: Introduce ARM neon intrinsics
>
>  Makefile.am              |   16 ++++
>  celt/_kiss_fft_guts.h    |   13 +++
>  celt/arm/kiss_fft_neon.c |  211 ++++++++++++++++++++++++++++++++++++++++++++++
>  celt/arm/kiss_fft_neon.h |   37 ++++++++
>  celt/kiss_fft.c          |    2 +-
>  celt_headers.mk          |    1 +
>  celt_sources.mk          |    3 +
>  configure.ac             |   14 +++
>  8 files changed, 296 insertions(+), 1 deletion(-)
>  create mode 100644 celt/arm/kiss_fft_neon.c
>  create mode 100644 celt/arm/kiss_fft_neon.h
>
> --
> 1.7.9.5
>
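
For readers unfamiliar with the routine discussed in the quoted message:
kf_bfly4 is the radix-4 butterfly of the kiss_fft code in celt. The sketch
below is not the code from the patch. It is a minimal illustration, assuming
twiddle factors have already been applied, of a 4-point forward DFT in plain C
with a NEON intrinsics variant selected at compile time. The names
example_cpx, example_bfly4_c, example_bfly4_neon and the macro
EXAMPLE_ENABLE_NEON_FLOAT are hypothetical and do not come from libopus; the
macro stands in for whatever define a flag such as --enable-armv7-neon-float
might set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type for this sketch (libopus has its own type). */
typedef struct { float r, i; } example_cpx;

/* Plain C reference: in-place 4-point forward DFT, twiddles already applied. */
static void example_bfly4_c(example_cpx *x)
{
    example_cpx a, b, c, d;
    assert(x != NULL);

    /* Stage 1: sums and differences of the even and odd pairs. */
    a.r = x[0].r + x[2].r;  a.i = x[0].i + x[2].i;   /* x0 + x2 */
    b.r = x[0].r - x[2].r;  b.i = x[0].i - x[2].i;   /* x0 - x2 */
    c.r = x[1].r + x[3].r;  c.i = x[1].i + x[3].i;   /* x1 + x3 */
    d.r = x[1].r - x[3].r;  d.i = x[1].i - x[3].i;   /* x1 - x3 */

    /* Stage 2: combine; multiplying d by -j gives (d.i, -d.r). */
    x[0].r = a.r + c.r;  x[0].i = a.i + c.i;         /* a + c   */
    x[1].r = b.r + d.i;  x[1].i = b.i - d.r;         /* b - j*d */
    x[2].r = a.r - c.r;  x[2].i = a.i - c.i;         /* a - c   */
    x[3].r = b.r - d.i;  x[3].i = b.i + d.r;         /* b + j*d */
}

#if defined(EXAMPLE_ENABLE_NEON_FLOAT) && \
    (defined(__ARM_NEON) || defined(__ARM_NEON__))
#include <arm_neon.h>

/* NEON variant: each complex value occupies one float32x2_t as (re, im).
 * Assumes example_cpx is laid out as two consecutive floats. */
static void example_bfly4_neon(example_cpx *x)
{
    static const float sign[2] = { 1.0f, -1.0f };
    float *p;
    float32x2_t x0, x1, x2, x3, a, b, c, d, mjd;
    assert(x != NULL);

    p  = &x[0].r;
    x0 = vld1_f32(p + 0);
    x1 = vld1_f32(p + 2);
    x2 = vld1_f32(p + 4);
    x3 = vld1_f32(p + 6);

    a = vadd_f32(x0, x2);
    b = vsub_f32(x0, x2);
    c = vadd_f32(x1, x3);
    d = vsub_f32(x1, x3);

    /* -j*d: swap lanes to (d.i, d.r), then negate lane 1 -> (d.i, -d.r). */
    mjd = vmul_f32(vrev64_f32(d), vld1_f32(sign));

    vst1_f32(p + 0, vadd_f32(a, c));
    vst1_f32(p + 2, vadd_f32(b, mjd));
    vst1_f32(p + 4, vsub_f32(a, c));
    vst1_f32(p + 6, vsub_f32(b, mjd));
}
#define example_bfly4 example_bfly4_neon
#else
/* Fall back to the C version when the (hypothetical) flag is not set. */
#define example_bfly4 example_bfly4_c
#endif
```

Calling example_bfly4 on the real inputs 1, 2, 3, 4 gives 10, -2+2j, -2,
-2-2j, which matches the 4-point DFT. Selecting the variant with a macro at
compile time mirrors the explicit opt-in described in the quoted message;
runtime detection would instead need a function pointer or similar
indirection.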

