[opus] [RFC PATCH v1 0/2] Encode optimize using libNE10
viswanath.puttagunta at linaro.org
Tue Jan 20 09:37:22 PST 2015
I've been cooking up this patchset to integrate the NE10 library into Opus.
The current patchset focuses on the encode use case, mainly affecting the
performance of clt_mdct_forward() and opus_fft() (float only).
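For readers unfamiliar with how an optimized routine can be swapped in per CPU, here is a minimal sketch of table-based dispatch on an arch id. It is an illustration only, not code from this patchset: the names fft_fn, fft_c, fft_accel, FFT_IMPL, fft_dispatch and the ARCH_* values are hypothetical, and both "implementations" are plain copies standing in for real transforms.

```c
#include <stddef.h>

/* Hypothetical arch ids, for illustration only. */
enum { ARCH_C = 0, ARCH_ACCEL = 1, ARCH_COUNT = 2 };

typedef void (*fft_fn)(const float *in, float *out, size_t n);

/* Records which implementation ran last, so the dispatch is observable. */
static int last_impl = -1;

/* Stand-in for the portable C implementation (a copy, not a real FFT). */
static void fft_c(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) out[i] = in[i];
    last_impl = ARCH_C;
}

/* Stand-in for an accelerated implementation with the same contract. */
static void fft_accel(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) out[i] = in[i];
    last_impl = ARCH_ACCEL;
}

/* One table entry per arch id. */
static const fft_fn FFT_IMPL[ARCH_COUNT] = { fft_c, fft_accel };

/* Pick the implementation for the given arch; unknown ids fall back to C. */
static void fft_dispatch(int arch, const float *in, float *out, size_t n)
{
    if (arch < 0 || arch >= ARCH_COUNT) arch = ARCH_C;
    FFT_IMPL[arch](in, out, n);
}
```

Callers pass the arch id they already hold and never name a specific implementation, so adding a new backend only means adding a table entry.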
Glad to report the following on Encode use case:
(Measured on my Beaglebone Black Cortex-A8 board)
- Performance improvement for encode use case ~= 12.34% (Based on time -p data)
- Performance improvement in opus_fft() ~= 350% - 400% (Based on perf data)
The measurement data behind these numbers,
the test results for test_unit_mdct and test_unit_dft,
and related references are at .
I also have a precompiled libNE10.so (ARMv7), along with headers, available for convenience at 
Known issues that need to be sorted out with the NE10 team at ARM:
- The NE10 library needs to be compiled with -funsafe-math-optimizations for ARMv7. See  for more info.
Note that I used -funsafe-math-optimizations to build the libNE10.so available at  for all measurements.
Phil Wang at NE10 is looking into integrating this change at the moment.
Without this change, you will see a performance regression instead of an improvement on ARMv7.
- Compile-time and link-time warnings with NE10. I will sort them out with the NE10 library team.
I think the current patchset is in decent shape to request comments.
Further optimizations are possible, but I wanted to
first post what I have so far and get feedback before spending
any more time, so as not to head in the wrong direction.
Thanks in advance for your review.
Viswanath Puttagunta (2):
Optimize repeated calls to opus_select_arch
armv7(float): Optimize encode usecase using NE10 library
Makefile.am | 30 +--
celt/arm/arm_celt_ne10_fft_map.c | 65 ++++++
celt/arm/arm_celt_ne10_mdct_map.c | 53 +++++
celt/arm/armcpu.c | 19 +-
celt/arm/celt_ne10_fft.c | 101 ++++++++++
celt/arm/celt_ne10_mdct.c | 159 +++++++++++++++
celt/arm/fft_arm.h | 65 ++++++
celt/arm/mdct_arm.h | 52 +++++
celt/celt_encoder.c | 4 +-
celt/dump_modes/Makefile | 21 +-
celt/dump_modes/dump_mode_arm_ne10.c | 103 ++++++++++
celt/dump_modes/dump_modes.c | 22 +-
celt/dump_modes/dump_modes_arch.h | 14 ++
celt/kiss_fft.c | 18 +-
celt/kiss_fft.h | 44 +++-
celt/mdct.c | 2 +-
celt/mdct.h | 29 ++-
celt/static_modes_float.h | 25 +++
celt/static_modes_float_arm_ne10.h | 367 ++++++++++++++++++++++++++++++++++
celt/tests/test_unit_dft.c | 14 +-
celt/tests/test_unit_mdct.c | 19 +-
celt/x86/x86cpu.c | 22 +-
celt_headers.mk | 3 +
celt_sources.mk | 6 +
configure.ac | 81 ++++++++
src/analysis.c | 2 +-
src/opus_multistream_encoder.c | 3 +-
27 files changed, 1307 insertions(+), 36 deletions(-)
create mode 100644 celt/arm/arm_celt_ne10_fft_map.c
create mode 100644 celt/arm/arm_celt_ne10_mdct_map.c
create mode 100644 celt/arm/celt_ne10_fft.c
create mode 100644 celt/arm/celt_ne10_mdct.c
create mode 100644 celt/arm/fft_arm.h
create mode 100644 celt/arm/mdct_arm.h
create mode 100644 celt/dump_modes/dump_mode_arm_ne10.c
create mode 100644 celt/dump_modes/dump_modes_arch.h
create mode 100644 celt/static_modes_float_arm_ne10.h
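As a rough illustration of the idea behind the first patch title (avoiding repeated arch selection), the sketch below probes the CPU once at init time and stores the result in the codec state for later calls. It is not the code from the patch: detect_arch, codec_state, codec_init and codec_process_frame are hypothetical names, and the probe is a counter-instrumented stub rather than real CPU detection.

```c
/* Counts how often the (hypothetical) probe runs. */
static int probe_count = 0;

/* Stub for a potentially expensive CPU feature probe. */
static int detect_arch(void)
{
    probe_count++;
    return 1; /* arbitrary arch id for illustration */
}

/* Hypothetical codec state that caches the selected arch. */
typedef struct {
    int arch;
} codec_state;

/* Probe once, at initialization. */
static void codec_init(codec_state *st)
{
    st->arch = detect_arch();
}

/* Per-frame work reuses the cached value instead of probing again. */
static int codec_process_frame(const codec_state *st)
{
    return st->arch;
}
```

With this layout, the probe cost is paid once per codec instance no matter how many frames are processed.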