[opus] [RFC PATCH v1 0/2] Encode optimize using libNE10

Viswanath Puttagunta viswanath.puttagunta at linaro.org
Tue Jan 20 09:37:22 PST 2015


Hello opus-dev,

I've been cooking up this patchset to integrate the NE10 library into opus.
The current patchset focuses on the encode use case, mainly affecting the
performance of clt_mdct_forward() and opus_fft() (for float only).
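
For readers unfamiliar with this kind of integration, here is a rough sketch
of run-time dispatch between a portable routine and an optimized one. It is
not taken from the patches: the enum values, fft_fn, fft_generic, fft_neon
and fft_dispatch are all hypothetical names, and the function bodies are
placeholders that only copy the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical architecture indices; not the identifiers used in the patches. */
enum { ARCH_GENERIC = 0, ARCH_NEON = 1, ARCH_COUNT = 2 };

/* Each implementation returns the index of the path that actually ran. */
typedef int (*fft_fn)(const float *in, float *out, size_t n);

/* Stand-in for the portable C path. Placeholder body: copies the input. */
static int fft_generic(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return ARCH_GENERIC;
}

/* Stand-in for a path backed by an optimized library. Placeholder body. */
static int fft_neon(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return ARCH_NEON;
}

/* One table entry per architecture index. */
static const fft_fn fft_table[ARCH_COUNT] = { fft_generic, fft_neon };

/* Calls the implementation selected by arch, which the caller is assumed
 * to have obtained once from CPU feature detection. */
int fft_dispatch(int arch, const float *in, float *out, size_t n)
{
    assert(arch >= 0 && arch < ARCH_COUNT);
    return fft_table[arch](in, out, n);
}
```

The point of the table is that callers pay one indexed call per transform
and never need to know which library sits behind the selected entry.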

I'm glad to report the following for the encode use case
(measured on my Beaglebone Black Cortex-A8 board):
- Performance improvement for encode use case ~= 12.34%    (Based on time -p data)
- Performance improvement in opus_fft()      ~= 350% - 400% (Based on perf data)

The supporting measurements, the test results for test_unit_mdct and
test_unit_dft, and related references are at [1].

For convenience, a precompiled libNE10.so (ARMv7) and its headers are also available at [3].

Known issues that need to be sorted out with the NE10 team at ARM:
- The NE10 library needs to be compiled with -funsafe-math-optimizations for ARMv7. See [2] for more info.
  Note that I used -funsafe-math-optimizations to build the libNE10.so available at [3] for all measurements.
  Phil Wang at NE10 is looking into integrating this change at the moment.
  Without this change, you will see a performance regression instead of an improvement on ARMv7.
- Compile-time and link-time warnings with NE10. I will sort them out with the NE10 library team.

I think the current patchset is in decent enough shape to request comments.
There are further optimizations that can be done, but I wanted to
first post what I have so far and receive feedback before I spend
any more time, so as not to head in the wrong direction.


Thanks in advance for your review.

Regards,
Vish

[1]: https://docs.google.com/a/linaro.org/document/d/1avz20b3DOnD3IwxiKTmUfyUK89hUwL9K2PYMh7dlkNg/edit#
[2]: https://bugs.linaro.org/show_bug.cgi?id=1044
[3]: http://people.linaro.org/~viswanath.puttagunta/opus/NE10_root/NE10_root.tar.gz

Viswanath Puttagunta (2):
  Optimize repeated calls to opus_select_arch
  armv7(float): Optimize encode usecase using NE10 library

 Makefile.am                          |   30 +--
 celt/arm/arm_celt_ne10_fft_map.c     |   65 ++++++
 celt/arm/arm_celt_ne10_mdct_map.c    |   53 +++++
 celt/arm/armcpu.c                    |   19 +-
 celt/arm/celt_ne10_fft.c             |  101 ++++++++++
 celt/arm/celt_ne10_mdct.c            |  159 +++++++++++++++
 celt/arm/fft_arm.h                   |   65 ++++++
 celt/arm/mdct_arm.h                  |   52 +++++
 celt/celt_encoder.c                  |    4 +-
 celt/dump_modes/Makefile             |   21 +-
 celt/dump_modes/dump_mode_arm_ne10.c |  103 ++++++++++
 celt/dump_modes/dump_modes.c         |   22 +-
 celt/dump_modes/dump_modes_arch.h    |   14 ++
 celt/kiss_fft.c                      |   18 +-
 celt/kiss_fft.h                      |   44 +++-
 celt/mdct.c                          |    2 +-
 celt/mdct.h                          |   29 ++-
 celt/static_modes_float.h            |   25 +++
 celt/static_modes_float_arm_ne10.h   |  367 ++++++++++++++++++++++++++++++++++
 celt/tests/test_unit_dft.c           |   14 +-
 celt/tests/test_unit_mdct.c          |   19 +-
 celt/x86/x86cpu.c                    |   22 +-
 celt_headers.mk                      |    3 +
 celt_sources.mk                      |    6 +
 configure.ac                         |   81 ++++++++
 src/analysis.c                       |    2 +-
 src/opus_multistream_encoder.c       |    3 +-
 27 files changed, 1307 insertions(+), 36 deletions(-)
 create mode 100644 celt/arm/arm_celt_ne10_fft_map.c
 create mode 100644 celt/arm/arm_celt_ne10_mdct_map.c
 create mode 100644 celt/arm/celt_ne10_fft.c
 create mode 100644 celt/arm/celt_ne10_mdct.c
 create mode 100644 celt/arm/fft_arm.h
 create mode 100644 celt/arm/mdct_arm.h
 create mode 100644 celt/dump_modes/dump_mode_arm_ne10.c
 create mode 100644 celt/dump_modes/dump_modes_arch.h
 create mode 100644 celt/static_modes_float_arm_ne10.h

-- 
1.7.9.5


