[opus] [RFC PATCH v1 0/2] Encode optimize using libNE10
Viswanath Puttagunta
viswanath.puttagunta at linaro.org
Tue Jan 20 09:37:22 PST 2015
Hello opus-dev,
I've been cooking up this patchset to integrate the NE10 library into opus.
The current patchset focuses on the encode use case, mainly affecting the performance of
clt_mdct_forward() and opus_fft() (float only).
I'm glad to report the following for the encode use case:
(Measured on my Beaglebone Black Cortex-A8 board)
- Performance improvement for encode use case ~= 12.34% (Based on time -p data)
- Performance improvement in opus_fft() ~= 350% - 400% (Based on perf data)
Please see [1] for the evidence behind these measurements,
the test results for test_unit_mdct and test_unit_dft,
and related references.
I also have a precompiled libNE10.so (ARMv7), along with headers, available for convenience at [3].
Known issues that need to be sorted out with the NE10 team at ARM:
- The NE10 library needs to be compiled with -funsafe-math-optimizations for ARMv7. See [2] for more info.
  Note that I used -funsafe-math-optimizations to build the libNE10.so available at [3] for all measurements.
  Phil Wang at NE10 is looking into integrating this change at the moment.
  Without this change, you will see a performance regression instead of an improvement on ARMv7.
- Compile-time and link-time warnings with NE10. I will sort them out with the NE10 library team.
I think the current patchset is in decent shape to request comments.
There are further optimizations that can be done, but I wanted to
first post what I have so far and receive feedback before I spend
any more time, so as not to head in the wrong direction.
Thanks in advance for your review.
Regards,
Vish
[1]: https://docs.google.com/a/linaro.org/document/d/1avz20b3DOnD3IwxiKTmUfyUK89hUwL9K2PYMh7dlkNg/edit#
[2]: https://bugs.linaro.org/show_bug.cgi?id=1044
[3]: http://people.linaro.org/~viswanath.puttagunta/opus/NE10_root/NE10_root.tar.gz
Viswanath Puttagunta (2):
  Optimize repeated calls to opus_select_arch
  armv7(float): Optimize encode usecase using NE10 library
Makefile.am | 30 +--
celt/arm/arm_celt_ne10_fft_map.c | 65 ++++++
celt/arm/arm_celt_ne10_mdct_map.c | 53 +++++
celt/arm/armcpu.c | 19 +-
celt/arm/celt_ne10_fft.c | 101 ++++++++++
celt/arm/celt_ne10_mdct.c | 159 +++++++++++++++
celt/arm/fft_arm.h | 65 ++++++
celt/arm/mdct_arm.h | 52 +++++
celt/celt_encoder.c | 4 +-
celt/dump_modes/Makefile | 21 +-
celt/dump_modes/dump_mode_arm_ne10.c | 103 ++++++++++
celt/dump_modes/dump_modes.c | 22 +-
celt/dump_modes/dump_modes_arch.h | 14 ++
celt/kiss_fft.c | 18 +-
celt/kiss_fft.h | 44 +++-
celt/mdct.c | 2 +-
celt/mdct.h | 29 ++-
celt/static_modes_float.h | 25 +++
celt/static_modes_float_arm_ne10.h | 367 ++++++++++++++++++++++++++++++++++
celt/tests/test_unit_dft.c | 14 +-
celt/tests/test_unit_mdct.c | 19 +-
celt/x86/x86cpu.c | 22 +-
celt_headers.mk | 3 +
celt_sources.mk | 6 +
configure.ac | 81 ++++++++
src/analysis.c | 2 +-
src/opus_multistream_encoder.c | 3 +-
27 files changed, 1307 insertions(+), 36 deletions(-)
create mode 100644 celt/arm/arm_celt_ne10_fft_map.c
create mode 100644 celt/arm/arm_celt_ne10_mdct_map.c
create mode 100644 celt/arm/celt_ne10_fft.c
create mode 100644 celt/arm/celt_ne10_mdct.c
create mode 100644 celt/arm/fft_arm.h
create mode 100644 celt/arm/mdct_arm.h
create mode 100644 celt/dump_modes/dump_mode_arm_ne10.c
create mode 100644 celt/dump_modes/dump_modes_arch.h
create mode 100644 celt/static_modes_float_arm_ne10.h
--
1.7.9.5