[opus] [RFC PATCH v1 0/8] Ne10 fft fixed and previous
Viswanath Puttagunta
viswanath.puttagunta at linaro.org
Tue Apr 28 15:24:48 PDT 2015
Hello Timothy / Jean-Marc / opus-dev,
This patch series is a follow-up to the work I posted at [1].
In addition to what was posted at [1], this series mainly
integrates the fixed-point FFT implementations from the NE10 library into Opus.
My Opus work-in-progress code is available at [2].
Note that although I found some issues with both the NE10 library (fixed-point FFT)
and the Linaro toolchain (ARMv8 intrinsics), the Opus-related work is complete:
these patches are functional, and I can successfully encode and decode
comp48-stereo.sw [4] using the commands
    ./opus_demo -e restricted-lowdelay 48000 2 96000 comp48-stereo.sw xcorr.opus
    ./opus_demo -d 48000 2 xcorr.opus <output>
and hear the audio clearly.
Below are some issues I discovered *OUTSIDE* the Opus project:
- Issues with Linaro Toolchain 14.11 (based on GCC 4.9), available at [5]
    - I found no toolchain issues for ARMv7.
    - However, the code the compiler generates for NEON intrinsics
      on ARMv8 is suboptimal.
        - This applies to the functions in celt/arm/celt_neon_intr.c.
    - I am working with our Linaro Toolchain team to fix these issues.
- Issues discovered with the NE10 library
    - No issues found with the ARMv8 floating-point FFT.
    - Fixed-point performance is not as good as expected
      on both ARMv7 and ARMv8.
    - Even though the audio is clearly audible after encode/decode,
      test_unit_mdct fails for nfft=480 and its multiples (960, 1920).
      Note that only mdct_forward fails; mdct_backward passes.
        - This surprised me because test_unit_dft passes for all
          nfft, including 60, 120, 240 and 480. There may be some
          data corner cases that need further investigation.
    - I am working with ARM to resolve these issues.
    - Again, note that the Ne10 source code I used is at [3].
    - Pre-compiled Ne10 libraries are available at [6].
NOTE:
I do not believe the toolchain and NE10 issues I discovered are blockers
for integrating these patches into libopus. I kindly request that you review and
merge my patches at your earliest convenience.
[1]: http://lists.xiph.org/pipermail/opus/2015-March/002947.html
[2]: https://git.linaro.org/people/viswanath.puttagunta/opus.git/shortlog/refs/heads/rfcv1_rc2_fft_fixed
[3]: https://git.linaro.org/people/viswanath.puttagunta/Ne10.git/shortlog/refs/heads/fft-fixed
[4]: https://people.xiph.org/~tterribe/opus/comp48-stereo.sw
[5]: http://releases.linaro.org/14.11/components/toolchain/binaries
[6]: http://people.linaro.org/~viswanath.puttagunta/opus/NE10_root/
Jonathan Lennox (1):
Intrinsics/RTCD related fixes. Mostly x86
Viswanath Puttagunta (7):
armv7(float): Optimize encode usecase using NE10 library
armv7(float): Optimize decode usecase using NE10 library
aarch64: Enable intrinsics for aarch64
aarch64: celt_pitch_xcorr: Fixed point intrinsics
armv7,armv8: Optimize fixed point fft using NE10 library
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
test_unit_dft: Add nfft = 60, 240, 480 tests
Makefile.am | 84 ++++---
celt/arm/arm_celt_map.c | 71 +++++-
celt/arm/armcpu.c | 6 +-
celt/arm/celt_ne10_fft.c | 174 +++++++++++++
celt/arm/celt_ne10_mdct.c | 260 ++++++++++++++++++++
celt/arm/celt_neon_intr.c | 275 +++++++++++++++++++++
celt/arm/fft_arm.h | 72 ++++++
celt/arm/mdct_arm.h | 58 +++++
celt/arm/pitch_arm.h | 14 +-
celt/bands.c | 6 +-
celt/celt.c | 16 +-
celt/celt.h | 12 +-
celt/celt_decoder.c | 24 +-
celt/celt_encoder.c | 20 +-
celt/celt_lpc.h | 2 +-
celt/cpu_support.h | 15 +-
celt/dump_modes/Makefile | 24 +-
celt/dump_modes/dump_modes.c | 21 ++
celt/dump_modes/dump_modes_arch.h | 47 ++++
celt/dump_modes/dump_modes_arm_ne10.c | 144 +++++++++++
celt/kiss_fft.c | 31 ++-
celt/kiss_fft.h | 67 ++++-
celt/mdct.c | 20 +-
celt/mdct.h | 60 ++++-
celt/mips/celt_mipsr1.h | 2 +-
celt/modes.c | 8 +-
celt/pitch.c | 4 +-
celt/pitch.h | 22 +-
celt/static_modes_fixed.h | 25 ++
celt/static_modes_fixed_arm_ne10.h | 388 +++++++++++++++++++++++++++++
celt/static_modes_float.h | 25 ++
celt/static_modes_float_arm_ne10.h | 404 +++++++++++++++++++++++++++++++
celt/tests/test_unit_dft.c | 62 +++--
celt/tests/test_unit_mathops.c | 22 +-
celt/tests/test_unit_mdct.c | 88 ++++---
celt/tests/test_unit_rotation.c | 22 +-
celt/x86/celt_lpc_sse.c | 4 +
celt/x86/celt_lpc_sse.h | 12 +-
celt/x86/pitch_sse.c | 334 ++++++++++---------------
celt/x86/pitch_sse.h | 256 ++++++++------------
celt/x86/pitch_sse2.c | 95 ++++++++
celt/x86/pitch_sse4_1.c | 195 +++++++++++++++
celt/x86/x86_celt_map.c | 76 +++++-
celt/x86/x86cpu.c | 47 +++-
celt/x86/x86cpu.h | 26 +-
celt_headers.mk | 4 +
celt_sources.mk | 9 +-
configure.ac | 391 +++++++++++++++++++++---------
m4/opus-intrinsics.m4 | 29 +++
silk/x86/SigProc_FIX_sse.h | 17 ++
silk/x86/main_sse.h | 48 ++++
silk/x86/x86_silk_map.c | 25 +-
src/analysis.c | 8 +-
src/analysis.h | 2 +-
src/opus_encoder.c | 2 +-
src/opus_multistream_encoder.c | 9 +-
win32/VS2010/celt.vcxproj | 17 +-
win32/VS2010/celt.vcxproj.filters | 27 +++
win32/VS2010/silk_common.vcxproj | 17 +-
win32/VS2010/silk_common.vcxproj.filters | 23 +-
win32/VS2010/silk_fixed.vcxproj | 13 +-
win32/VS2010/silk_fixed.vcxproj.filters | 17 +-
win32/config.h | 25 +-
63 files changed, 3623 insertions(+), 700 deletions(-)
create mode 100644 celt/arm/celt_ne10_fft.c
create mode 100644 celt/arm/celt_ne10_mdct.c
create mode 100644 celt/arm/fft_arm.h
create mode 100644 celt/arm/mdct_arm.h
create mode 100644 celt/dump_modes/dump_modes_arch.h
create mode 100644 celt/dump_modes/dump_modes_arm_ne10.c
create mode 100644 celt/static_modes_fixed_arm_ne10.h
create mode 100644 celt/static_modes_float_arm_ne10.h
create mode 100644 celt/x86/pitch_sse2.c
create mode 100644 celt/x86/pitch_sse4_1.c
create mode 100644 m4/opus-intrinsics.m4
--
1.9.1