[opus] [RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics
pbrobinson at gmail.com
Mon Nov 24 15:48:05 PST 2014
>> >> a. Simplest use case to validate this optimization for correctness.
>> >> b. Simplest use case to validate this optimization for performance.
>> >> Would prefer something like opusdec that can be executed on command
>> >> line.
>> > The easiest thing to use is probably opus_demo (opusdec does a bunch of extra things, plus for interactive use we care about both the encoder and decoder, and celt_pitch_xcorr gets used vastly more by the encoder than the decoder... I think the decoder only uses it for PLC).
>> > Something like
>> > ./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw /dev/null
>> > comp48-stereo.sw can be found here: https://people.xiph.org/~tterribe/opus/comp48-stereo.sw
>> > celt_pitch_xcorr also gets used by the SILK encoder (more in fixed-point than float, but the float one uses it, too). So it may be worth doing a run with the application set to voip instead of restricted-lowdelay and a lower bitrate (e.g., 24000 instead of 96000).
>> Thanks for your feedback. I have verified both of the above cases. I used
>> ./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw out.wav,
>> ./opus_demo voip 48000 2 24000 comp48-stereo.sw out.wav
>> to make sure the output out.wav is clearly audible, and the commands
>> below (encode only) for performance benchmarking.
>> ./opus_demo -e restricted-lowdelay 48000 2 96000 comp48-stereo.sw opus_raw.out
>> ./opus_demo -e voip 48000 2 96000 comp48-stereo.sw opus_raw.out
>> I saw a much larger performance improvement (16.16%) for the overall
>> encode use case with "restricted-lowdelay 48000 2 96000" (CELT
>> encoding), as you suspected, since celt_pitch_xcorr gets used much
>> more there.
>> I observed a smaller performance improvement (3.42%) for the overall
>> encode use case with "voip 48000 2 24000". This is somewhat expected, as
>> celt_pitch_xcorr_c was not the main contributor to run time in this
>> SILK encoder use case.
>> For detailed information on how I measured performance on my
>> Beaglebone Black (Cortex-A8), please see the "celt_pitch_xcorr (float)
>> Neon Optimization" section of:
>> https://docs.google.com/document/d/1L6csATjSsXtzg_sa1iHZta8hOsoVWA4UjHXEakpTrNk/edit?usp=sharing
>> > Even though this primarily affects the encoder, as a sanity check, it's always good to make sure the test vectors still decode correctly. Get them from <http://opus-codec.org/testvectors/opus_testvectors.tar.gz> and use
>> > tests/run_vectors.sh <build path> <test vectors path> 48000
> OK, this took about 2 hours, but all tests passed successfully.
> Please let me know what the next steps are.
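As a complement to the opus_demo runs and the test vectors, a kernel like
this can also be checked in isolation against a plain scalar loop. The
following is only a rough sketch under my own assumptions: xcorr_ref,
xcorr_unrolled4 and max_xcorr_diff are hypothetical names, not functions
from the Opus tree, and the 4-way unrolled loop merely stands in for a NEON
kernel that accumulates four lanes at a time.

```c
#include <assert.h>
#include <stdlib.h>

/* Scalar reference: xcorr[i] = sum over j of x[j] * y[i + j].
   y must hold at least len + max_pitch - 1 samples. */
static void xcorr_ref(const float *x, const float *y, float *xcorr,
                      int len, int max_pitch)
{
    for (int i = 0; i < max_pitch; i++) {
        float sum = 0.0f;
        for (int j = 0; j < len; j++)
            sum += x[j] * y[i + j];
        xcorr[i] = sum;
    }
}

/* Hypothetical stand-in for a vectorized kernel: four partial sums,
   like four SIMD lanes, plus a scalar tail for the leftover samples. */
static void xcorr_unrolled4(const float *x, const float *y, float *xcorr,
                            int len, int max_pitch)
{
    for (int i = 0; i < max_pitch; i++) {
        float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        int j = 0;
        for (; j + 4 <= len; j += 4) {
            acc[0] += x[j]     * y[i + j];
            acc[1] += x[j + 1] * y[i + j + 1];
            acc[2] += x[j + 2] * y[i + j + 2];
            acc[3] += x[j + 3] * y[i + j + 3];
        }
        float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
        for (; j < len; j++)
            sum += x[j] * y[i + j];
        xcorr[i] = sum;
    }
}

/* Pseudo-random sample in [-1, 1]. */
static float rand_unit(void)
{
    return 2.0f * ((float)rand() / (float)RAND_MAX) - 1.0f;
}

/* Runs both kernels on the same pseudo-random input and returns the
   largest absolute difference between their outputs. */
static float max_xcorr_diff(int len, int max_pitch, unsigned seed)
{
    int ylen = len + max_pitch - 1;
    float *x = malloc((size_t)len * sizeof *x);
    float *y = malloc((size_t)ylen * sizeof *y);
    float *r = malloc((size_t)max_pitch * sizeof *r);
    float *u = malloc((size_t)max_pitch * sizeof *u);
    float worst = 0.0f;

    assert(x && y && r && u);
    srand(seed);
    for (int i = 0; i < len; i++)
        x[i] = rand_unit();
    for (int i = 0; i < ylen; i++)
        y[i] = rand_unit();

    xcorr_ref(x, y, r, len, max_pitch);
    xcorr_unrolled4(x, y, u, len, max_pitch);

    for (int i = 0; i < max_pitch; i++) {
        float d = r[i] - u[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    free(x); free(y); free(r); free(u);
    return worst;
}
```

Because the float sums are accumulated in a different order, the two
outputs should agree only to within a small tolerance, not bit for bit, so
a check along the lines of max_xcorr_diff(240, 64, 1) < 1e-3 is more
appropriate than an exact comparison.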
Are there plans to support ARMv8/aarch64 NEON intrinsics too?
Also, are there plans to make the NEON optimisations on ARMv7 run-time
detectable, like they are in cairo/pixman? For generic distributions
it would be nice to be able to enable them, as they offer
decent performance improvements, but have the code fall back on devices
that don't support NEON.
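
To illustrate the kind of run-time detection I mean, here is a minimal
sketch, assuming Linux with glibc's getauxval() on 32-bit ARM. The names
dot_c, dot_neon_placeholder, select_dot and dispatch_demo are hypothetical
and not taken from Opus or pixman; the "NEON" kernel here is just a
placeholder that calls the C code, and on any other platform the check
simply reports no NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON */
#endif

typedef float (*dot_fn)(const float *a, const float *b, int n);

/* Portable C fallback, always available. */
static float dot_c(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Placeholder only: a real build would put a NEON intrinsics kernel here,
   compiled with NEON enabled. In this sketch it forwards to the C code. */
static float dot_neon_placeholder(const float *a, const float *b, int n)
{
    return dot_c(a, b, n);
}

/* Returns nonzero if the kernel reports NEON support for this CPU. */
static int cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return 0;  /* assumption: treat every other platform as "no NEON" */
#endif
}

/* Picks the implementation once; callers then go through the pointer. */
static dot_fn select_dot(int has_neon)
{
    return has_neon ? dot_neon_placeholder : dot_c;
}

/* Whichever kernel is selected must produce the same result. */
static float dispatch_demo(void)
{
    const float a[3] = {1.0f, 2.0f, 3.0f};
    const float b[3] = {4.0f, 5.0f, 6.0f};
    dot_fn dot = select_dot(cpu_has_neon());
    assert(dot != NULL);
    return dot(a, b, 3);
}
```

With something like this, a distribution could ship a single ARMv7 build:
the selection happens once at start-up, and devices without NEON keep
using the C path.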