[opus] [RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics
viswanath.puttagunta at linaro.org
Mon Nov 24 12:53:35 PST 2014
On 21 November 2014 at 18:06, Timothy B. Terriberry <tterribe at xiph.org> wrote:
> Viswanath Puttagunta wrote:
>> a. Simplest use case to validate this optimization for correctness.
>> b. Simplest use case to validate this optimization for performance.
>> Would prefer something like opusdec that can be executed on the command line.
> The easiest thing to use is probably opus_demo (opusdec does a bunch of extra things, plus for interactive use we care about both the encoder and decoder, and celt_pitch_xcorr gets used vastly more by the encoder than the decoder... I think the decoder only uses it for PLC).
> Something like
> ./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw /dev/null
> comp48-stereo.sw can be found here: https://people.xiph.org/~tterribe/opus/comp48-stereo.sw
> celt_pitch_xcorr also gets used by the SILK encoder (more in fixed-point than float, but the float one uses it, too). So it may be worth doing a run with the application set to voip instead of restricted-lowdelay and a lower bitrate (e.g., 24000 instead of 96000).
Thanks for your feedback. I have verified both of the above cases. I used
./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw out.wav
./opus_demo voip 48000 2 24000 comp48-stereo.sw out.wav
to make sure the output out.wav is clearly audible, and then used the
commands below (encode only) for performance benchmarking.
./opus_demo -e restricted-lowdelay 48000 2 96000 comp48-stereo.sw opus_raw.out
./opus_demo -e voip 48000 2 24000 comp48-stereo.sw opus_raw.out
I saw a much larger performance improvement (16.16%) for the overall
encode use case with "restricted-lowdelay 48000 2 96000" (CELT
encoding), as you suspected, since the celt_pitch_xcorr function gets used
much more heavily there.
I observed a smaller improvement (3.42%) for the overall
encode use case with "voip 48000 2 24000". This is somewhat expected, as
celt_pitch_xcorr_c was not the main contributor to encode time in this
SILK encoder use case.
For details on how I measured performance on my
BeagleBone Black (Cortex-A8), please see the "celt_pitch_xcorr (float)
Neon Optimization" section of
> Even though this primarily affects the encoder, as a sanity check, it's always good to make sure the test vectors still decode correctly. Get them from <http://opus-codec.org/testvectors/opus_testvectors.tar.gz> and use
> tests/run_vectors.sh <build path> <test vectors path> 48000