[opus] [RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

Viswanath Puttagunta viswanath.puttagunta at linaro.org
Mon Nov 24 15:37:38 PST 2014


On 24 November 2014 at 14:53, Viswanath Puttagunta
<viswanath.puttagunta at linaro.org> wrote:
>
> On 21 November 2014 at 18:06, Timothy B. Terriberry <tterribe at xiph.org> wrote:
> >
> > Viswanath Puttagunta wrote:
> >>
> >> a. Simplest use case to validate this optimization for correctness.
> >> b. Simplest use case to validate this optimization for performance.
> >>
> >> Would prefer something like opusdec that can be executed on command
> >> line.
> >
> >
> > The easiest thing to use is probably opus_demo (opusdec does a bunch of extra things, plus for interactive use we care about both the encoder and decoder, and celt_pitch_xcorr gets used vastly more by the encoder than the decoder... I think the decoder only uses it for PLC).
> >
> > Something like
> > ./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw /dev/null
> >
> > comp48-stereo.sw can be found here: https://people.xiph.org/~tterribe/opus/comp48-stereo.sw
> >
> > celt_pitch_xcorr also gets used by the SILK encoder (more in fixed-point than float, but the float one uses it, too). So it may be worth doing a run with the application set to voip instead of restricted-lowdelay and a lower bitrate (e.g., 24000 instead of 96000).
>
> Thanks for your feedback. I have verified both of the cases above. I used
> ./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw out.wav
> ./opus_demo voip 48000 2 24000 comp48-stereo.sw out.wav
>
> to confirm that the output out.wav is clearly audible, and the commands
> below (encode only) for performance benchmarking.
>
> ./opus_demo -e restricted-lowdelay 48000 2 96000 comp48-stereo.sw opus_raw.out
> ./opus_demo -e voip 48000 2 24000 comp48-stereo.sw opus_raw.out
>
> I saw a much larger performance improvement (16.16%) in the overall
> encode use case for "restricted-lowdelay 48000 2 96000" (CELT
> encoding), as you suspected, because celt_pitch_xcorr gets used much
> more there.
>
> I observed a smaller performance improvement (3.42%) in the overall
> encode use case for "voip 48000 2 24000". This is somewhat expected, as
> celt_pitch_xcorr_c was not the main contributor to run time in this
> SILK encoder use case.
>
> For detailed information on how I measured performance on my
> Beaglebone Black (Cortex-A8), please see the "celt_pitch_xcorr (float)
> Neon Optimization" section of [1].
>
> [1]: https://docs.google.com/document/d/1L6csATjSsXtzg_sa1iHZta8hOsoVWA4UjHXEakpTrNk/edit?usp=sharing
>
>
>
> >
> > Even though this primarily affects the encoder, as a sanity check, it's always good to make sure the test vectors still decode correctly. Get them from <http://opus-codec.org/testvectors/opus_testvectors.tar.gz> and use
> > tests/run_vectors.sh <build path> <test vectors path> 48000

OK, this took about 2 hours, but all tests passed successfully.
Please let me know what the next steps are.
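
For reference, the computation being optimized can be sketched in scalar
C as below. This is only a minimal sketch assuming the float build:
pitch_xcorr_ref and xcorr_close are hypothetical names used for
illustration, not the libopus functions and not the NEON patch itself.
The idea is that an optimized version should produce the same
cross-correlation values as a plain loop like this one, within a small
tolerance, since vectorized float sums may be accumulated in a different
order.

```c
#include <stddef.h>

/* Hypothetical scalar reference (not the libopus source):
 *   xcorr[i] = sum over j in [0, len) of x[j] * y[i + j]
 * for each lag i in [0, max_pitch).
 * y must hold at least len + max_pitch - 1 samples. */
static void pitch_xcorr_ref(const float *x, const float *y,
                            float *xcorr, size_t len, size_t max_pitch)
{
    for (size_t i = 0; i < max_pitch; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < len; j++)
            sum += x[j] * y[i + j];
        xcorr[i] = sum;
    }
}

/* Hypothetical helper: returns 1 if two result arrays agree
 * element-wise within an absolute tolerance tol, 0 otherwise. */
static int xcorr_close(const float *a, const float *b, size_t n, float tol)
{
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        if (d < 0.0f)
            d = -d;
        if (d > tol)
            return 0;
    }
    return 1;
}
```

A correctness check along these lines would run both the reference and
the optimized routine on the same input buffers and compare the two
output arrays with xcorr_close; the tolerance to use is a judgment call
that depends on the signal levels.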

