[opus] [RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

Viswanath Puttagunta viswanath.puttagunta at linaro.org
Tue Nov 25 07:07:30 PST 2014

On 24 November 2014 at 17:48, Peter Robinson <pbrobinson at gmail.com> wrote:
>>> >> a. Simplest use case to validate this optimization for correctness.
>>> >> b. Simplest use case to validate this optimization for performance.
>>> >>
>>> >> Would prefer something like opusdec that can be executed on command
>>> >> line.
>>> >
>>> >
>>> > The easiest thing to use is probably opus_demo (opusdec does a bunch
>>> > of extra things, plus for interactive use we care about both the encoder
>>> > and decoder, and celt_pitch_xcorr gets used vastly more by the encoder than
>>> > the decoder... I think the decoder only uses it for PLC).
>>> >
>>> > Something like
>>> > ./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw
>>> >
>>> > comp48-stereo.sw can be found here:
>>> >
>>> > celt_pitch_xcorr also gets used by the SILK encoder (more in
>>> > fixed-point than float, but the float one uses it, too). So it may be worth
>>> > doing a run with the application set to voip instead of restricted-lowdelay
>>> > and a lower bitrate (e.g., 24000 instead of 96000).
>>> Thanks for your feedback. I have verified both of the above cases. I used
>>> ./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw out.wav
>>> ./opus_demo voip 48000 2 24000 comp48-stereo.sw out.wav
>>> to make sure the output out.wav is clearly audible, and the following
>>> encode-only commands for performance benchmarking:
>>> ./opus_demo -e restricted-lowdelay 48000 2 96000 comp48-stereo.sw opus_raw.out
>>> ./opus_demo -e voip 48000 2 24000 comp48-stereo.sw opus_raw.out
>>> For the overall encode use case "restricted-lowdelay 48000 2 96000"
>>> (CELT encoding) I saw a much larger performance improvement (16.16%),
>>> as you suspected, since celt_pitch_xcorr is used much more heavily there.
>>> For the overall encode use case "voip 48000 2 24000" I observed a
>>> smaller improvement (3.42%). This is somewhat expected, as
>>> celt_pitch_xcorr_c is not the main contributor to run time in this
>>> SILK encoder use case.
>>> For detailed information on how I measured performance on my
>>> BeagleBone Black (Cortex-A8), please see the "celt_pitch_xcorr (float)
>>> Neon Optimization" section of [1]
>>> [1]:
>>> >
>>> > Even though this primarily affects the encoder, as a sanity check,
>>> > it's always good to make sure the test vectors still decode correctly. Get
>>> > them from <http://opus-codec.org/testvectors/opus_testvectors.tar.gz> and run
>>> > tests/run_vectors.sh <build path> <test vectors path> 48000
>> OK, this took about 2 hours, but all tests passed.
>> Please let me know what the next steps are.
> Are there plans to support ARMv8/aarch64 NEON intrinsics too?
> Also, are there plans to make the NEON optimisations on ARMv7 runtime
> detectable, as they are in cairo/pixman? For generic distributions
> it would be nice to be able to enable them, as they offer
> decent performance improvements, but have the code fall back on devices
> that don't support NEON.
Yep, adding support for ARMv8 is the final objective. I did not want to
introduce too many changes in the first shot, so I only introduced it
for ARMv7. In theory, most of the code in this patch (the NEON intrinsics
code) should remain unchanged for ARMv8. Only the mechanism by which
NEON/ASIMD presence is detected at runtime and the compile flags should
need to change. I will work on this once this patch gets reviewed and
accepted. I made sure these changes are fairly localized.

And yes, this patch also supports runtime detection of NEON. Actually, most
of the code for runtime detection of NEON was already in the project
before this patch; I just reused that infrastructure.

> Peter