[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics

Mon Jun 5 19:31:41 UTC 2017

Yes, we'll have one more patch set related to xcorr next week. Please
don't wait for it if it's too late for the 1.2 release.

Thanks,
Linfeng

On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote:

> Hi Jean-Marc,
>
> I attached the new version, inner_prod_5patches_v2.zip, which is synced to
> the current master.
>
> For fixed-point ARM, only 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch
> affects performance.
> For floating-point ARM, only 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch
> affects performance.
> Patches 1 and 2 are code clean-up and can only affect x86 performance.
> Patch 5 has a negligible effect on floating-point ARM performance.
>
> Thanks,
> Linfeng
>
> On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <jmvalin at jmvalin.ca>
> wrote:
>
>> Hi Linfeng,
>>
>> I'll look into your patches. Can you let me know what the expected
>> effect on performance (if any) is for each of your patches? Also, are these
>> all the patches you intend to merge for 1.2, or are there more upcoming
>> ones?
>>
>> Cheers,
>>
>>         Jean-Marc
>>
>> On 01/06/17 06:33 PM, Linfeng Zhang wrote:
>> > Hi,
>> >
>> > Attached are 5 patches related to the NEON intrinsics optimization of
>> > celt_inner_prod() and dual_inner_prod().
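For context, a minimal float-only sketch of what the two functions compute. It is modelled on the scalar C fallbacks in Opus (celt_inner_prod_c() and dual_inner_prod_c()), but the `_ref` names, the plain `float` types (Opus uses opus_val16/opus_val32, which are integer types in fixed-point builds) and the assert are assumptions made for illustration: celt_inner_prod() returns the dot product of x and y, and dual_inner_prod() computes the dot products of one vector x against two vectors in a single pass.

```c
#include <assert.h>

/* Hypothetical float-only reference: dot product of x and y over N elements. */
float celt_inner_prod_ref(const float *x, const float *y, int N)
{
    assert(N >= 0);
    float xy = 0.0f;
    for (int i = 0; i < N; i++)
        xy += x[i] * y[i];
    return xy;
}

/* Hypothetical float-only reference: two dot products sharing the same x,
 * computed in one loop so each x[i] is read only once. */
void dual_inner_prod_ref(const float *x, const float *y01, const float *y02,
                         int N, float *xy1, float *xy2)
{
    assert(N >= 0);
    float xy01 = 0.0f, xy02 = 0.0f;
    for (int i = 0; i < N; i++) {
        xy01 += x[i] * y01[i];
        xy02 += x[i] * y02[i];
    }
    *xy1 = xy01;
    *xy2 = xy02;
}
```

For example, with x = {1, 2, 3, 4}, y01 = {1, 1, 1, 1} and y02 = {0, 1, 0, 1}, dual_inner_prod_ref() stores 10 in *xy1 and 6 in *xy2.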
>> >
>> > In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the
>> > optimization changes the order of the floating-point operations in the
>> > inner products, which changes the results. I
>> > created celt_inner_prod_neon_float_c_simulation()
>> > and dual_inner_prod_neon_float_c_simulation() to simulate the order of
>> > floating-point operations in the NEON optimization and compare their
>> > results. Unfortunately, I cannot bound the difference between the original
>> > C function and the NEON function by any reasonably small number or
>> > ratio. It's easy to construct an input for which both 0 and 1,000 are
>> > correct results, just by changing the order in which the inner product is
>> > accumulated.
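The order dependence described above can be reproduced in plain C. The sketch below compares a single-accumulator loop with a hypothetical simulation of a 4-lane NEON loop (one float accumulator per lane, reduced at the end as (lane0 + lane2) + (lane1 + lane3)). The lane layout, the final reduction and the function names are assumptions; they are not taken from the patch's celt_inner_prod_neon_float_c_simulation(), whose exact order may differ. The code also assumes FLT_EVAL_METHOD == 0 (no extended-precision intermediates) and a build without -ffast-math.

```c
#include <assert.h>

/* Plain C order: one accumulator, products added left to right. */
float inner_prod_sequential(const float *x, const float *y, int N)
{
    assert(N >= 0);
    float sum = 0.0f;
    for (int i = 0; i < N; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical simulation of a 4-lane SIMD loop (assumed layout):
 * lane j accumulates products i = j, j+4, j+8, ...
 * The lanes are then reduced as (lane0 + lane2) + (lane1 + lane3),
 * i.e. add the low and high halves, then add the resulting pair.
 * Leftover elements (N % 4) are added one at a time at the end. */
float inner_prod_4lane_sim(const float *x, const float *y, int N)
{
    assert(N >= 0);
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int i = 0;
    for (; i + 4 <= N; i += 4)
        for (int j = 0; j < 4; j++)
            acc[j] += x[i + j] * y[i + j];
    float sum = (acc[0] + acc[2]) + (acc[1] + acc[3]);
    for (; i < N; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With x = {1e20f, 1000.0f, -1e20f, 0.0f} and y all ones, the sequential loop absorbs 1000 into 1e20 (1000 is far below half a float ulp at that magnitude) and returns 0, while the 4-lane version cancels the two large terms first and returns 1000. Both are legitimate single-precision evaluations of the same sum, which is why no fixed absolute or relative error bound between the C and NEON versions can be stated.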
>> >
>> > The total speed gain is about 1.0% for the fixed-point encoder and 1.8% for
>> > the floating-point encoder at complexity 8, tested on my Chromebook.
>> >
>> > Thanks,
>> > Linfeng
>> >
>> >
>> > _______________________________________________
>> > opus mailing list
>> > opus at xiph.org
>> > http://lists.xiph.org/mailman/listinfo/opus
>> >
>>
>
>