[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics

Mon Jun 5 19:54:57 UTC 2017

About 1% speed gain for fixed-point, and 1.5% for floating-point.

Thanks,
Linfeng

On Mon, Jun 5, 2017 at 12:49 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:

> On 05/06/17 03:28 PM, Linfeng Zhang wrote:
> > For fixed-point ARM, only
> > 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes
> > the performance.
> > For floating-point ARM, only
> > 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes
> > the performance.
>
> Got any numbers?
>
> Cheers,
>
>         Jean-Marc
>
> > Patches 1 and 2 are code clean-up and can only affect x86 performance.
> > Patch 5 has a negligible effect on floating-point ARM performance.
> >
> > Thanks,
> > Linfeng
> >
> > On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <jmvalin at jmvalin.ca
> > <mailto:jmvalin at jmvalin.ca>> wrote:
> >
> >     Hi Linfeng,
> >
> >     I'll look into your patches. Can you let me know what's the expected
> >     effect on performance (if any) for each of your patches? Also, are these
> >     all the patches you intend to merge for 1.2 or are there more
> >     upcoming ones?
> >
> >     Cheers,
> >
> >             Jean-Marc
> >
> >     On 01/06/17 06:33 PM, Linfeng Zhang wrote:
> >     > Hi,
> >     >
> >     > Attached are 5 patches related to celt_inner_prod()
> >     > and dual_inner_prod() NEON intrinsics optimization.
> >     >
> >     > In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the
> >     > optimization changed the order of floating-point inner products, which
> >     > will change the results. I created
> >     > celt_inner_prod_neon_float_c_simulation() and
> >     > dual_inner_prod_neon_float_c_simulation() to simulate the order of
> >     > floating-point operations in the NEON optimization and compare their
> >     > results. Sorry that I cannot bound the distance between the original C
> >     > function and the NEON function by any reasonably small number or
> >     > ratio. It's easy to create an input for which 0 and 1,000 are both
> >     > correct results, just by manipulating the inner product order.
> >     >
> >     > The total speed gain is about 1.0% for the fixed-point encoder and
> >     > 1.8% for the floating-point encoder at Complexity 8, tested on my
> >     > Chromebook.
> >     >
> >     > Thanks,
> >     > Linfeng
> >     >
> >     >
> >
> >
>
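
The point in the original message about summation order can be illustrated with a small sketch. This is not the code from the patches: inner_prod_sequential() and inner_prod_lanes4() are hypothetical names, and the four-accumulator loop is only an assumed scalar stand-in for how a 4-lane SIMD loop typically accumulates (element i goes to lane i % 4, and the lanes are added together at the end).

```c
#include <assert.h>
#include <stddef.h>

/* Reference order: a single accumulator, elements added left to right. */
static float inner_prod_sequential(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical scalar model of a 4-lane SIMD loop: four independent
 * partial sums, reduced to one value after the loop. The reduction
 * order used here is an assumption, not taken from the patches. */
static float inner_prod_lanes4(const float *x, const float *y, size_t n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    assert(n % 4 == 0); /* the sketch handles no leftover elements */
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += x[i + k] * y[i + k];

    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

With x = {1e8, 1, 1, 1, -1e8, 1, 1, 1} and y all ones, and assuming IEEE-754 single precision evaluated as float, the sequential loop should return 3 (each +1 is absorbed while the running sum is 1e8), while the lane-wise loop should return 6, which is the exact result. Both functions compute the same mathematical inner product; only the order of additions differs.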