[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
Linfeng Zhang
linfengz at google.com
Mon Jun 5 19:54:57 UTC 2017
About 1% speed gain for fixed-point, and 1.5% for floating-point.
Thanks,
Linfeng
On Mon, Jun 5, 2017 at 12:49 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> On 05/06/17 03:28 PM, Linfeng Zhang wrote:
> > For fixed-point ARM, only
> > 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes
> > the performance.
> > For floating-point ARM, only
> > 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes
> > the performance.
>
> Got any numbers?
>
> Cheers,
>
> Jean-Marc
>
> > Patches 1 and 2 are code clean-up and can only affect x86 performance.
> > Patch 5 has a negligible effect on floating-point ARM performance.
> >
> > Thanks,
> > Linfeng
> >
> > On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> >
> > Hi Linfeng,
> >
> > I'll look into your patches. Can you let me know what's the expected
> > effect on performance (if any) for each of your patches? Also, are these
> > all the patches you intend to merge for 1.2 or are there more
> > upcoming ones?
> >
> > Cheers,
> >
> > Jean-Marc
> >
> > On 01/06/17 06:33 PM, Linfeng Zhang wrote:
> > > Hi,
> > >
> > > Attached are 5 patches related to celt_inner_prod()
> > > and dual_inner_prod() NEON intrinsics optimization.
> > >
> > > In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the
> > > optimization changes the order of the floating-point inner products,
> > > which changes the results. I created
> > > celt_inner_prod_neon_float_c_simulation() and
> > > dual_inner_prod_neon_float_c_simulation() to simulate the order of the
> > > floating-point operations in the NEON optimization and compare their
> > > results. Sorry, but I cannot bound the difference between the original
> > > C function and the NEON function by any reasonably small number or
> > > ratio: it's easy to create an input for which both 0 and 1,000 are
> > > correct results just by changing the inner product order.
> > >
> > > The total speed gain is about 1.0% for the fixed-point encoder and
> > > 1.8% for the floating-point encoder at complexity 8, tested on my
> > > Chromebook.
> > >
> > > Thanks,
> > > Linfeng
> > >
> >
> >
>
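The reordering problem described in the quoted 0004 patch note can be reproduced in plain C. The sketch below is illustrative only: `inner_prod_sequential()` and `inner_prod_4lane_sim()` are hypothetical helpers, not the `celt_inner_prod_neon_float_c_simulation()` code from the patch, and the lane layout and final reduction (low half plus high half, then the remaining pair) are assumptions about how a typical 4-lane NEON loop accumulates.

```c
#include <assert.h>

/* Hypothetical scalar reference: accumulates x[i] * y[i] strictly from left
 * to right, the order a plain C loop uses. */
float inner_prod_sequential(const float *x, const float *y, int n)
{
    assert(n >= 0);
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical C simulation of a 4-lane NEON accumulation order (an
 * assumption, not the patch's code). Lane j accumulates the products whose
 * index is congruent to j mod 4; the lanes are then reduced as
 * (lane0 + lane2) + (lane1 + lane3), mirroring "add the low and high halves
 * of a float32x4_t, then add the remaining pair". */
float inner_prod_4lane_sim(const float *x, const float *y, int n)
{
    assert(n >= 0);
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int i = 0;
    for (; i + 4 <= n; i += 4)
        for (int j = 0; j < 4; j++)
            lane[j] += x[i + j] * y[i + j];
    float sum = (lane[0] + lane[2]) + (lane[1] + lane[3]);
    /* Tail elements (when n is not a multiple of 4) are added one by one. */
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

For x = {1e30f, 1000.0f, -1e30f, 0.0f} and y all ones, the exact inner product is 1000. `inner_prod_sequential()` rounds 1e30 + 1000 back to 1e30 and then cancels it, returning 0, while `inner_prod_4lane_sim()` cancels the two large terms inside the (lane0 + lane2) pair first and returns 1000. Because the rounding error of a floating-point sum scales with the sum of the magnitudes |x[i]*y[i]| (about 2e30 here) rather than with the result itself, no small bound on the C-versus-NEON difference holds for arbitrary inputs.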