[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
Jean-Marc Valin
jmvalin at jmvalin.ca
Tue Jun 6 05:52:36 UTC 2017
As far as I know, +0 should be equal to -0 in C. And even then, I don't
see a reason two identical pieces of code should give different results
on an IEEE 754-compliant platform (which I believe Neon is). Can you
check what exactly is the case that doesn't match?
Cheers,
Jean-Marc
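To follow up on the question of which case doesn't match, here is a minimal diagnostic sketch in C. It prints two float results exactly (hex-float `%a`, raw bits, sign bit, and `fpclassify` class) instead of relying on `%f`, under which a small nonzero value can also print as 0.0. The helper names `float_bits()` and `dump_pair()` are hypothetical and not part of Opus. The code assumes IEEE 754 binary32 floats.

```c
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Raw IEEE 754 bit pattern of a float (memcpy avoids aliasing problems). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical diagnostic: print two results exactly, so that values that
   both look like "0.0" under %f can be told apart. */
static void dump_pair(FILE *out, float c_result, float simd_result)
{
    const float v[2] = {c_result, simd_result};
    const char *name[2] = {"C", "SIMD"};
    for (int i = 0; i < 2; i++) {
        const char *cls = fpclassify(v[i]) == FP_ZERO      ? "zero"
                        : fpclassify(v[i]) == FP_SUBNORMAL ? "subnormal"
                        : "other";
        fprintf(out, "%-4s %a bits=0x%08" PRIx32 " signbit=%d class=%s\n",
                name[i], (double)v[i], float_bits(v[i]),
                signbit(v[i]) != 0, cls);
    }
    /* +0.0f and -0.0f have different bits but compare equal here. */
    fprintf(out, "equal under ==: %d\n", c_result == simd_result);
}
```

For example, `dump_pair(stderr, xy1_c, *xy1)` could be called just before the failing check. `0.0f` and `-0.0f` show different bits and sign bits, yet they still compare equal under `==`.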
On 06/06/17 12:46 AM, Linfeng Zhang wrote:
> Hi Jean-Marc,
>
> I tried "==" before, and it failed when both results were 0.0. Maybe the
> exponent or sign differs because of a different 0.0 representation in
> NEON. If anybody knows how to handle this 0.0 comparison, that would be
> great. Or I could just use if(a==b || (a==0.0 && b==0.0)) ... but I
> haven't tried this.
>
> Thanks,
> Linfeng
>
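Here is a small sketch of the proposed condition, assuming IEEE 754 comparisons. Because `+0.0f == -0.0f` is true, `a == 0.0 && b == 0.0` already implies `a == b`, so the extra clause cannot change the outcome for any pair of inputs. Both function names are hypothetical.

```c
#include <stdbool.h>

/* The workaround suggested in the thread: equality, plus "both are zero". */
static bool match_with_zero_clause(float a, float b)
{
    return a == b || (a == 0.0f && b == 0.0f);
}

/* Plain equality. If a and b are both +0 or -0 in any combination,
   a == b is already true, so this returns the same result as above. */
static bool match_plain(float a, float b)
{
    return a == b;
}
```

A subnormal such as `1e-45f` is not zero on a platform that does not flush subnormals. Neither version treats it as equal to `0.0f`, so such values would have to be ruled out separately.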
> On Mon, Jun 5, 2017 at 8:43 PM Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
>
> Hi Linfeng,
>
> On 05/06/17 03:31 PM, Linfeng Zhang wrote:
> > Yes, we'll have one more patch set related to xcorr next week. Please
> > don't wait if it's too late for the 1.2 release.
>
> Assuming there's no issue with the patches, next week isn't too late.
>
> Also, I've started looking at your patches. So far there's one thing
> that puzzles me a bit. In the OPUS_CHECK_ASM section of patch 0004, you
> have:
>
> + celt_assert(ABS32(xy1_c - *xy1) <= VERY_SMALL);
>
> Given the normal range of the values (the xy values are often much
> larger than one) and the precision involved here (24-bit mantissa), it
> seems like this test can only succeed if the two values are actually
> equal. Is the float patch actually bit-exact? If so, then maybe you
> should be using actual equality. If not, then I guess we need to find
> the right condition (which isn't obvious for floating point).
>
> Cheers,
>
> Jean-Marc
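Here is one possible answer to the "right condition" question. It is sketched under stated assumptions and is not taken from the patch. A standard worst-case forward error bound for a floating-point dot product, valid for any summation order, is |fl(x.y) - x.y| <= gamma_n * sum|x_i * y_i|, with gamma_n = n*u / (1 - n*u) and unit roundoff u = 2^-24 for float. Two implementations that differ only in order therefore differ by at most 2 * gamma_n * sum|x_i * y_i|.

Scaling the tolerance by sum|x_i * y_i| rather than by the result explains why no fixed constant like `VERY_SMALL` can work. With cancellation, the result can be small while sum|x_i * y_i| is huge. The function `inner_prod_close()` is hypothetical. It assumes round-to-nearest and ignores overflow and the flushing of subnormals to zero (ARMv7 NEON always flushes them), which can add a tiny absolute error that this bound does not cover.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical check: do two float inner products of x and y that differ
   only in summation order agree within the worst-case forward error bound
   |fl(x.y) - x.y| <= gamma_n * sum |x_i * y_i| ?
   Assumes round-to-nearest; ignores overflow and subnormal flushing. */
static bool inner_prod_close(const float *x, const float *y, int n,
                             float a, float b)
{
    /* Products of two floats are exact in double; the double sum's own
       rounding error is negligible next to the float bound. */
    double abs_sum = 0.0;
    for (int i = 0; i < n; i++)
        abs_sum += fabs((double)x[i] * (double)y[i]);

    const double u = FLT_EPSILON / 2.0;   /* float unit roundoff, 2^-24 */
    const double nu = (double)n * u;
    if (nu >= 1.0)
        return true;                      /* n*u >= 1: bound says nothing */
    const double gamma_n = nu / (1.0 - nu);

    /* Each result lies within gamma_n * abs_sum of the exact value,
       so the two results lie within twice that of each other. */
    return fabs((double)a - (double)b) <= 2.0 * gamma_n * abs_sum;
}
```

In an OPUS_CHECK_ASM block, this could replace the fixed tolerance with something like `celt_assert(inner_prod_close(x, y, N, xy1_c, *xy1));`. Here `x`, `y`, and `N` are placeholders for whatever the checked function received.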
>
>
> > Thanks,
> > Linfeng
> >
> > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang
> > <linfengz at google.com> wrote:
> >
> > Hi Jean-Marc,
> >
> > I attached the new version, inner_prod_5patches_v2.zip, which is
> > synced to the current master.
> >
> > For fixed-point ARM, only
> > 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch
> > changes the performance.
> > For floating-point ARM, only
> > 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch
> > changes the performance.
> > Patches 1 and 2 are code clean-up and can only affect x86 performance.
> > Patch 5 has a negligible effect on floating-point ARM performance.
> >
> > Thanks,
> > Linfeng
> >
> > On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin
> > <jmvalin at jmvalin.ca> wrote:
> >
> > Hi Linfeng,
> >
> > I'll look into your patches. Can you let me know what's the expected
> > effect on performance (if any) for each of your patches? Also, are these
> > all the patches you intend to merge for 1.2, or are there more
> > upcoming ones?
> >
> > Cheers,
> >
> > Jean-Marc
> >
> > On 01/06/17 06:33 PM, Linfeng Zhang wrote:
> > > Hi,
> > >
> > > Attached are 5 patches related to celt_inner_prod()
> > > and dual_inner_prod() NEON intrinsics optimization.
> > >
> > > In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the
> > > optimization changed the order of the floating-point inner products,
> > > which will change the results. I created
> > > celt_inner_prod_neon_float_c_simulation() and
> > > dual_inner_prod_neon_float_c_simulation() to simulate the order of the
> > > floating-point operations in the NEON optimization and compare their
> > > results. Unfortunately, I cannot bound the distance between the original
> > > C function and the NEON function by any reasonably small number or
> > > ratio. It's easy to construct an input for which both 0 and 1,000 are
> > > correct results just by changing the order of the inner products.
> > >
> > > The total speed gain is about 1.0% for the fixed-point encoder and
> > > 1.8% for the floating-point encoder at complexity 8, tested on my
> > > Chromebook.
> > >
> > > Thanks,
> > > Linfeng
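The reordering effect described in the first message can be reproduced with a hypothetical model of a 4-lane SIMD accumulation order. This is not the `celt_inner_prod_neon_float_c_simulation()` from the patch, whose exact order isn't shown here. The sketch assumes four lane accumulators that are combined pairwise at the end, and it requires n to be a multiple of 4.

Take x = {1e11, 1000, 0, 0, -1e11, 0, 0, 0} and y all ones. The single-accumulator loop absorbs the 1000 into 1e11, where the float spacing is 8192, and returns 0. The 4-lane order cancels the two 1e11 terms within lane 0 and returns 1000.

```c
/* Plain C order: one accumulator, left to right. */
static float inner_prod_seq(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical model of a 4-lane SIMD order: lane k accumulates the
   products at indices k, k+4, k+8, ..., and the lanes are summed pairwise
   at the end. Assumes n is a multiple of 4; the real NEON code may reduce
   the lanes in a different order and must also handle a tail. */
static float inner_prod_4lane(const float *x, const float *y, int n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; i += 4)
        for (int k = 0; k < 4; k++)
            lane[k] += x[i + k] * y[i + k];
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

Both results are correctly rounded evaluations of the same sum in their respective orders. Neither function is "wrong"; they simply round at different points.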
> > >
> > >
> > > _______________________________________________
> > > opus mailing list
> > > opus at xiph.org
> > > http://lists.xiph.org/mailman/listinfo/opus
> > >
> >
> >
> >
>