[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics

Tue Jun 6 05:52:36 UTC 2017

As far as I know, +0 should be equal to -0 in C. And even then, I don't
see a reason two identical pieces of code should give different results
on an IEEE 754-compliant platform (which I believe Neon is). Can you
check what exactly is the case that doesn't match?
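
For reference, here is a minimal C sketch of the comparison in question. It assumes a 32-bit IEEE 754 `float`, and the helper names (`float_bits`, `same_value`, `same_bits`) are hypothetical, not part of Opus. It illustrates that `==` treats +0.0 and -0.0 as equal, so two zero results should only look different under a bitwise comparison.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float (assumes a 32-bit IEEE 754 float). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Value comparison: true for +0.0 vs -0.0, false if either is NaN. */
static int same_value(float a, float b)
{
    return a == b;
}

/* Bitwise comparison: distinguishes +0.0 from -0.0. */
static int same_bits(float a, float b)
{
    return float_bits(a) == float_bits(b);
}
```

With these helpers, `same_value(0.0f, -0.0f)` is true while `same_bits(0.0f, -0.0f)` is false, since the two zeros differ only in the sign bit.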

Cheers,

	Jean-Marc

On 06/06/17 12:46 AM, Linfeng Zhang wrote:
> Hi Jean-Marc,
> 
> I tried "==" before, and it failed when both results are 0.0. Maybe the
> exponent or sign differs because of a different 0.0 representation in
> NEON. If anybody knows how to handle this 0.0 comparison, that would be
> great.
> Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this.
> 
> Thanks,
> Linfeng
> 
> On Mon, Jun 5, 2017 at 8:43 PM Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> 
>     Hi Linfeng,
> 
>     On 05/06/17 03:31 PM, Linfeng Zhang wrote:
>     > Yes, we'll have one more patch set related to xcorr next week. Please
>     > don't wait if it's too late for the 1.2 release.
> 
>     Assuming there's no issue with the patches, next week isn't too late.
> 
>     Also, I've started looking at your patches. So far there's one thing
>     that puzzles me a bit. In the OPUS_CHECK_ASM section of patch 0004, you
>     have:
> 
>     +        celt_assert(ABS32(xy1_c - *xy1) <= VERY_SMALL);
> 
>     Given the normal range of the values (the xy values are often much
>     larger than one) and the precision involved here (24-bit mantissa), it
>     seems like this test can only succeed if the two values are actually
>     equal. Is the float patch actually bit-exact? If so, then maybe you
>     should be using actual equality. If not, then I guess we need to find
>     the right condition (which isn't obvious for floating point).
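
A minimal C sketch of the point above, under stated assumptions: `TINY_THRESHOLD` is a hypothetical stand-in for `VERY_SMALL` (assumed here to be on the order of 1e-30 in a float build), and `next_float_up` and `within_tiny` are illustrative helpers, not Opus functions. It shows that for values well above one, even two adjacent floats differ by far more than such a threshold, so the check can pass only when the values are equal.

```c
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for VERY_SMALL in a float build; hypothetical value. */
#define TINY_THRESHOLD 1e-30f

/* Next representable float above a positive, finite x (IEEE 754 binary32). */
static float next_float_up(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    u += 1;  /* the adjacent bit pattern is the adjacent positive float */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* The style of check under discussion: absolute difference vs a tiny bound. */
static int within_tiny(float a, float b)
{
    float d = a - b;
    if (d < 0) d = -d;
    return d <= TINY_THRESHOLD;
}
```

For example, `within_tiny(1000.0f, next_float_up(1000.0f))` is false, because the gap between neighbouring floats near 1000 is about 6e-5, many orders of magnitude above the threshold.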
> 
>     Cheers,
> 
>             Jean-Marc
> 
> 
>     > Thanks,
>     > Linfeng
>     >
>     > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote:
>     >
>     >     Hi Jean-Marc,
>     >
>     >     I attached the new version in inner_prod_5patches_v2.zip, which is
>     >     synced to the current master.
>     >
>     >     For fixed-point ARM, only
>     >     0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch
>     >     changes the performance.
>     >     For floating-point ARM, only
>     >     0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch
>     >     changes the performance.
>     >     Patches 1 and 2 are code clean-up and can only affect x86 performance.
>     >     Patch 5 has a negligible effect on floating-point ARM performance.
>     >
>     >     Thanks,
>     >     Linfeng
>     >
>     >     On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
>     >
>     >         Hi Linfeng,
>     >
>     >         I'll look into your patches. Can you let me know what's the
>     >         expected effect on performance (if any) for each of your
>     >         patches? Also, are these all the patches you intend to merge
>     >         for 1.2 or are there more upcoming ones?
>     >
>     >         Cheers,
>     >
>     >                 Jean-Marc
>     >
>     >         On 01/06/17 06:33 PM, Linfeng Zhang wrote:
>     >         > Hi,
>     >         >
>     >         > Attached are 5 patches related to celt_inner_prod()
>     >         > and dual_inner_prod() NEON intrinsics optimization.
>     >         >
>     >         > In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch,
>     >         > the optimization changed the order of the floating-point inner
>     >         > product operations, which will change the results. I created
>     >         > celt_inner_prod_neon_float_c_simulation() and
>     >         > dual_inner_prod_neon_float_c_simulation() to simulate the order
>     >         > of floating-point operations in the NEON optimization and
>     >         > compare their results. Sorry that I cannot bound the distance
>     >         > between the original C function and the NEON function by any
>     >         > given reasonably small number or ratio. It's easy to create an
>     >         > input for which 0 and 1,000 are both correct results by just
>     >         > manipulating the inner product order.
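
A minimal C sketch of how accumulation order alone can change a float inner product. The function names, the 4-lane layout, and the example data are all hypothetical and are not the order used by the actual patch; the lane version only loosely mimics a SIMD-style reduction. The `volatile` accumulators are there to force each partial sum to be rounded to `float`.

```c
/* Inner product accumulated in index order, like a plain C loop. */
static float inner_prod_sequential(const float *x, const float *y, int n)
{
    volatile float sum = 0;  /* volatile forces rounding to float each step */
    int i;
    for (i = 0; i < n; i++) sum = sum + x[i] * y[i];
    return sum;
}

/* Same products, accumulated in 4 interleaved partial sums that are
   combined at the end (a rough stand-in for a 4-lane SIMD reduction). */
static float inner_prod_4lanes(const float *x, const float *y, int n)
{
    volatile float lane[4] = {0, 0, 0, 0};
    volatile float total;
    int i;
    for (i = 0; i < n; i++) lane[i & 3] = lane[i & 3] + x[i] * y[i];
    total = lane[0] + lane[1];
    total = total + lane[2];
    total = total + lane[3];
    return total;
}

/* Example input: the products are 2^40, 1000, 0, 0, -2^40, 0, 0, 0. */
static const float example_x[8] = {1048576.0f, 1000.0f, 0, 0, -1048576.0f, 0, 0, 0};
static const float example_y[8] = {1048576.0f, 1.0f, 0, 0, 1048576.0f, 0, 0, 0};
```

On this input the sequential loop absorbs the 1000 into 2^40 (where the float spacing is 2^17) and ends at 0, while the lane version cancels 2^40 against -2^40 first and ends at 1000.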
>     >         >
>     >         > The total speed gain is about 1.0% for the fixed-point encoder
>     >         > and 1.8% for the floating-point encoder, at complexity 8, tested
>     >         > on my Chromebook.
>     >         >
>     >         > Thanks,
>     >         > Linfeng
>     >         >
>     >         >
>     >
>     >
>     >
>