[opus] Antw: Re: [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics

Tue Jun 6 07:03:18 UTC 2017

>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message
<CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>:
> Hi Jean-Marc,
> 
> I tried "==" before, and it failed when both results are 0.0. Maybe the
> exponent or sign differs because of a different 0.0 representation
> in NEON. If anybody knows how to handle this 0.0 comparison, that would be
> great.
> Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this.
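
As far as I know, IEEE 754 defines +0.0 and -0.0 to compare equal, so a sign difference alone should not make == fail; only a bitwise comparison would tell the two apart. A minimal sketch to check this on a given target (the helper names same_value() and same_bits() are made up for illustration, and I am assuming float is an IEEE 754 single):

```c
#include <string.h>

/* Hypothetical helper: value comparison, as done by the == operator. */
static int same_value(float a, float b)
{
        return a == b;
}

/* Hypothetical helper: bitwise comparison of the two representations. */
static int same_bits(float a, float b)
{
        return memcmp(&a, &b, sizeof a) == 0;
}
```

Calling same_value(0.0f, -0.0f) should return 1, while same_bits(0.0f, -0.0f) should return 0 because the sign bit differs.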

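From some faint memory of my math lessons, I once wrote code like this to estimate the machine precision, that is, the point at which adding eps to 1.0 no longer changes the result (this is not the smallest number different from zero, which is far smaller):

double  EPS;            /* largest power of two with 1.0 + EPS == 1.0 */

/* refined estimate of EPS: halve eps until 1.0 + eps rounds back to 1.0 */
static  double  get_EPS(double eps)
{
        while ( 1.0 + eps != 1.0 )
                eps /= 2;
        return(eps);
}

/* inside a function, e.g. at the start of main() */
EPS = get_EPS(1.0);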

On the x86_64 platform I get:
(gdb) p EPS
$1 = 1.1102230246251565e-16
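
The value above is 2^-53, which is half of DBL_EPSILON from <float.h>; for float results the analogous constant would be FLT_EPSILON. One way it might be used is a comparison relative to the magnitude of the operands. This is only a sketch: nearly_equal(), abs_f() and the factor parameter are my own assumptions, not anything from the patches.

```c
#include <float.h>

/* Hypothetical helper: absolute value without needing libm. */
static float abs_f(float x)
{
        return x < 0.0f ? -x : x;
}

/* Hypothetical helper: returns 1 if a and b agree to within
 * factor * FLT_EPSILON relative to the larger magnitude. */
static int nearly_equal(float a, float b, float factor)
{
        float diff  = abs_f(a - b);
        float abs_a = abs_f(a);
        float abs_b = abs_f(b);
        float scale = abs_a > abs_b ? abs_a : abs_b;

        if (a == b)             /* exact matches, including +0.0 vs -0.0 */
                return 1;
        return diff <= factor * FLT_EPSILON * scale;
}
```

A purely relative test like this still fails when one result cancels to zero and the other does not, so it cannot cover the reordering case Linfeng describes.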

Maybe it can help...

Regards,
Ulrich

> 
> Thanks,
> Linfeng
> 
> On Mon, Jun 5, 2017 at 8:43 PM Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> 
>> Hi Linfeng,
>>
>> On 05/06/17 03:31 PM, Linfeng Zhang wrote:
>> > Yes we'll have one more patch set related to xcorr next week. Please
>> > don't wait if it's too late for the 1.2 release.
>>
>> Assuming there's no issue with the patches, next week isn't too late.
>>
>> Also, I've started looking at your patches. So far there's one thing
>> that puzzles me a bit. In the OPUS_CHECK_ASM section of patch 0004, you
>> have:
>>
>> +        celt_assert(ABS32(xy1_c - *xy1) <= VERY_SMALL);
>>
>> Given the normal range of the values (the xy values are often much
>> larger than one) and the precision involved here (24-bit mantissa), it
>> seems like this test can only succeed if the two values are actually
>> equal. Is the float patch actually bit-exact? If so, then maybe you
>> should be using actual equality. If not, then I guess we need to find
>> the right condition (which isn't obvious for floating point).
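
To put a number on this point: the gap between neighbouring float values grows with their magnitude, so two unequal results near 1000 already differ by far more than any tiny absolute threshold. A small sketch, assuming an IEEE 754 single-precision float; ulp_gap() is a hypothetical helper, and I have not checked the exact definition of VERY_SMALL (I am assuming it is something tiny, on the order of 1e-30f):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: distance from a positive finite float x to the
 * next larger representable float (assumes IEEE 754 binary32). */
static float ulp_gap(float x)
{
        uint32_t bits;
        float next;

        memcpy(&bits, &x, sizeof bits);
        bits += 1;              /* next representable value for x > 0 */
        memcpy(&next, &bits, sizeof next);
        return next - x;
}
```

For x = 1.0f the gap equals FLT_EPSILON (2^-23), and for x = 1000.0f it is 2^-14, roughly 6.1e-5, so a difference below 1e-30 is only possible if both values are identical.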
>>
>> Cheers,
>>
>>         Jean-Marc
>>
>>
>> > Thanks,
>> > Linfeng
>> >
>> > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote:
>> >
>> >     Hi Jean-Marc,
>> >
>> >     I attached the new version in inner_prod_5patches_v2.zip which
>> >     synced to the current master.
>> >
>> >     For fixed-point ARM, only
>> >     0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch
>> >     changes the performance.
>> >     For floating-point ARM, only
>> >     0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch
>> >     changes the performance.
>> >     Patches 1 and 2 are code clean-up and can only affect x86 performance.
>> >     Patch 5 has a negligible effect on floating-point ARM performance.
>> >
>> >     Thanks,
>> >     Linfeng
>> >
>> >     On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
>> >
>> >         Hi Linfeng,
>> >
>> >         I'll look into your patches. Can you let me know what's the expected
>> >         effect on performance (if any) for each of your patches? Also, are
>> >         these all the patches you intend to merge for 1.2 or are there more
>> >         upcoming ones?
>> >
>> >         Cheers,
>> >
>> >                 Jean-Marc
>> >
>> >         On 01/06/17 06:33 PM, Linfeng Zhang wrote:
>> >         > Hi,
>> >         >
>> >         > Attached are 5 patches related to celt_inner_prod()
>> >         > and dual_inner_prod() NEON intrinsics optimization.
>> >         >
>> >         > In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch,
>> >         > the optimization changed the order of floating-point inner
>> >         > products, which will change the results. I
>> >         > created celt_inner_prod_neon_float_c_simulation()
>> >         > and dual_inner_prod_neon_float_c_simulation() to simulate the
>> >         > order of floating-point operations in the NEON optimization and
>> >         > compare their results. Sorry that I cannot bound the distance
>> >         > between the original C function and the NEON function by any
>> >         > given reasonably small number or ratio. It's easy to create an
>> >         > input for which 0 and 1,000 are both correct results by just
>> >         > manipulating the inner product order.
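
The 0 versus 1,000 effect described here can be reproduced with a tiny sketch. It is not the code from the patches: inner_prod_seq() and inner_prod_two_lanes() are hypothetical names, and the two interleaved accumulators only loosely mimic how a vectorised loop keeps per-lane partial sums.

```c
#include <stddef.h>

/* Plain sequential inner product, accumulating left to right. */
static float inner_prod_seq(const float *x, const float *y, size_t n)
{
        float sum = 0.0f;
        size_t i;

        for (i = 0; i < n; i++)
                sum += x[i] * y[i];
        return sum;
}

/* Same products, but accumulated in two interleaved partial sums
 * that are only added together at the end. */
static float inner_prod_two_lanes(const float *x, const float *y, size_t n)
{
        float lane[2] = { 0.0f, 0.0f };
        size_t i;

        for (i = 0; i < n; i++)
                lane[i & 1] += x[i] * y[i];
        return lane[0] + lane[1];
}
```

With x = { 1e20f, 1000.0f, -1e20f } and y = { 1.0f, 1.0f, 1.0f }, the sequential version absorbs the 1000 into 1e20 and returns 0, while the two-lane version cancels the large terms first and returns 1000 (assuming the compiler does not reorder the float arithmetic, e.g. no -ffast-math).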
>> >         >
>> >         > The total speed gain is about 1.0% for the fixed-point encoder
>> >         > and 1.8% for the floating-point encoder, at complexity 8, tested
>> >         > on my Chromebook.
>> >         >
>> >         > Thanks,
>> >         > Linfeng
>> >         >
>> >         >
>> >         > _______________________________________________
>> >         > opus mailing list
>> >         > opus at xiph.org
>> >         > http://lists.xiph.org/mailman/listinfo/opus
>> >         >
>> >
>> >
>> >
>>