[opus] Antw: Re: [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Jun 6 07:03:18 UTC 2017
>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message
<CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>:
> Hi Jean-Marc,
>
> I tried "==" before, and it failed when both results are 0.0. Maybe the
> exponent or sign differs because of a different 0.0 representation
> in NEON. If anybody knows how to handle this 0.0 comparison, that would be
> great.
> Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this.
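For what it's worth, under IEEE 754 arithmetic +0.0 and -0.0 compare equal with ==, even though their bit patterns differ, so a plain == should not fail merely because of the sign of zero; a bytewise comparison such as memcmp() would. A minimal sketch, assuming IEEE 754 floats (the helper names same_value and same_bits are hypothetical and not part of Opus):

```c
#include <string.h>

/* Value comparison: +0.0f and -0.0f are equal; NaN never equals anything. */
static int same_value(float a, float b)
{
    return a == b;
}

/* Bitwise comparison: distinguishes +0.0f from -0.0f. */
static int same_bits(float a, float b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

So if == really fails while both values print as 0.0, one of them is possibly a tiny nonzero value (for example a denormal that prints as 0.0 at limited precision, which NEON on ARMv7 would, as far as I know, flush to zero) or a NaN, rather than a differently encoded zero.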
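From some faint memory of my math lessons: I once wrote code like this. Strictly speaking it does not find the smallest floating-point number different from zero, but the largest power of two eps for which 1.0 + eps is indistinguishable from 1.0:

    #include <stdio.h>

    double EPS; /* largest power of two with 1.0 + EPS == 1.0 */

    /* Halve eps until 1.0 + eps can no longer be told apart from 1.0. */
    static double get_EPS(double eps)
    {
        while (1.0 + eps != 1.0)
            eps /= 2;
        return eps;
    }

    int main(void)
    {
        EPS = get_EPS(1.0);
        printf("%.17g\n", EPS);
        return 0;
    }
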
On the x86_64 platform I get:
(gdb) p EPS
$1 = 1.1102230246251565e-16
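
The value above is for double; for the 24-bit float mantissa mentioned in the quoted text below, the analogous constant is FLT_EPSILON from <float.h>. A check that scales the tolerance by the magnitude of the operands might look like this sketch (nearly_equal_rel, abs_f and the tolerance factor are hypothetical, not anything in the Opus source):

```c
#include <float.h>

/* Absolute value without needing libm. */
static float abs_f(float x)
{
    return x < 0.0f ? -x : x;
}

/* True if a and b differ by at most rel_tol times the larger magnitude.
   Two zeros (of either sign) always compare as equal. */
static int nearly_equal_rel(float a, float b, float rel_tol)
{
    float diff = abs_f(a - b);
    float scale = abs_f(a) > abs_f(b) ? abs_f(a) : abs_f(b);
    return diff <= rel_tol * scale;
}
```

It could be called as nearly_equal_rel(xy1_c, *xy1, 4 * FLT_EPSILON), where the factor 4 is only a guess. Note that when the sum suffers from cancellation, a tolerance relative to the result is not enough; the usual error bounds for a dot product are relative to the sum of |x[i]*y[i]| instead.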
Maybe it can help...
Regards,
Ulrich
>
> Thanks,
> Linfeng
>
> On Mon, Jun 5, 2017 at 8:43 PM Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
>
>> Hi Linfeng,
>>
>> On 05/06/17 03:31 PM, Linfeng Zhang wrote:
>> > Yes, we'll have one more patch set related to xcorr next week. Please
>> > don't wait if that's too late for the 1.2 release.
>>
>> Assuming there's no issue with the patches, next week isn't too late.
>>
>> Also, I've started looking at your patches. So far there's one thing
>> that puzzles me a bit. In the OPUS_CHECK_ASM section of patch 0004, you
>> have:
>>
>> + celt_assert(ABS32(xy1_c - *xy1) <= VERY_SMALL);
>>
>> Given the normal range of the values (the xy values are often much
>> larger than one) and the precision involved here (24-bit mantissa), it
>> seems like this test can only succeed if the two values are actually
>> equal. Is the float patch actually bit-exact? If so, then maybe you
>> should be using actual equality. If not, then I guess we need to find
>> the right condition (which isn't obvious for floating point).
>>
>> Cheers,
>>
>> Jean-Marc
>>
>>
>> > Thanks,
>> > Linfeng
>> >
>> > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote:
>> >
>> > Hi Jean-Marc,
>> >
>> > I attached the new version in inner_prod_5patches_v2.zip which
>> > synced to the current master.
>> >
>> > For fixed-point ARM, only
>> > 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch
>> > changes the performance.
>> > For floating-point ARM, only
>> > 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch
>> > changes the performance.
>> > Patches 1 and 2 are code clean-up and can only affect x86 performance.
>> > Patch 5 has a negligible effect on floating-point ARM performance.
>> >
>> > Thanks,
>> > Linfeng
>> >
>> > On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
>> >
>> > Hi Linfeng,
>> >
>> > I'll look into your patches. Can you let me know the expected
>> > effect on performance (if any) of each of your patches? Also, are
>> > these all the patches you intend to merge for 1.2, or are there more
>> > upcoming ones?
>> >
>> > Cheers,
>> >
>> > Jean-Marc
>> >
>> > On 01/06/17 06:33 PM, Linfeng Zhang wrote:
>> > > Hi,
>> > >
>> > > Attached are 5 patches related to celt_inner_prod()
>> > > and dual_inner_prod() NEON intrinsics optimization.
>> > >
>> > > In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch,
>> > > the optimization changed the order of the floating-point inner
>> > > products, which will change the results. I created
>> > > celt_inner_prod_neon_float_c_simulation() and
>> > > dual_inner_prod_neon_float_c_simulation() to simulate the order of
>> > > floating-point operations in the NEON optimization and compare their
>> > > results. Sorry that I cannot bound the distance between the original C
>> > > function and the NEON function by any given reasonably small number or
>> > > ratio. It's easy to create an input for which 0 and 1,000 are both
>> > > correct results just by manipulating the inner product order.
>> > >
>> > > The total speed gain is about 1.0% for the fixed-point encoder and 1.8%
>> > > for the floating-point encoder, at complexity 8, tested on my Chromebook.
>> > >
>> > > Thanks,
>> > > Linfeng
>>
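
Regarding the reordering issue described in the original message quoted above: the effect is easy to reproduce with three terms. The following is only a sketch under the assumption of IEEE 754 single precision; the function names are hypothetical and have nothing to do with the actual Opus or NEON code.

```c
/* Inner product accumulated from the first element to the last. */
static float inner_prod_forward(const float *x, const float *y, int n)
{
    volatile float sum = 0.0f; /* volatile forces rounding to float at each step */
    int i;
    for (i = 0; i < n; i++)
        sum = sum + x[i] * y[i];
    return sum;
}

/* Same products, accumulated from the last element to the first. */
static float inner_prod_backward(const float *x, const float *y, int n)
{
    volatile float sum = 0.0f;
    int i;
    for (i = n - 1; i >= 0; i--)
        sum = sum + x[i] * y[i];
    return sum;
}
```

With x = {1e20f, -1e20f, 1000.0f} and y = {1.0f, 1.0f, 1.0f}, the forward order cancels the two large terms first and keeps the 1000, while the backward order absorbs the 1000 into -1e20f and ends up at 0. Both are "correct" results of a float inner product, which is why a fixed absolute or relative bound between the C and NEON versions is hard to state.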