<div dir="ltr">Thanks, Ulrich!<div><br></div><div>Yes, using</div><div><div><font face="monospace, monospace">        celt_assert(1.0 + celt_inner_prod_neon_float_c_s<wbr>imulation(x, y, N) == 1.0 + xy);</font></div></div><div><div><font face="monospace, monospace">        celt_assert(1.0 + xy1_c == 1.0 + *xy1);</font></div><div><font face="monospace, monospace">        celt_assert(1.0 + xy2_c == 1.0 + *xy2);</font></div></div><div>avoids the use of VERY_SMALL.</div><div><br></div><div>Hi Jean-Marc,</div><div><br></div><div>I added</div><div><div><font face="monospace, monospace">    {</font></div><div><font face="monospace, monospace">        const opus_val32 xy_c = celt_inner_prod_neon_float_c_s<wbr>imulation(x, y, N);</font></div><div><font face="monospace, monospace">        const int32_t *x_bin = (int32_t*)x;</font></div><div><font face="monospace, monospace">        const int32_t *y_bin = (int32_t*)y;</font></div><div><font face="monospace, monospace">        const int32_t *xy_bin = (int32_t*)&xy;</font></div><div><font face="monospace, monospace">        const int32_t *xy_bin_c = (int32_t*)&xy_c;</font></div><div><font face="monospace, monospace">        // if((xy_c != xy) && (xy_c != 0.0) && (xy != 0.0)) {</font></div><div><font face="monospace, monospace">        if(xy_c != xy) {</font></div><div><font face="monospace, monospace">            printf("\n xy_c = %9f, xy   = %9f", xy_c, xy);</font></div><div><font face="monospace, monospace">            printf(" | xy_c = %13e, xy   = %13e", xy_c, xy);</font></div><div><font face="monospace, monospace">            printf(" | xy_c (bin) = 0x%8x, xy   (bin) = 0x%8x\n", *xy_bin_c, *xy_bin);</font></div><div><font face="monospace, monospace">            printf("\n N = %d", N);</font></div><div><font face="monospace, monospace">            for (i = 0; i < N; i++) {</font></div><div><font face="monospace, monospace">              printf("\n x[%d] = %9f, y[%d] = %9f", i, x[i], i, y[i]);</font></div><div><font 
face="monospace, monospace">              printf(" | x[%d] = %13e, y[%d] = %13e", i, x[i], i, y[i]);</font></div><div><font face="monospace, monospace">              printf(" | x[%d] (bin) = 0x%8x, y[%d] (bin) = 0x%8x", i, x_bin[i], i, y_bin[i]);</font></div><div><font face="monospace, monospace">            }</font></div><div><font face="monospace, monospace">            printf("\n\n");</font></div><div><font face="monospace, monospace">        }</font></div><div><font face="monospace, monospace">    }</font></div></div><div><br></div><div>And got the following two cases when testing speech_mono_32_48kHz.pcm (Download: <a href="https://drive.google.com/file/d/0B2bjttuYjfVYaHBDZE1XV3B0MHM" target="_blank">https://drive.google.com/file/<wbr>d/0B2bjttuYjfVYaHBDZE1XV3B0MHM</a><wbr>) on NEON:</div><div><br></div><div>$ ./opus_demo -e voip  48000 1 32000 -complexity  8 speech_mono_32_48kHz.pcm  tmp.opus</div><div><br></div><div><div><font face="monospace, monospace">libopus 1.2-beta-27-g6c51a195-dirty</font></div><div><font face="monospace, monospace">Encoding 48000 Hz input at 32.000 kb/s in auto bandwidth with 960-sample frames.</font></div><div><font face="monospace, monospace"><br></font></div></div><div><div><font face="monospace, monospace"> xy_c =  0.000000, xy   =  0.000000 | xy_c =  5.605194e-45, xy   =  0.000000e+00 | xy_c (bin) = 0x       4, xy   (bin) = 0x       0</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> N = 8</font></div><div><font face="monospace, monospace"> x[0] = -0.000000, y[0] = -0.000000 | x[0] = -7.783648e-23, y[0] = -7.783648e-23 | x[0] (bin) = 0x9abc3273, y[0] (bin) = 0x9abc3273</font></div><div><font face="monospace, monospace"> x[1] = -0.000000, y[1] = -0.000000 | x[1] = -1.862279e-23, y[1] = -1.862279e-23 | x[1] (bin) = 0x99b41bca, y[1] (bin) = 0x99b41bca</font></div><div><font face="monospace, monospace"> x[2] =  0.000000, y[2] =  0.000000 | x[2] =  1.092297e-23, y[2] =  
1.092297e-23 | x[2] (bin) = 0x195347ee, y[2] (bin) = 0x195347ee</font></div><div><font face="monospace, monospace"> x[3] = -0.000000, y[3] = -0.000000 | x[3] = -5.171255e-25, y[3] = -5.171255e-25 | x[3] (bin) = 0x97200ae8, y[3] (bin) = 0x97200ae8</font></div><div><font face="monospace, monospace"> x[4] = -0.000000, y[4] = -0.000000 | x[4] = -4.773915e-24, y[4] = -4.773915e-24 | x[4] (bin) = 0x98b8ae90, y[4] (bin) = 0x98b8ae90</font></div><div><font face="monospace, monospace"> x[5] = -0.000000, y[5] = -0.000000 | x[5] = -3.717311e-25, y[5] = -3.717311e-25 | x[5] (bin) = 0x96e61724, y[5] (bin) = 0x96e61724</font></div><div><font face="monospace, monospace"> x[6] = -0.000000, y[6] = -0.000000 | x[6] = -1.724025e-24, y[6] = -1.724025e-24 | x[6] (bin) = 0x980563d5, y[6] (bin) = 0x980563d5</font></div><div><font face="monospace, monospace"> x[7] = -0.000000, y[7] = -0.000000 | x[7] = -2.245937e-24, y[7] = -2.245937e-24 | x[7] (bin) = 0x982dc55f, y[7] (bin) = 0x982dc55f</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">==============================<wbr>==============================<wbr>==</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> xy_c =  0.000000, xy   =  0.000000 | xy_c =  1.121039e-44, xy   =  0.000000e+00 | xy_c (bin) = 0x       8, xy   (bin) = 0x       0</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> N = 8</font></div><div><font face="monospace, monospace"> x[0] = -0.000000, y[0] = -0.000000 | x[0] = -1.000134e-22, y[0] = -1.000134e-22 | x[0] (bin) = 0x9af1d148, y[0] (bin) = 0x9af1d148</font></div><div><font face="monospace, monospace"> x[1] =  0.000000, y[1] =  0.000000 | x[1] =  3.052170e-23, y[1] =  3.052170e-23 | x[1] (bin) = 0x1a139809, y[1] (bin) = 0x1a139809</font></div><div><font face="monospace, monospace"> x[2] = -0.000000, y[2] = -0.000000 | x[2] = -2.135591e-23, y[2] = 
-2.135591e-23 | x[2] (bin) = 0x99ce8aaf, y[2] (bin) = 0x99ce8aaf</font></div><div><font face="monospace, monospace"> x[3] =  0.000000, y[3] =  0.000000 | x[3] =  1.180839e-23, y[3] =  1.180839e-23 | x[3] (bin) = 0x19646856, y[3] (bin) = 0x19646856</font></div><div><font face="monospace, monospace"> x[4] = -0.000000, y[4] = -0.000000 | x[4] = -1.230446e-23, y[4] = -1.230446e-23 | x[4] (bin) = 0x996e00bc, y[4] (bin) = 0x996e00bc</font></div><div><font face="monospace, monospace"> x[5] =  0.000000, y[5] =  0.000000 | x[5] =  6.443248e-24, y[5] =  6.443248e-24 | x[5] (bin) = 0x18f942d6, y[5] (bin) = 0x18f942d6</font></div><div><font face="monospace, monospace"> x[6] = -0.000000, y[6] = -0.000000 | x[6] = -8.497414e-24, y[6] = -8.497414e-24 | x[6] (bin) = 0x99245d28, y[6] (bin) = 0x99245d28</font></div><div><font face="monospace, monospace"> x[7] =  0.000000, y[7] =  0.000000 | x[7] =  3.849347e-24, y[7] =  3.849347e-24 | x[7] (bin) = 0x1894ea17, y[7] (bin) = 0x1894ea17</font></div></div><div><br></div><div>There are 3 possible reasons.</div><div>1. Of course, celt_inner_prod_neon_fl<wbr>oat_c_simulation() may have a bug. Please help me find it, if there is one.</div><div>2. Though unlikely, NEON may not be IEEE <span style="font-size:12.8px">754-compliant </span>when handling floating-point values near 0.</div><div>3. Though even less likely, gcc may not be IEEE <span style="font-size:12.8px">754-compliant here. :)</span></div><div><br></div><div>Since x[i] == y[i] in both cases, they are actually calculating the energy.</div><div><font face="arial, helvetica, sans-serif">(-1.000134e-22 * -1.000134e-22) is larger than the smallest positive single-precision number (it falls in the subnormal range) and</font><span style="font-family:arial,helvetica,sans-serif"> should be represented as nonzero (such as 0x8). 
I don't know why NEON gives a result of 0.</span></div><div><br></div><div>Thanks,</div><div>Linfeng</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jun 6, 2017 at 12:03 AM, Ulrich Windl <span dir="ltr"><<a href="mailto:Ulrich.Windl@rz.uni-regensburg.de" target="_blank">Ulrich.Windl@rz.uni-regensbur<wbr>g.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">>>> Linfeng Zhang <<a href="mailto:linfengz@google.com" target="_blank">linfengz@google.com</a>> wrote on 06.06.2017 at 06:46 in message<br>

<<a href="mailto:CAKoqLCAfj%2BfDUMLfN4dLNSZ4NNAZpaSt_BWZRp%2B7XBqfhiSqiQ@mail.gmail.com" target="_blank">CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZ<wbr>paSt_BWZRp+7XBqfhiSqiQ@mail.gm<wbr>ail.com</a>>:<br>

<span class="gmail-m_-9128358907921410918gmail-m_-3882600981437509126gmail-">> Hi Jean-Marc,<br>

><br>

> I tried "==" before, and it failed when both results are 0.0. Maybe the<br>

> exponent or sign differs because of the different 0.0 representation<br>

> in NEON. If anybody knows how to handle this 0.0 comparison, that would be<br>

> great.<br>

> Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this.<br>

<br>

<br>

</span>From some faint memory of my math lessons, I once wrote code like this to estimate the machine epsilon, the smallest step that floating-point addition can resolve near 1.0:<br>

<br>

double  EPS;            /* largest power of two with 1.0 + EPS == 1.0 (half the machine epsilon) */<br>

<br>

/* refined estimate of EPS */<br>

static  double  get_EPS(double eps)<br>

{<br>

<br>

        while ( 1.0 + eps != 1.0 )<br>

                eps /= 2;<br>

        return(eps);<br>

}<br>

<br>

EPS = get_EPS(1.0);<br>

<br>

On the x86_64 platform I get:<br>

(gdb) p EPS<br>

$1 = 1.1102230246251565e-16<br>

<br>

Maybe it can help...<br>

<br>

Regards,<br>

Ulrich<br>

<div class="gmail-m_-9128358907921410918gmail-m_-3882600981437509126gmail-HOEnZb"><div class="gmail-m_-9128358907921410918gmail-m_-3882600981437509126gmail-h5"><br>

><br>

> Thanks,<br>

> Linfeng<br>

><br>

> On Mon, Jun 5, 2017 at 8:43 PM Jean-Marc Valin <<a href="mailto:jmvalin@jmvalin.ca" target="_blank">jmvalin@jmvalin.ca</a>> wrote:<br>

><br>

>> Hi Linfeng,<br>

>><br>

>> On 05/06/17 03:31 PM, Linfeng Zhang wrote:<br>

>> > Yes, we'll have one more patch set related to xcorr next week. Please<br>

>> > don't wait if it's too late for 1.2 release.<br>

>><br>

>> Assuming there's no issue with the patches, next week isn't too late.<br>

>><br>

>> Also, I've started looking at your patches. So far there's one thing<br>

>> that puzzles me a bit. In the OPUS_CHECK_ASM section of patch 0004, you<br>

>> have:<br>

>><br>

>> +        celt_assert(ABS32(xy1_c - *xy1) <= VERY_SMALL);<br>

>><br>

>> Given the normal range of the values (the xy values are often much<br>

>> larger than one) and the precision involved here (24-bit mantissa), it<br>

>> seems like this test can only succeed if the two values are actually<br>

>> equal. Is the float patch actually bit-exact? If so, then maybe you<br>

>> should be using actual equality. If not, then I guess we need to find<br>

>> the right condition (which isn't obvious for floating point).<br>

>><br>

>> Cheers,<br>

>><br>

>>         Jean-Marc<br>

>><br>

>><br>

>> > Thanks,<br>

>> > Linfeng<br>

>> ><br>

>> > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <<a href="mailto:linfengz@google.com" target="_blank">linfengz@google.com</a><br>

>> > <mailto:<a href="mailto:linfengz@google.com" target="_blank">linfengz@google.com</a>>> wrote:<br>

>> ><br>

>> >     Hi Jean-Marc,<br>

>> ><br>

>> >     I attached the new version in inner_prod_5patches_v2.zip, which is<br>

>> >     synced to the current master.<br>

>> ><br>

>> >     For fixed-point ARM, only<br>

>> >     0003-Optimize-fixed-point-cel<wbr>t_inner_prod-and-dual_inner_.p<wbr>atch<br>

>> >     changes the performance.<br>

>> >     For floating-point ARM, only<br>

>> >     0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance.<br>

>> >     Patches 1 and 2 are code clean-ups and can only affect x86 performance.<br>

>> >     Patch 5 has a negligible effect on floating-point ARM performance.<br>

>> ><br>

>> >     Thanks,<br>

>> >     Linfeng<br>

>> ><br>

>> >     On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <<a href="mailto:jmvalin@jmvalin.ca" target="_blank">jmvalin@jmvalin.ca</a><br>

>> >     <mailto:<a href="mailto:jmvalin@jmvalin.ca" target="_blank">jmvalin@jmvalin.ca</a>>> wrote:<br>

>> ><br>

>> >         Hi Linfeng,<br>

>> ><br>

>> >         I'll look into your patches. Can you let me know what's the<br>

>> expected<br>

>> >         effect on performance (if any) for each of your patches? Also,<br>

>> >         are these<br>

>> >         all the patches you intend to merge for 1.2 or are there more<br>

>> >         upcoming ones?<br>

>> ><br>

>> >         Cheers,<br>

>> ><br>

>> >                 Jean-Marc<br>

>> ><br>

>> >         On 01/06/17 06:33 PM, Linfeng Zhang wrote:<br>

>> >         > Hi,<br>

>> >         ><br>

>> >         > Attached are 5 patches related to celt_inner_prod()<br>

>> >         > and dual_inner_prod() NEON intrinsics optimization.<br>

>> >         ><br>

>> >         > In<br>

>> >         0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the<br>

>> >         > optimization changed the order of floating-point inner<br>

>> >         products, which<br>

>> >         > will change the results. I<br>

>> >         > created celt_inner_prod_neon_float_c_s<wbr>imulation()<br>

>> >         > and dual_inner_prod_neon_float_c_s<wbr>imulation() to simulate the<br>

>> >         order of<br>

>> >         > floating-point operations in NEON optimization and compare<br>

>> their<br>

>> >         > results. Sorry that I cannot bound the distance between<br>

>> original C<br>

>> >         > function and NEON function to any given reasonably small<br>

>> >         number or<br>

>> >         > ratio. It's easy to create an input for which 0 and 1,000 are both<br>

>> >         correct<br>

>> >         > results by just manipulating the inner product order.<br>

>> >         ><br>

>> >         > The total speed gain is about 1.0% for fixed-point encoder,<br>

>> >         and 1.8% for<br>

>> >         > floating-point encoder, in Complexity 8, tested on my<br>

>> Chromebook.<br>

>> >         ><br>

>> >         > Thanks,<br>

>> >         > Linfeng<br>

>> >         ><br>

>> >         ><br>

>> >         > ______________________________<wbr>_________________<br>

>> >         > opus mailing list<br>

>> >         > <a href="mailto:opus@xiph.org" target="_blank">opus@xiph.org</a> <mailto:<a href="mailto:opus@xiph.org" target="_blank">opus@xiph.org</a>><br>

>> >         > <a href="http://lists.xiph.org/mailman/listinfo/opus" rel="noreferrer" target="_blank">http://lists.xiph.org/mailman/<wbr>listinfo/opus</a><br>

>> >         <<a href="http://lists.xiph.org/mailman/listinfo/opus" rel="noreferrer" target="_blank">http://lists.xiph.org/mailma<wbr>n/listinfo/opus</a>><br>

>> >         ><br>

>> ><br>

>> ><br>

>> ><br>

>><br>

<br>

<br>

<br>

</div></div></blockquote></div><br></div></div>