<div dir="ltr">About 1% speed gain for fixed-point, and 1.5% for floating-point.<div><br></div><div>Thanks,</div><div>Linfeng</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jun 5, 2017 at 12:49 PM, Jean-Marc Valin <span dir="ltr"><<a href="mailto:jmvalin@jmvalin.ca" target="_blank">jmvalin@jmvalin.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 05/06/17 03:28 PM, Linfeng Zhang wrote:<br>

> For fixed-point ARM, only<br>

> 0003-Optimize-fixed-point-<wbr>celt_inner_prod-and-dual_<wbr>inner_.patch changes<br>

> the performance.<br>

> For floating-point ARM, only<br>

> 0004-Optimize-floating-point-<wbr>celt_inner_prod-and-dual_inn.<wbr>patch changes the performance.<br>

<br>

</span>Got any numbers?<br>

<br>

Cheers,<br>

<br>

        Jean-Marc<br>

<span class=""><br>

> Patches 1 and 2 are code clean-up and can only affect x86 performance.<br>

> Patch 5 has a negligible effect on floating-point ARM performance.<br>

><br>

> Thanks,<br>

> Linfeng<br>

><br>

> On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <<a href="mailto:jmvalin@jmvalin.ca">jmvalin@jmvalin.ca</a><br>

</span><div><div class="h5">> <mailto:<a href="mailto:jmvalin@jmvalin.ca">jmvalin@jmvalin.ca</a>>> wrote:<br>

><br>

>     Hi Linfeng,<br>

><br>

>     I'll look into your patches. Can you let me know what's the expected<br>

>     effect on performance (if any) for each of your patches? Also, are these<br>

>     all the patches you intend to merge for 1.2 or are there more<br>

>     upcoming ones?<br>

><br>

>     Cheers,<br>

><br>

>             Jean-Marc<br>

><br>

>     On 01/06/17 06:33 PM, Linfeng Zhang wrote:<br>

>     > Hi,<br>

>     ><br>

>     > Attached are 5 patches related to celt_inner_prod()<br>

>     > and dual_inner_prod() NEON intrinsics optimization.<br>

>     ><br>

>     > In<br>

>     0004-Optimize-floating-point-<wbr>celt_inner_prod-and-dual_inn.<wbr>patch, the<br>

>     > optimization changed the order of floating-point inner products, which<br>

>     > will change the results. I<br>

>     > created celt_inner_prod_neon_float_c_<wbr>simulation()<br>

>     > and dual_inner_prod_neon_float_c_<wbr>simulation() to simulate the order of<br>

>     > floating-point operations in the NEON optimization and compare their<br>

>     > results. Sorry that I cannot bound the distance between the original C<br>

>     > function and the NEON function by any given reasonably small number or<br>

>     > ratio. It's easy to create an input for which 0 and 1,000 are both correct<br>

>     > results by just manipulating the inner product order.<br>

>     ><br>

>     > The total speed gain is about 1.0% for fixed-point encoder, and<br>

>     1.8% for<br>

>     > floating-point encoder, in Complexity 8, tested on my Chromebook.<br>

>     ><br>

>     > Thanks,<br>

>     > Linfeng<br>

>     ><br>

>     ><br>

>     > ______________________________<wbr>_________________<br>

>     > opus mailing list<br>

</div></div>>     > <a href="mailto:opus@xiph.org">opus@xiph.org</a> <mailto:<a href="mailto:opus@xiph.org">opus@xiph.org</a>><br>

>     > <a href="http://lists.xiph.org/mailman/listinfo/opus" rel="noreferrer" target="_blank">http://lists.xiph.org/mailman/<wbr>listinfo/opus</a><br>

>     <<a href="http://lists.xiph.org/mailman/listinfo/opus" rel="noreferrer" target="_blank">http://lists.xiph.org/<wbr>mailman/listinfo/opus</a>><br>

>     ><br>

><br>

><br>

</blockquote></div><br></div>
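<div><br></div><div>For reference, the reordering issue described in the quoted message about patch 0004 can be illustrated without NEON. The following is only a sketch in plain C: inner_prod_sequential() and inner_prod_4lane() are hypothetical names, not functions from the patches, and the four-accumulator layout and the final reduction order are assumptions about how a 4-lane vectorized loop typically sums, not a description of the actual NEON code.</div>

```c
/* Reference order: accumulate products one at a time, left to right.
 * n is assumed to be non-negative. */
float inner_prod_sequential(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i != n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Assumed vector-style order: four independent partial sums (one per lane),
 * combined only at the end. Hypothetical, not the code from the patch. */
float inner_prod_4lane(const float *x, const float *y, int n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float sum;
    int i, k;
    /* Main loop: element i + k always goes to lane k. */
    for (i = 0; n - i >= 4; i += 4)
        for (k = 0; k != 4; k++)
            lane[k] += x[i + k] * y[i + k];
    /* Horizontal reduction: (lane0 + lane2) + (lane1 + lane3). */
    sum = (lane[0] + lane[2]) + (lane[1] + lane[3]);
    /* Scalar tail for the remaining 0..3 elements. */
    for (; i != n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

<div>With x = {1e8f, 1, 1, 1, -1e8f, 1, 1, 1} and y set to all ones, the sequential version should return 3, because each 1 added to 1e8 is rounded away (the float spacing near 1e8 is 8), while the 4-lane version cancels the two large terms in lane 0 first and should return 6. This assumes float arithmetic is evaluated in single precision and that fast-math style compiler options are off. Both values are valid results for their respective summation orders, which is why a bit-exact comparison against the original C function is not meaningful here.</div>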