<div dir="ltr">Thanks, Jonathan!<div><br></div><div>I'll fix the <span style="font-size:12.8px">MAY_HAVE_NEON() usage in silk/arm/arm_silk_map.c.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Linfeng</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 1, 2017 at 3:34 PM, Jonathan Lennox <span dir="ltr"><<a href="mailto:jonathan@vidyo.com" target="_blank">jonathan@vidyo.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">
<div>Semantically, OPUS_ARM_MAY_HAVE_NEON is supposed to mean the compiler supports, and the CPU may support, Neon assembly code, which isn’t necessarily the same thing as the compiler supporting Neon intrinsics. (The Visual Studio ARM compiler, for
instance, supports intrinsics but not assembly.) So I don’t think this patch is the right solution.</div>
<div><br>
</div>
<div>Instead, I think the problem is actually that silk/arm/arm_silk_map.c uses the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were changed so that the jump tables just listed the _neon versions of the functions directly, you’d
get the speedup you’re looking for.</div><div><div class="h5">
<div><br>
</div>
<br>
<div>
<blockquote type="cite">
<div>On Jun 1, 2017, at 6:03 PM, Linfeng Zhang <<a href="mailto:linfengz@google.com" target="_blank">linfengz@google.com</a>> wrote:</div>
<br class="m_2300583727823673013Apple-interchange-newline">
<div>
<div dir="ltr">Thanks, Jean-Marc and Jonathan!
<div><br>
</div>
<div>I tested the current Opus encoder in floating-point mode at complexity 8. With the attached hack (which generates "#define OPUS_ARM_MAY_HAVE_NEON 1" in config.h), encoding is about 14.7% faster on my Chromebook, probably because many of the NEON
intrinsics optimizations benefit both the fixed-point and floating-point encoders.</div>
<div><br>
</div>
<div>So if it's safe to enable MAY_HAVE_NEON by default in floating-point builds, it could speed up the floating-point encoder on NEON hardware a bit.</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Linfeng</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jun 1, 2017 at 2:22 PM, Jonathan Lennox <span dir="ltr">
<<a href="mailto:jonathan@vidyo.com" target="_blank">jonathan@vidyo.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">
<div style="word-wrap:break-word">
<div>
<div class="m_2300583727823673013h5">
<div>
<blockquote type="cite">
<div><br>
On May 31, 2017, at 12:47 PM, Linfeng Zhang <<a href="mailto:linfengz@google.com" target="_blank">linfengz@google.com</a>> wrote:</div>
<br class="m_2300583727823673013m_-8233522856382135630Apple-interchange-newline">
<div>
<div dir="ltr">
<div style="font-size:12.8px">Hi,</div>
<div style="font-size:12.8px"><br>
</div>
<div style="font-size:12.8px"><span style="font-size:10pt;font-family:arial">./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 --disable-shared</span><br>
</div>
<div style="font-size:12.8px"><span style="font-size:10pt;font-family:arial"><br>
</span></div>
<div style="font-size:12.8px">When configuring with floating-point and intrinsics enabled as above, the generated <span style="font-size:12.8px">config.h only has OPUS_ARM_MAY_HAVE_NEON_INTR defined (to 1), with</span></div>
<div style="font-size:12.8px">
<div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_ASM */</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_INLINE_ASM */<br>
</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_INLINE_EDSP */<br>
</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_INLINE_MEDIA */<br>
</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_INLINE_NEON */<br>
</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_MAY_HAVE_EDSP */<br>
</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_MAY_HAVE_MEDIA */<br>
</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_MAY_HAVE_NEON */<br>
</div>
</div>
<div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_PRESUME_AARCH6<wbr>4_NEON_INTR */</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_PRESUME_EDSP */<br>
</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> <span class="m_2300583727823673013m_-8233522856382135630gmail-il">OPUS_ARM_PRESUME_MEDIA</span> <wbr>*/<br>
</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_PRESUME_NEON */<br>
</div>
<div>/* #<span class="m_2300583727823673013m_-8233522856382135630gmail-il">undef</span> OPUS_ARM_PRESUME_NEON_I<wbr>NTR */<br>
</div>
</div>
</div>
<div style="font-size:12.8px"><br>
</div>
<div style="font-size:12.8px"><font>So MAY_HAVE_NEON will be defined as the <span class="m_2300583727823673013m_-8233522856382135630gmail-il">MEDIA</span> version, which eventually falls back to the C functions in the jump table:<br>
</font></div>
<div style="font-size:12.8px"><font># define MAY_HAVE_NEON(name) MAY_HAVE_MEDIA(name)<br>
</font></div>
<div style="font-size:12.8px"><font><br>
</font></div>
<div style="font-size:12.8px"><font>Therefore none of the NEON intrinsics optimizations listed in the jump tables get called in a floating-point build.</font></div>
<div style="font-size:12.8px"><br>
</div>
<div style="font-size:12.8px">Am I missing some options in my configure command, or is the configuration intended to behave this way in floating-point builds?</div>
<div style="font-size:12.8px"><br>
</div>
<div style="font-size:12.8px">Thanks,</div>
<div style="font-size:12.8px">Linfeng</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<div>The structure of this is pretty tangled and confusing, but what you’ll find is that the MAY_HAVE_NEON macro isn’t used in the jump tables for the two Neon intrinsics functions (silk_NSQ_noise_shape_feedback_loop_neon and celt_pitch_xcorr_float_neon)
which are used in a floating-point Neon build. See silk/arm/arm_silk_map.c and celt/arm/arm_celt_map.c.</div>
<div><br>
</div>
<div>So long as OPUS_ARM_MAY_HAVE_NEON_INTR and OPUS_HAVE_RTCD are set in config.h, it’ll pick up those functions, and check for them using RTCD.</div>
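<div>As a rough illustration of that selection, the sketch below shows a jump table that names the _neon function directly, guarded only by the two config.h flags mentioned above. The demo_* names, the arch enum, and the table layout are assumptions made for this example and are not copied from the Opus sources; demo_sum_neon is a plain-C stand-in where a real build would use Neon intrinsics.</div>

```c
/* Stand-ins for the two config.h flags mentioned above. */
#define OPUS_HAVE_RTCD 1
#define OPUS_ARM_MAY_HAVE_NEON_INTR 1

/* Hypothetical CPU levels, as run-time detection might report them. */
enum { DEMO_ARCH_V4, DEMO_ARCH_EDSP, DEMO_ARCH_MEDIA, DEMO_ARCH_NEON,
       DEMO_ARCH_COUNT };

/* Records which implementation ran last, for demonstration only. */
const char *demo_last_impl = "none";

/* Plain C reference version. */
int demo_sum_c(const int *x, int n) {
    int i, acc = 0;
    for (i = 0; i < n; i++) acc += x[i];
    demo_last_impl = "c";
    return acc;
}

/* Stand-in for the intrinsics version: same result as the C version. */
int demo_sum_neon(const int *x, int n) {
    int i, acc = 0;
    for (i = 0; i < n; i++) acc += x[i];
    demo_last_impl = "neon";
    return acc;
}

typedef int (*demo_sum_fn)(const int *, int);

#if defined(OPUS_HAVE_RTCD) && defined(OPUS_ARM_MAY_HAVE_NEON_INTR)
/* The NEON slot names the _neon function directly, so it does not
   depend on how a MAY_HAVE_NEON-style macro would expand. */
static const demo_sum_fn DEMO_SUM_IMPL[DEMO_ARCH_COUNT] = {
    demo_sum_c,    /* ARMv4 */
    demo_sum_c,    /* EDSP */
    demo_sum_c,    /* Media */
    demo_sum_neon  /* NEON */
};
#else
/* Without intrinsics support, every slot uses the C version. */
static const demo_sum_fn DEMO_SUM_IMPL[DEMO_ARCH_COUNT] = {
    demo_sum_c, demo_sum_c, demo_sum_c, demo_sum_c
};
#endif

/* Dispatch through the table using the detected CPU level. */
int demo_sum(const int *x, int n, int arch) {
    return DEMO_SUM_IMPL[arch](x, n);
}
```

<div>Calling demo_sum(x, n, DEMO_ARCH_NEON) reaches demo_sum_neon, while the lower CPU levels still get demo_sum_c.</div>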
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div></div></div>
</blockquote></div><br></div>