<div dir="ltr">Hi Timothy,<div><br></div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Do you think it would be possible to improve the API of xcorr_kernel() so that calling it in a loop is more efficient?<br></blockquote><div><br></div><div>If it could be inlined, it would be more efficient. Besides the memory bouncing, frequent function calls are expensive.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The other advantage to wiring up xcorr_kernel() is that it applies in more places than your intrinsics-only celt_fir() implementation.<br>

</blockquote></div><br></div><div class="gmail_extra">I agree.</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div>One solution is to move the outer for(N) loop inside xcorr_kernel() so that it returns N results instead of 4 (similar to what the celt_fir() NEON intrinsics do). This would make it both efficient and universally applicable.</div><div><br></div><div>Thanks,</div><div><br></div></div></div>
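<div>A rough sketch of the idea above, in plain C with floats only. The name xcorr_kernel_n, its signature, and the scalar inner loops are assumptions for illustration; this is not the existing xcorr_kernel() API or the NEON code. The caller makes one call for all N lags, and the 4-lag kernel becomes the body of a loop inside the function:</div>

```c
/* Hypothetical batched kernel: computes n correlation lags in one call.
 * sum[i] = x[0]*y[i] + x[1]*y[i+1] + ... + x[len-1]*y[i+len-1]
 * Assumes y holds at least len + n - 1 samples. */
static void xcorr_kernel_n(const float *x, const float *y,
                           float *sum, int len, int n)
{
    int i, j;

    /* Main loop: four lags per pass, so each x[j] is loaded once
     * and reused for four multiply-accumulates. */
    for (i = 0; i + 4 <= n; i += 4) {
        float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (j = 0; j < len; j++) {
            const float xj = x[j];
            s0 += xj * y[i + j];
            s1 += xj * y[i + j + 1];
            s2 += xj * y[i + j + 2];
            s3 += xj * y[i + j + 3];
        }
        sum[i]     = s0;
        sum[i + 1] = s1;
        sum[i + 2] = s2;
        sum[i + 3] = s3;
    }

    /* Tail: remaining lags when n is not a multiple of 4. */
    for (; i < n; i++) {
        float s = 0;
        for (j = 0; j < len; j++)
            s += x[j] * y[i + j];
        sum[i] = s;
    }
}
```

<div>With this shape the per-call overhead is paid once per N outputs rather than once per 4, and a platform-specific version would only need to replace the body of the main loop.</div>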