[opus] Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Mar 2 07:27:05 UTC 2017


I'm not deep in the code, but in my experience even an older gcc (4.3.4) does function inlining at -O2, and at -O3 it inlines almost any function inside one module. Once I even let it inline across modules (-combine). I'm not talking about explicit inline functions, just about automatic optimization.
So did you check that frequent function calls actually happen in the compiled code? I'm a bit afraid that after all the suggested optimizations the code may be rather hard to understand. I think compilers should do the dirty work (i.e. optimizing and inlining). Sometimes "static" and "const" help the compiler to optimize...
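
To illustrate what I mean, here is a minimal sketch (not taken from the Opus source; dot4() and correlate() are made-up names, and plain short/int stand in for the real sample types). The helper is "static" with "const" pointers, so the compiler can see that it is only used in this file and that its inputs are not modified, which usually makes it a candidate for automatic inlining at -O2 or -O3. Whether the call really disappears should be checked in the generated assembly (e.g. with gcc -S).

```c
#include <stddef.h>

/* Hypothetical helper, not from the Opus source: "static" keeps it local to
   this file, "const" promises that the inputs are only read. */
static int dot4(const short *x, const short *y)
{
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2] + x[3] * y[3];
}

/* Hypothetical caller: out[i] is the dot product of x[0..3] with y[i..i+3].
   x must hold 4 elements and y must hold n + 3 elements.
   The call to dot4() sits in a loop, but an optimizing compiler is free to
   inline it, so no call overhead needs to remain in the generated code. */
void correlate(const short *x, const short *y, int *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = dot4(x, y + i);
}
```

The source stays readable this way, and the inlining decision is left to the compiler.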


>>> Linfeng Zhang <linfengz at google.com> wrote on 01.03.2017 at 20:30 in message
<CAKoqLCANyWDPpy4rccL3TJ37gbhWxRWkCrqR9GCATGhTFoaDyA at mail.gmail.com>:
> Hi Timothy,
>> Do you think it would be possible to improve the API of xcorr_kernel() so
>> that calling it in a loop is more efficient?
> If it could be inlined, it would be more efficient. Besides memory bouncing,
> frequent function calls are expensive.
>> The other advantage to wiring up xcorr_kernel() is that it applies in more
>> places than your intrinsics-only celt_fir() implementation.
> I agree.
> One solution is to put the outer for(N) loop inside xcorr_kernel() so that
> it returns N results instead of 4 (similar to what the celt_fir() NEON
> intrinsics did). This would make it both efficient and universal.
> Thanks,
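
For reference, the proposal quoted above (moving the outer loop over N into the kernel) could look roughly like the following sketch. This is only an illustration under assumptions: the function names xcorr4_sketch() and xcorr_n_sketch() are hypothetical, int16_t/int32_t stand in for the Opus sample types, and the code is plain C rather than NEON intrinsics.

```c
#include <stdint.h>

/* Hypothetical 4-output kernel in the style discussed in the thread:
   accumulates sum[k] += x[j] * y[j + k] for k = 0..3.
   Not the actual Opus implementation. y must hold len + 3 elements. */
static void xcorr4_sketch(const int16_t *x, const int16_t *y,
                          int32_t sum[4], int len)
{
    for (int j = 0; j < len; j++)
        for (int k = 0; k < 4; k++)
            sum[k] += (int32_t)x[j] * y[j + k];
}

/* Hypothetical N-output variant: the outer loop lives inside the function,
   so the caller makes one call instead of roughly N / 4 calls.
   Accumulates sum[i] += x[j] * y[i + j] for i = 0..n-1.
   y must hold len + n - 1 elements. */
void xcorr_n_sketch(const int16_t *x, const int16_t *y,
                    int32_t *sum, int len, int n)
{
    int i = 0;

    /* Blocks of four outputs reuse the 4-output kernel. */
    for (; i + 4 <= n; i += 4)
        xcorr4_sketch(x, y + i, sum + i, len);

    /* Remaining outputs (n not a multiple of 4) are handled one by one. */
    for (; i < n; i++)
        for (int j = 0; j < len; j++)
            sum[i] += (int32_t)x[j] * y[i + j];
}
```

With the 4-output kernel declared static in the same file, the compiler can inline it into the N-output loop, which is the same effect the quoted message is after.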
