<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I believe the solution would be to always have either:<br>
1) USE_CELT_FIR=1 and use ovflw() macros in the xcorr code; or<br>
2) USE_CELT_FIR=0 and no ovflw() in the xcorr code<br></blockquote><div><br></div><div>I prefer to create a function named silk_fir() with optimization to do the calculation when USE_CELT_FIR=0.</div><div><br></div><div>xcorr_kernel() itself is great and provides many gains. The only issue is that calling it in a for loop makes it less efficient.<br></div><div><br></div><div>xcorr_kernel() is called in several functions including celt_fir(), celt_pitch_xcorr() and celt_iir(). All these functions are not heavy hitters.<br></div><div>silk_LPC_analysis_filter()'s CPU cycles are 6.8% with complexity 8 and 8.9% with complexity 5 out of the whole encoder. It probably makes sense to have a specific optimization to not calling xcorr_kernel() too many times to save 1% to 1.5% CPU cycles here.<br></div><div><br></div><div>How do you think?</div><div><br></div><div>Thanks,</div><div>Linfeng</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
We can then switch between the two cases at compile time. I think 2) is<br>
appropriate when we have --enable-assertions without --enable-check-asm,<br>
wheras 1) is appropriate the rest of the time. That way we can test for<br>
overflows in the CELT code, without preventing optimization of the SILK<br>
code.<br>
<span class="gmail-"><br>
> Because of silk_LPC_analysis_filter(), celt_fir_permit_overflow() must<br>
> behave the same for both floating-point and fixed-point, and this is why<br>
> we defined ADD32_FIXED(), ..., PSHR32_FIXED() etc.<br>
<br>
</span>I don't think you will need these anymore, but if you ever need<br>
fixed-point macros that remain integer for float compilation, then you<br>
should use the silk_*() fixed-point macros (and the code should be in<br>
silk/).<br>
<span class="gmail-"><br>
> For the NEON optimization part, the previous celt_fir() optimization<br>
> calls xcorr_kernel(). We tested and found that calling<br>
> the xcorr_kernel() optimization didn't help too much here. The<br>
> optimization in the patch is about 1% faster than simply<br>
> calling xcorr_kernel() for the whole encoder. Considering the really<br>
> small size of the new optimization, it's better to not call<br>
> xcorr_kernel() to get 1% faster.<br>
<br>
</span>It seems like the new Neon code does essentially the same thing as the<br>
existing xcorr_kernel(), so the two should remain shared to reduce<br>
maintenance burden and maximize speed. If you're able to come up with<br>
something faster than xcorr_kernel() (which apparently you have), then<br>
it seems like it should be usable by the other functions that currently<br>
use xcorr_kernel(). Or did I miss anything here?<br>
<br>
Cheers,<br>
<br>
Jean-Marc<br>
</blockquote></div><br></div></div>