<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I believe the solution would be to always have either:<br>

1) USE_CELT_FIR=1 and use ovflw() macros in the xcorr code; or<br>

2) USE_CELT_FIR=0 and no ovflw() in the xcorr code<br></blockquote><div><br></div><div>I prefer to create a function named silk_fir() with optimization to do the calculation when USE_CELT_FIR=0.</div><div><br></div><div>xcorr_kernel() itself is great and provides many gains. The only issue is that calling it in a for loop makes it less efficient.<br></div><div><br></div><div>xcorr_kernel() is called in several functions including celt_fir(), celt_pitch_xcorr() and celt_iir(). All these functions are not heavy hitters.<br></div><div>silk_LPC_analysis_filter()'s CPU cycles are 6.8% with complexity 8 and 8.9% with complexity 5 out of the whole encoder. It probably makes sense to have a specific optimization to not calling xcorr_kernel() too many times to save 1% to 1.5% CPU cycles here.<br></div><div><br></div><div>How do you think?</div><div><br></div><div>Thanks,</div><div>Linfeng</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

We can then switch between the two cases at compile time. I think 2) is<br>

appropriate when we have --enable-assertions without --enable-check-asm,<br>

wheras 1) is appropriate the rest of the time. That way we can test for<br>

overflows in the CELT code, without preventing optimization of the SILK<br>

code.<br>

<span class="gmail-"><br>

> Because of silk_LPC_analysis_filter(), celt_fir_permit_overflow() must<br>

> behave the same for both floating-point and fixed-point, and this is why<br>

> we defined ADD32_FIXED(), ..., PSHR32_FIXED() etc.<br>

<br>

</span>I don't think you will need these anymore, but if you ever need<br>

fixed-point macros that remain integer for float compilation, then you<br>

should use the silk_*() fixed-point macros (and the code should be in<br>

silk/).<br>

<span class="gmail-"><br>

> For the NEON optimization part, the previous celt_fir() optimization<br>

> calls xcorr_kernel(). We tested and found that calling<br>

> the xcorr_kernel() optimization didn't help too much here. The<br>

> optimization in the patch is about 1% faster than simply<br>

> calling xcorr_kernel() for the whole encoder. Considering the really<br>

> small size of the new optimization, it's better to not call<br>

> xcorr_kernel() to get 1% faster.<br>

<br>

</span>It seems like the new Neon code does essentially the same thing as the<br>

existing xcorr_kernel(), so the two should remain shared to reduce<br>

maintenance burden and maximize speed. If you're able to come up with<br>

something faster than xcorr_kernel() (which apparently you have), then<br>

it seems like it should be usable by the other functions that currently<br>

use xcorr_kernel(). Or did I miss anything here?<br>

<br>

Cheers,<br>

<br>

        Jean-Marc<br>

</blockquote></div><br></div></div>