[opus] Bug fix in celt_lpc.c and some xcorr_kernel optimizations

Fri Jun 7 11:51:20 PDT 2013

On 06/07/2013 02:33 PM, John Ridges wrote:
> I have no doubt that Mr. Zanelli's NEON code is faster, since hand tuned
> assembly is bound to be faster than using intrinsics.

I was more curious about comparing the vectorization approaches (assuming
the two are different) than the exact code.

> However I notice
> that his code can also read past the y buffer.

Yeah, we'd need to either fix this or make sure that we add some padding
to the buffers. In practice it's unlikely to even trigger valgrind (it's
on the stack and the uninitialized data ends up being discarded), but
it's definitely not clean and could come back and bite us later.
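
To make the "fix this" option concrete, here is a rough portable-C sketch of
a four-lag correlation kernel that keeps separate partial sums and never
reads past the part of y it actually needs. This is not the code in git, nor
John's or Aurélien's version; the name xcorr4_sketch and its signature are
made up for illustration. It assumes y holds at least len+3 valid samples,
which is the minimum any four-lag kernel has to read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the Opus xcorr_kernel): accumulates
   sum[k] += x[j] * y[j + k] for k = 0..3 and j = 0..len-1.
   It reads x[0..len-1] and y[0..len+2], and nothing beyond that. */
static void xcorr4_sketch(const float *x, const float *y,
                          float sum[4], size_t len)
{
    /* Four independent partial sums, one per lag. */
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t j = 0;

    assert(x != NULL && y != NULL && sum != NULL);

    /* Main loop: four x samples per iteration. The highest y index
       touched is j + 6, and j + 4 <= len implies j + 6 <= len + 2. */
    for (; j + 4 <= len; j += 4) {
        size_t i;
        for (i = 0; i < 4; i++) {
            float xv = x[j + i];
            s0 += xv * y[j + i];
            s1 += xv * y[j + i + 1];
            s2 += xv * y[j + i + 2];
            s3 += xv * y[j + i + 3];
        }
    }

    /* Scalar tail: handles the last len % 4 samples one at a time
       instead of loading a full block that would overrun y. */
    for (; j < len; j++) {
        float xv = x[j];
        s0 += xv * y[j];
        s1 += xv * y[j + 1];
        s2 += xv * y[j + 2];
        s3 += xv * y[j + 3];
    }

    sum[0] += s0;
    sum[1] += s1;
    sum[2] += s2;
    sum[3] += s3;
}
```

A SIMD version would replace the inner block with vector loads, but the same
bound applies: either finish with a scalar tail like this one, or pad the
buffers so that a full-width load at the end stays inside allocated memory.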

Cheers,

	Jean-Marc

> Cheers,
> --John
> 
> 
> On 6/6/2013 9:22 PM, Jean-Marc Valin wrote:
>> Hi John,
>>
>> Thanks for the two fixes. They're in git now. Your SSE version seems to
>> also be slightly faster than mine -- probably due to the partial sums.
>> As for the NEON code, it would be good to compare the performance with
>> the code Aurélien Zanelli posted at
>> http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch
>>
>>
>> Cheers,
>>
>>     Jean-Marc
>>
>>
> 
> 
>