[opus] Bug fix in celt_lpc.c and some xcorr_kernel optimizations

Tue Jun 11 12:34:45 PDT 2013

Although I've never used ARM's compiler, I admit I'm very surprised that 
it's not compatible with the NEON intrinsics. Given that and M. 
Zanelli's speed tests, it seems clear that M. Zanelli's code is the way 
to go. I look forward to its inclusion in the opus GIT.

--John

On 6/10/2013 1:00 PM, opus-request at xiph.org wrote:
> Date: Mon, 10 Jun 2013 10:36:34 +0100
> From: Cliff Parris<cliff at espico.com>
> Subject: Re: [opus] opus Digest, Vol 53, Issue 2
> To:<opus at xiph.org>
> Message-ID: <A01D0618A28F4E51B5FE299E9394F171 at EspicoPC>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
>
> Hi All,
>
> Regarding cycle measurements for ARM/NEON,
>
> ARM no longer provides cycle-accurate simulators. The method we use is to
> make measurements on hardware via the PMU on the core itself. Note that
> if you're running under Linux you may not be 'allowed' to access the PMU
> directly, but you can access it via system calls. Typically you will need
> to make a series of measurements and average them.
>
> Re intrinsics, I believe that GCC and ARM's own compiler are not compatible.
> We write directly in ASM since typically neither compiler does what you want.
>
> Cliff
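
For anyone who wants to try the hardware-counter approach Cliff describes,
here is a minimal sketch, assuming a Linux system whose kernel exposes the
PMU cycle counter through the perf_event_open system call. The helper names
(open_cycle_counter, average_cycles, busy_loop) are made up for illustration
and are not part of Opus. If the counter is unavailable or access is
restricted, average_cycles() returns -1 instead of a measurement.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a user-space-only CPU cycle counter for the calling thread.
 * glibc has no wrapper for perf_event_open, so syscall() is used.
 * Returns a file descriptor, or -1 if the counter cannot be opened. */
static int open_cycle_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        /* start stopped; enabled around each run */
    attr.exclude_kernel = 1;  /* count user-space cycles only */
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Run fn(arg) `runs` times and return the average cycle count,
 * or -1.0 if runs <= 0 or the counter is not usable. */
static double average_cycles(void (*fn)(void *), void *arg, int runs)
{
    uint64_t total = 0;
    int fd, i;

    if (runs <= 0)
        return -1.0;
    fd = open_cycle_counter();
    if (fd < 0)
        return -1.0;

    for (i = 0; i < runs; i++) {
        uint64_t count = 0;
        if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0 ||
            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
            close(fd);
            return -1.0;
        }
        fn(arg);
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
            close(fd);
            return -1.0;
        }
        total += count;
    }
    close(fd);
    return (double)total / runs;
}

/* Hypothetical workload: sums the integers below *arg. */
static void busy_loop(void *arg)
{
    volatile unsigned long acc = 0;
    unsigned long i, n = *(const unsigned long *)arg;
    for (i = 0; i < n; i++)
        acc += i;
}
```

A call such as average_cycles(busy_loop, &n, 100) would then give the mean
over 100 runs. Averaging matters because individual readings vary with
cache state and interrupts.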

On 6/11/2013 1:00 PM, opus-request at xiph.org wrote:
> Date: Tue, 11 Jun 2013 09:31:31 +0200
> From: Aurélien Zanelli<aurelien.zanelli at parrot.com>
> Subject: Re: [opus] Bug fix in celt_lpc.c and some xcorr_kernel
> 	optimizations
> To:<opus at xiph.org>
> Message-ID:<51B6D253.9030505 at parrot.com>
> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
>
> Hi,
>
> I compared the C version, John's versions and azanelli's version.
>
> I encoded a music file of 247 seconds at an average bitrate of 100kbps
> on a Cortex-A8. Results are:
> - With xcorr_kernel_c(): 26.45s to encode
> - With xcorr_kernel_neon_john1(): 24.86s to encode (~6%)
> - With xcorr_kernel_neon_john2(): 24.4s to encode (~7.5%)
> - With xcorr_kernel_neon_azanelli(): 24.15s to encode (~8.7%)
> These functions have been inlined in pitch_xcorr(), celt_fir() and celt_iir().
>
> Furthermore, the funny thing is that an indirect call to
> xcorr_kernel_azanelli is faster: 23.75s (~10%). However, I didn't test
> the others.
>
> Also, I fixed my assembly version to avoid reading past the "y" buffer and
> to fix register garbage when it's inlined.
>
> Best regards,
>
> P.S: I made a mistake so some of my e-mails have not been sent to this
> mailing list. I apologize for this.
>
> --
> Aurélien Zanelli
> Parrot SA
> 174, quai de Jemmapes
> 75010 Paris
> France
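
For context on the kernel being benchmarked above, here is a simplified
scalar sketch of what a four-lag cross-correlation kernel computes. It
assumes float samples and uses a made-up name (xcorr_kernel_sketch); it is
not the Opus source or any of the NEON versions discussed. It accumulates
sum[k] += x[j] * y[j + k] for k = 0..3, so it reads y[0] through y[len + 2].
That is why the caller's "y" buffer needs len + 3 valid samples and why
reading any further than that is a bug.

```c
#include <stddef.h>

/* Accumulate four adjacent cross-correlation lags at once:
 *   sum[k] += x[0]*y[k] + x[1]*y[k+1] + ... + x[len-1]*y[k+len-1]
 * for k = 0..3. Reads x[0..len-1] and y[0..len+2], nothing beyond.
 * The caller initialises sum[] (typically to zero). */
static void xcorr_kernel_sketch(const float *x, const float *y,
                                float sum[4], int len)
{
    int j, k;
    for (j = 0; j < len; j++) {
        for (k = 0; k < 4; k++)
            sum[k] += x[j] * y[j + k];
    }
}
```

A vectorised version keeps the four accumulators in one SIMD register and
loads several x and y samples per iteration, which is where over-reads of
"y" can creep in if the loop tail is not handled separately.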