[opus] 2 patches related to silk_biquad_alt() optimization
jmvalin at jmvalin.ca
Tue Apr 25 00:52:43 UTC 2017
On 24/04/17 08:03 PM, Linfeng Zhang wrote:
> Tested on my chromebook, when stride (channel) == 1, the optimization
> has no gain compared with C function.
You mean that the Neon code is the same speed as the C code for
stride==1? This is not terribly surprising for an IIR filter.
> When stride (channel) == 2, the optimization is 1.2%-1.8% faster (1.6%
> at Complexity 8) compared with C function.
Is that gain due to Neon or simply due to computing two channels in
parallel? For example, if you make a special case in the C code to
handle both channels in the same loop, what kind of performance do you get?
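As a rough illustration of that experiment, here is a minimal sketch of a plain C biquad with a two-channel special case. It is not the actual silk_biquad_alt() code: it uses floating point instead of the SILK fixed-point macros, and the names biquad_coefs, biquad_one_channel and biquad_two_channels are made up for this example. The point is only that the two channels have independent recursions, so their updates can be interleaved in a single loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coefficient layout: b[0..2] feed-forward, a[0..1] feedback. */
typedef struct {
    float b[3];
    float a[2];
} biquad_coefs;

/* Direct form II transposed, one channel. Samples are `stride` apart,
   and state[0..1] persists across calls. */
void biquad_one_channel(const float *in, const biquad_coefs *c, float state[2],
                        float *out, size_t len, size_t stride)
{
    assert(stride >= 1);
    for (size_t k = 0; k < len; k++) {
        const float x = in[k * stride];
        const float y = state[0] + c->b[0] * x;
        state[0] = state[1] + c->b[1] * x - c->a[0] * y;
        state[1] = c->b[2] * x - c->a[1] * y;
        out[k * stride] = y;
    }
}

/* Special case for two interleaved channels handled in the same loop.
   state = { left0, left1, right0, right1 }. The left and right updates
   do not depend on each other, so the compiler (or the CPU) is free to
   overlap them. */
void biquad_two_channels(const float *in, const biquad_coefs *c, float state[4],
                         float *out, size_t len)
{
    for (size_t k = 0; k < len; k++) {
        const float xl = in[2 * k];
        const float xr = in[2 * k + 1];
        const float yl = state[0] + c->b[0] * xl;
        const float yr = state[2] + c->b[0] * xr;
        state[0] = state[1] + c->b[1] * xl - c->a[0] * yl;
        state[2] = state[3] + c->b[1] * xr - c->a[0] * yr;
        state[1] = c->b[2] * xl - c->a[1] * yl;
        state[3] = c->b[2] * xr - c->a[1] * yr;
        out[2 * k] = yl;
        out[2 * k + 1] = yr;
    }
}
```

Timing biquad_two_channels() against two calls to biquad_one_channel() with stride 2 would show how much of a stride==2 gain comes from the loop structure alone rather than from Neon.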
> Please let me know and I can remove the optimization of stride 1 case.
Yeah, if there's Neon code that provides no improvement over C, let's
stick with C. And if you manage to write C code that has the same
performance as the Neon code, then that would also be better (both
easier to maintain and more portable).
> If it's allowed to skip the split of A_Q28 and replace by 32-bit
> multiplication (result is 64-bit), probably it could be faster on NEON.
> This may change the encoder results because of different order of
> adding, shifting and rounding.
I'm not sure what you mean by that.
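One possible reading of the proposal, as a sketch only: today the 32-bit coefficient is split into a 14-bit lower part and an upper part so that each product is a 32x16 multiply, whereas a single 32x32 multiply with a 64-bit result would skip the split. The helpers below are hypothetical stand-ins, not the SILK macros, and they assume that right-shifting a negative value is an arithmetic shift (true on common compilers, but implementation-defined in C). Both functions approximate x * coef / 2^30.

```c
#include <assert.h>
#include <stdint.h>

/* (a32 * b16) >> 16, where only the low 16 bits of b16 are used (signed). */
static int32_t mul_32x16_shift16(int32_t a32, int32_t b16)
{
    return (int32_t)(((int64_t)a32 * (int16_t)b16) >> 16);
}

/* Right shift with rounding to nearest; shift must be >= 1. */
static int32_t rshift_round(int32_t x, int shift)
{
    return ((x >> (shift - 1)) + 1) >> 1;
}

/* Split version: two 32x16 products, one for each half of the coefficient.
   The upper part must fit in 16 bits, hence the range check. */
int32_t scale_split(int32_t x, int32_t coef)
{
    assert(coef >= -(1 << 29) && coef < (1 << 29));
    const int32_t lower = coef & 0x3FFF;  /* low 14 bits, always >= 0 */
    const int32_t upper = coef >> 14;     /* remaining high bits */
    return rshift_round(mul_32x16_shift16(x, lower), 14)
         + mul_32x16_shift16(x, upper);
}

/* Wide version: one 32x32 product kept in 64 bits, then a single shift. */
int32_t scale_wide(int32_t x, int32_t coef)
{
    return (int32_t)(((int64_t)x * coef) >> 30);
}
```

The two versions truncate and round at different points, so for some inputs they differ by one in the last bit. That is the kind of change in encoder output the quoted message warns about.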
> On Wed, Apr 19, 2017 at 10:23 PM, Jean-Marc Valin <jmvalin at jmvalin.ca
> <mailto:jmvalin at jmvalin.ca>> wrote:
> Hi Linfeng,
> Thanks for the patches. I'll have a look and get back to you. What kind
> of speedup are you getting for these functions? On what command line?
> On 19/04/17 12:29 PM, Linfeng Zhang wrote:
> > Hi,
> > Attached are 2 patches related to silk_biquad_alt() optimization.
> > Please review.
> > Thanks,
> > Linfeng Zhang
> > _______________________________________________
> > opus mailing list
> > opus at xiph.org <mailto:opus at xiph.org>
> > http://lists.xiph.org/mailman/listinfo/opus