[opus] [Aarch64 00/11] Patches to enable Aarch64
jonathan at vidyo.com
Thu Nov 19 16:18:32 PST 2015
> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote:
> Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT)))
Yes, you’re right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!). Fix forthcoming once the tests actually run.
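For reference, here is a minimal sketch of the suggested form next to a portable scalar equivalent. It is an illustration under stated assumptions, not the patch itself: the names sig2word16, sig2word16_scalar and pshr32 are hypothetical stand-ins for the SIG2WORD16 and PSHR32 macros, SIG_SHIFT is assumed to be 12, and the rounding shift is done in 64 bits to keep the example free of signed overflow.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Assumed value for this sketch; check the real definition in the tree. */
#define SIG_SHIFT 12

/* Rounding right shift (hypothetical stand-in for PSHR32).
 * Computed in 64 bits so the rounding add cannot overflow.
 * Assumes >> on a negative value is an arithmetic shift. */
static inline int32_t pshr32(int32_t a, int shift)
{
    int64_t rounded = (int64_t)a + ((int64_t)1 << (shift - 1));
    return (int32_t)(rounded >> shift);
}

/* Portable version: round, then clamp to the int16_t range. */
static inline int16_t sig2word16_scalar(int32_t x)
{
    x = pshr32(x, SIG_SHIFT);
    if (x < -32768) x = -32768;
    if (x > 32767) x = 32767;
    return (int16_t)x;
}

/* Suggested form: the saturating narrow replaces both clamps. */
static inline int16_t sig2word16(int32_t x)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    return vqmovns_s32(pshr32(x, SIG_SHIFT));
#else
    return sig2word16_scalar(x);
#endif
}

/* Small self-check: both paths must agree on edge cases. */
static inline void sig2word16_self_check(void)
{
    assert(sig2word16(INT32_MAX) == sig2word16_scalar(INT32_MAX));
    assert(sig2word16(INT32_MIN) == sig2word16_scalar(INT32_MIN));
    assert(sig2word16(0) == 0);
}
```

The shift has to happen before the narrowing: saturating the unshifted 32-bit value to 16 bits would clip almost every sample.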
> On 11/19/2015 2:52 PM, Jonathan Lennox wrote:
>>> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
>>> I haven’t yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32) with Neon intrinsics. That’s an obvious next step.
>> This doesn’t show any appreciable speed difference in my tests, but the code is obviously better by inspection (all three of these map directly to a single Aarch64 instruction and a single Neon intrinsic) so my code paths may just not exercise them.
>> Patches follow.
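As an illustration of the single-intrinsic mapping for the saturating add and subtract, here is a rough sketch. The function names add_sat32 and sub_sat32 are hypothetical stand-ins for silk_ADD_SAT32 and silk_SUB_SAT32, and the fallback path is a plain 64-bit clamp written for this example, not the existing generic macros.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Clamp a 64-bit intermediate to the int32_t range (fallback path only). */
static inline int32_t clamp_to_s32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Saturating 32-bit add: one scalar Neon intrinsic on AArch64. */
static inline int32_t add_sat32(int32_t a, int32_t b)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    return vqadds_s32(a, b);
#else
    return clamp_to_s32((int64_t)a + (int64_t)b);
#endif
}

/* Saturating 32-bit subtract: one scalar Neon intrinsic on AArch64. */
static inline int32_t sub_sat32(int32_t a, int32_t b)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    return vqsubs_s32(a, b);
#else
    return clamp_to_s32((int64_t)a - (int64_t)b);
#endif
}
```

On other targets the same functions compile to the clamp, so results can be compared across builds when running the test vectors.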