[opus] [Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.

Mon Nov 23 09:04:23 PST 2015

Hi Jonathan.

I really, really hate to bring this up this late in the game, but I just 
noticed that your NEON code doesn't use any of the "high" intrinsics for 
ARM64, e.g. instead of:

int32x4_t coef1 = vmovl_s16(vget_high_s16(coef16));

you could use:

int32x4_t coef1 = vmovl_high_s16(coef16);

and instead of:

int64x2_t b1 = vmlal_s32(b0, vget_high_s32(a0), vget_high_s32(coef0));

you could use:

int64x2_t b1 = vmlal_high_s32(b0, a0, coef0);

and instead of:

int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3));
int64x1_t cS = vshr_n_s64(c, 16);
int32x2_t d = vreinterpret_s32_s64(cS);
out = vget_lane_s32(d, 0);

you could use:

out = (opus_int32)(vaddvq_s64(b3) >> 16);
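Here's a minimal sketch showing that the two reductions compute the same value. It assumes a little-endian target (so lane 0 of the reinterpreted int32x2_t is the low word) and a compiler such as GCC or Clang, where ">>" on a negative int64_t is an arithmetic shift like vshr_n_s64. The helper name reduce_q16 and its array-based interface are hypothetical, chosen so the code also builds on non-ARM hosts through a scalar fallback.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: add the two 64-bit lanes, shift right by 16 and
 * keep the low 32 bits of the result (bits 16..47 of the sum). */
static int32_t reduce_q16(const int64_t acc[2])
{
#if defined(__aarch64__)
    int64x2_t b3 = vld1q_s64(acc);
    /* ARM64 form: one across-vector add, then a scalar arithmetic shift. */
    return (int32_t)(vaddvq_s64(b3) >> 16);
#elif defined(__ARM_NEON)
    int64x2_t b3 = vld1q_s64(acc);
    /* ARMv7 form from the original code (vget_high is free here). */
    int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3));
    int64x1_t cS = vshr_n_s64(c, 16);
    int32x2_t d = vreinterpret_s32_s64(cS);
    return vget_lane_s32(d, 0);
#else
    /* Portable scalar fallback with the same semantics. */
    return (int32_t)((acc[0] + acc[1]) >> 16);
#endif
}
```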

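I understand that ARM added these intrinsics because "vget_high_xxx" 
generates an actual instruction on ARM64. On ARMv7 each 128-bit Q register 
is just a pair of 64-bit D registers, so the upper half already has a name 
and extracting it is free. ARM64 has no register name for the upper half of 
a vector, so "vget_high_xxx" costs an extra move, whereas the "_high" 
intrinsics map to the "2" forms of the instructions (e.g. SXTL2, SMLAL2), 
which read the upper half directly. ("vget_low_xxx" is of course still free 
on both platforms.)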
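Putting the first two substitutions together, an 8-tap multiply-accumulate might look like the sketch below. It is hypothetical (mac8_partial is not a function in the patch) and assumes, as in your snippets, int16 coefficients widened to int32 and int32 samples. It stores the two 64-bit partial sums so the NEON path can be compared against a scalar fallback: lane 0 ends up holding the even-indexed products and lane 1 the odd-indexed ones. Only vget_low_xxx remains; every upper-half access goes through a "_high" intrinsic.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[0] = sum of x[i]*coef[i] for even i,
 * out[1] = same for odd i, matching the int64x2_t accumulator lanes. */
static void mac8_partial(const int32_t *x, const int16_t *coef, int64_t out[2])
{
#if defined(__aarch64__)
    int16x8_t coef16 = vld1q_s16(coef);
    int32x4_t coef0 = vmovl_s16(vget_low_s16(coef16)); /* low half: free */
    int32x4_t coef1 = vmovl_high_s16(coef16);          /* SXTL2, no extract */
    int32x4_t a0 = vld1q_s32(x);
    int32x4_t a1 = vld1q_s32(x + 4);
    int64x2_t b = vmull_s32(vget_low_s32(a0), vget_low_s32(coef0)); /* taps 0,1 */
    b = vmlal_high_s32(b, a0, coef0);                  /* taps 2,3 via SMLAL2 */
    b = vmlal_s32(b, vget_low_s32(a1), vget_low_s32(coef1)); /* taps 4,5 */
    b = vmlal_high_s32(b, a1, coef1);                  /* taps 6,7 via SMLAL2 */
    vst1q_s64(out, b);
#else
    /* Portable scalar fallback with the same lane assignment. */
    out[0] = 0;
    out[1] = 0;
    for (int i = 0; i < 8; i++)
        out[i & 1] += (int64_t)x[i] * coef[i];
#endif
}
```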

Regards,

John Ridges