[opus] [AArch64 neon intrinsics v4 0/5] Rework Neon intrinsic code for Aarch64 patchset

Timothy B. Terriberry tterribe at xiph.org
Wed Jul 6 22:35:58 UTC 2016

Jonathan Lennox wrote:
> Following Tim's comments, here are my reworked patches for the Neon intrinsic function patches of

After far too long, I've finally landed these patches (including the 
others from the earlier series), with a few changes (mostly implemented 
myself on a long plane flight in the name of expediency):

- Removed all tabs (including those from prior commits merged by Jean-Marc).
- Marked arch unused in the MIPS version of 
- Added #include "SigProc_FIX.h" to NSQ.h to get a definition of 
OPUS_INLINE as well as opus_int32, opus_int, silk_assert(), 
silk_RSHIFT(), etc.
- Added #include "cpu_support.h" to NSQ_neon.h to get a definition of 
- Removed #include "config.h" from NSQ_neon.h: this should be done from 
each .c file (as is the pattern everywhere else).
arm_silk_map.c (and removed the _NEON tag). If we ever get versions of 
these functions for older ARM arches, they have to go in separate files 
(so we can pass them separate C flags), so putting it in the same 
compilation unit as the NEON version is the wrong place. Also, if we 
ever update the architecture list, we don't want to have to go hunting 
all over the source code for these tables, so all of the SILK ones 
should live in the same place (if we ever get any more).
- Made silk_NSQ_noise_shape_feedback_loop() directly return a Q12 
result, instead of having the caller convert from Q11 to Q12. This saves 
an instruction in the NEON version.
- Added some comments to silk_NSQ_noise_shape_feedback_loop_neon() about 
some repeated conversions we could eliminate and the non-bit-exactness 
w.r.t. the C version.
- Made the final right-shift in 
silk_NSQ_noise_shape_feedback_loop_neon() apply a rounding offset (in 
place of the bias that was in the C version), since it was free.
- Made the fallback in silk_NSQ_noise_shape_feedback_loop_neon() for 
orders other than 8 directly invoke the C version instead of duplicating 
the code.
- Fixed the #ifdef logic for xcorr_kernel_neon_fixed to match that of 
celt_pitch_xcorr_float_neon() (i.e., if we somehow get MAY_HAVE_NEON but 
not PRESUME_NEON and not HAVE_RTCD, don't force invoking the NEON version).
- Rebased the OPUS_FAST_INT64 changes (the way this was defined changed 
in February).
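For readers less used to the Q-format arithmetic involved, here is a rough scalar sketch of three of the silk_NSQ_noise_shape_feedback_loop() changes listed above: the result is returned directly in Q12, the final right-shift uses a rounding offset, and orders other than 8 are delegated to the C version. The names (feedback_loop_c, feedback_loop_opt, rshift_round64), the Q formats (Q12 coefficients, Q14 state) and the 64-bit accumulator are assumptions made for illustration. This is not the committed code, and it does not model the original C version's bias.

```c
#include <stdint.h>

/* Hypothetical helper: arithmetic right shift that rounds to nearest
 * (ties toward +infinity) by adding half an output LSB first.
 * Assumes shift >= 1, no overflow when the offset is added, and that >>
 * on negative values is arithmetic (implementation-defined in C, but true
 * on mainstream compilers). */
static int64_t rshift_round64(int64_t a, int shift)
{
    return (a + ((int64_t)1 << (shift - 1))) >> shift;
}

/* Hypothetical scalar reference: coef is Q12, state is Q14, so each
 * product is Q26. The Q26 -> Q12 conversion happens here (truncating),
 * so callers no longer need a separate Q11 -> Q12 left shift. */
static int32_t feedback_loop_c(const int32_t *state, const int16_t *coef,
                               int order)
{
    int64_t acc = 0; /* Q26 */
    for (int j = 0; j < order; j++)
        acc += (int64_t)state[j] * coef[j];
    return (int32_t)(acc >> 14); /* Q26 -> Q12 */
}

/* Stand-in for the NEON path: only order 8 is specialized; any other
 * order calls the C version instead of duplicating its loop. */
static int32_t feedback_loop_opt(const int32_t *state, const int16_t *coef,
                                 int order)
{
    if (order != 8)
        return feedback_loop_c(state, coef, order);

    int64_t acc = 0; /* Q26; a NEON version would spread this over lanes */
    for (int j = 0; j < 8; j++)
        acc += (int64_t)state[j] * coef[j];
    /* Same single shift to Q12, but with a rounding offset, so the result
     * may differ from feedback_loop_c() in the last bit. */
    return (int32_t)rshift_round64(acc, 14);
}
```

Producing Q12 directly folds what used to be a Q11 result plus a caller-side left shift by 1 into the function's single final shift. The rounding offset is why the two paths can disagree in the last bit: an accumulator worth 1.5 output LSBs truncates to 1 in feedback_loop_c() but rounds to 2 in feedback_loop_opt().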
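The intended compile-time selection for xcorr_kernel_neon_fixed, matching celt_pitch_xcorr_float_neon(), can be modeled roughly as follows. The macro names are simplified stand-ins for the configure-generated ones, xcorr_impl() is a hypothetical helper that only reports which path a call would take, and the arch encoding (0 = no NEON, 1 = NEON) is assumed.

```c
/* Hypothetical model of the compile-time selection. Try it with
 * -DPRESUME_NEON, or with -DMAY_HAVE_NEON -DHAVE_RTCD. arch is an assumed
 * run-time CPU level: 0 = no NEON, 1 = NEON available. */
#if defined(PRESUME_NEON)
/* NEON is guaranteed by the build target: call it unconditionally. */
static const char *xcorr_impl(int arch) { (void)arch; return "neon"; }
#elif defined(MAY_HAVE_NEON) && defined(HAVE_RTCD)
/* NEON may exist: choose at run time from the detected arch level. */
static const char *xcorr_impl(int arch) { return arch >= 1 ? "neon" : "c"; }
#else
/* No NEON, or MAY_HAVE_NEON without PRESUME_NEON or HAVE_RTCD: we cannot
 * tell whether NEON is present, so never force the NEON version. */
static const char *xcorr_impl(int arch) { (void)arch; return "c"; }
#endif
```

Built with no -D flags, or with -DMAY_HAVE_NEON alone, xcorr_impl(1) still reports "c". Without PRESUME_NEON or run-time detection there is no safe way to know NEON is present, so the C version is the only correct choice.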

This also included the fix to the configure output Jonathan sent to the 
list on June 30th.
