[opus] [AArch64 neon intrinsics v4 0/5] Rework Neon intrinsic code for Aarch64 patchset
Timothy B. Terriberry
tterribe at xiph.org
Wed Jul 6 22:35:58 UTC 2016
Jonathan Lennox wrote:
> Following Tim's comments, here are my reworked patches for the Neon intrinsic function patches of
After far too long, I've finally landed these patches (including the
others from the earlier series), with a few changes (mostly implemented
myself on a long plane flight in the name of expediency):
- Removed all tabs (including those from prior commits merged by Jean-Marc).
- Marked arch unused in the MIPS version of
- Added #include "SigProc_FIX.h" to NSQ.h to get a definition of
OPUS_INLINE as well as opus_int32, opus_int, silk_assert(),
- Added #include "cpu_support.h" to NSQ_neon.h to get a definition of
- Removed #include "config.h" from NSQ_neon.h: this should be done from
each .c file (as is the pattern everywhere else).
- Moved SILK_NSQ_NOISE_SHAPE_FEEDBACK_LOOP_NEON_IMPL into an
arm_silk_map.c (and removed the _NEON tag). If we ever get versions of
these functions for older ARM arches, they have to go in separate files
(so we can pass them separate C flags), so the NEON version's compilation
unit is the wrong place for it. Also, if we
ever update the architecture list, we don't want to have to go hunting
all over the source code for these tables, so all of the SILK ones
should live in the same place (if we ever get any more).
- Made silk_NSQ_noise_shape_feedback_loop() directly return a Q12
result, instead of having the caller convert from Q11 to Q12. This saves
an instruction in the NEON version.
- Added some comments to silk_NSQ_noise_shape_feedback_loop_neon() about
some repeated conversions we could eliminate and the non-bit-exactness
w.r.t. the C version.
- Made the final right-shift in
silk_NSQ_noise_shape_feedback_loop_neon() apply a rounding offset (in
place of the bias that was in the C version), since it was free.
- Made the fallback in silk_NSQ_noise_shape_feedback_loop_neon() for
orders other than 8 directly invoke the C version instead of duplicating
- Fixed the #ifdef logic for xcorr_kernel_neon_fixed to match that of
celt_pitch_xcorr_float_neon() (i.e., if we somehow get MAY_HAVE_NEON but
not PRESUME_NEON and not HAVE_RTCD, don't force invoking the NEON version).
- Rebased the OPUS_FAST_INT64 changes (the way this was defined changed
This also included the fix to the configure output Jonathan sent to the
list on June 30th.
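The silk_NSQ_noise_shape_feedback_loop() changes listed above (Q12 output, a rounding shift in place of the bias, and the C fallback for orders other than 8) can be illustrated with a rough scalar model. This is a hypothetical sketch, not the Opus code. It assumes data in Q14 and 16-bit coefficients in Q13, so each product is Q27 and a >> 16 gives Q11. It also leaves out the state shuffling the real function performs, and every name in it is made up. The reference-style path truncates each product in the style of silk_SMLAWB, adds an order/2 bias, then doubles Q11 to Q12. The order-8 path assumes full 64-bit accumulation, so a single rounding shift by 15 lands directly in Q12. On NEON, a rounding shift such as vrshr_n_s64 folds the add-half step into the shift itself, which is presumably why the rounding came for free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model, not the Opus code. Assumed formats:
 * data in Q14, coefficients in Q13, so data * coef is Q27.
 * Assumes >> on negative values is an arithmetic shift
 * (implementation-defined in C, true on mainstream compilers). */

/* Reference-style path: truncate each product to Q11 (in the style of
 * silk_SMLAWB), add an order/2 bias to offset that truncation, then
 * convert Q11 -> Q12 with one extra step. */
static int32_t feedback_ref(const int32_t *data, const int16_t *coef, int order)
{
    assert(order > 0 && (order & 1) == 0);  /* this model assumes an even order */
    int32_t out = order >> 1;               /* bias, in Q11 units */
    for (int j = 0; j < order; j++)
        out += (int32_t)(((int64_t)data[j] * coef[j]) >> 16);  /* Q27 -> Q11 */
    return out * 2;                         /* Q11 -> Q12 (multiply avoids UB of << on negatives) */
}

/* Order-8 fast path: keep full 64-bit products and round once.
 * Shifting by 15 instead of 16 lands directly in Q12. */
static int32_t feedback_fast8(const int32_t *data, const int16_t *coef)
{
    int64_t acc = 0;
    for (int j = 0; j < 8; j++)
        acc += (int64_t)data[j] * coef[j];                /* Q27 */
    return (int32_t)((acc + ((int64_t)1 << 14)) >> 15);   /* round, Q27 -> Q12 */
}

/* Dispatcher: order 8 takes the fast path; any other order reuses the
 * reference path instead of duplicating it. */
static int32_t feedback_model(const int32_t *data, const int16_t *coef, int order)
{
    if (order == 8)
        return feedback_fast8(data, coef);
    return feedback_ref(data, coef, order);
}
```

Suppose all eight data values are 1.0 (16384 in Q14) and all coefficients are 1.0 (8192 in Q13). The order-8 path then returns 32768 (8.0 in Q12), while the reference-style path returns 32776. When no truncation actually happened, the order/2 bias overshoots, which is one small illustration of why the two versions are not bit-exact.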
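The #ifdef fix for xcorr_kernel_neon_fixed described above amounts to a three-way choice: call NEON directly when it is presumed, go through a per-arch table when run-time detection is available, and otherwise use the C version. The sketch below is a hypothetical, self-contained model of that logic. The macro names are shortened as in the text. ARCH_MASK, ARCH_NEON, and run_xcorr() are made up. The stub kernels take no arguments and return a tag (0 for C, 1 for NEON) instead of doing real cross-correlation work. The RTCD branch shows the kind of per-arch table that, following the arm_silk_map.c reasoning, would live in one map file rather than in the NEON compilation unit.

```c
#include <assert.h>

/* Hypothetical configuration: the case called out above, where NEON code
 * may be built but neither PRESUME_NEON nor HAVE_RTCD is set. */
#define MAY_HAVE_NEON 1
/* #define PRESUME_NEON 1 */
/* #define HAVE_RTCD 1 */

#define ARCH_MASK 3  /* hypothetical: four arch levels, 0..3 */
#define ARCH_NEON 3  /* hypothetical: highest level has NEON */

/* Stand-ins: the real kernels take buffers and lengths; these only
 * report which implementation was chosen. */
int xcorr_kernel_c(void) { return 0; }
#if defined(MAY_HAVE_NEON)
int xcorr_kernel_neon_fixed(void) { return 1; }
#endif

#if defined(MAY_HAVE_NEON) && defined(PRESUME_NEON)
/* NEON is guaranteed at build time: call it directly, ignore arch. */
# define xcorr_kernel(arch) ((void)(arch), xcorr_kernel_neon_fixed())
#elif defined(MAY_HAVE_NEON) && defined(HAVE_RTCD)
/* Run-time detection: one entry per arch level, indexed by the detected arch. */
static int (*const XCORR_KERNEL_IMPL[ARCH_MASK + 1])(void) = {
    xcorr_kernel_c, xcorr_kernel_c, xcorr_kernel_c, xcorr_kernel_neon_fixed
};
# define xcorr_kernel(arch) (XCORR_KERNEL_IMPL[(arch) & ARCH_MASK]())
#else
/* NEON neither presumed nor detectable: never force the NEON version. */
# define xcorr_kernel(arch) ((void)(arch), xcorr_kernel_c())
#endif

/* Callers pass the arch level detected at run time. */
int run_xcorr(int arch)
{
    assert(arch >= 0 && arch <= ARCH_MASK);
    return xcorr_kernel(arch);
}
```

As configured (MAY_HAVE_NEON only), run_xcorr(ARCH_NEON) still returns 0, i.e. the C kernel runs even when the caller reports a NEON-capable arch. Defining PRESUME_NEON instead makes every call return 1.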