[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
jonathan at vidyo.com
Tue Jun 6 20:09:04 UTC 2017
Two comments on the various infrastructure for RTCD etc.
1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s correspondingly. I suspect the ‘arch’ parameter can just be ignored by the assembly functions, but at least the comments in that file should be updated to indicate the register that’s used to pass it in, and that it’s ignored.
2. In the 0003- patch, you shouldn’t use the MAY_HAVE_NEON macro in your new arm_celt_map tables, for the same reason we didn’t want it in the arm_silk_map tables.
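For readers less familiar with RTCD, the kind of table under discussion can be sketched roughly as follows. This is only an illustration under assumptions: the arch levels, the table name INNER_PROD_IMPL, and both functions are hypothetical and are not taken from the 0003- patch, and the "NEON" entry is a plain C stand-in so that the sketch compiles anywhere. The table entries name the implementations directly.

```c
#include <assert.h>

/* Hypothetical arch levels, ordered from least to most capable. */
enum { ARCH_V4 = 0, ARCH_EDSP, ARCH_MEDIA, ARCH_NEON, ARCH_COUNT };

typedef long (*inner_prod_fn)(const short *x, const short *y, int n);

/* Generic C fallback. */
static long inner_prod_c(const short *x, const short *y, int n)
{
    long sum = 0;
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        sum += (long)x[i] * y[i];
    return sum;
}

/* Stand-in for a NEON version (assumption: a real one would use intrinsics).
 * It uses two accumulators to mimic a vectorized loop. */
static long inner_prod_neon_stub(const short *x, const short *y, int n)
{
    long acc0 = 0, acc1 = 0;
    int i;
    assert(n >= 0);
    for (i = 0; i + 1 < n; i += 2) {
        acc0 += (long)x[i] * y[i];
        acc1 += (long)x[i + 1] * y[i + 1];
    }
    if (i < n)
        acc0 += (long)x[i] * y[i];
    return acc0 + acc1;
}

/* One entry per arch level; the runtime-detected arch selects the entry. */
static const inner_prod_fn INNER_PROD_IMPL[ARCH_COUNT] = {
    inner_prod_c,         /* ARCH_V4 */
    inner_prod_c,         /* ARCH_EDSP */
    inner_prod_c,         /* ARCH_MEDIA */
    inner_prod_neon_stub  /* ARCH_NEON */
};

/* Dispatch through the table; the mask keeps the index in range. */
static long inner_prod_dispatch(const short *x, const short *y, int n, int arch)
{
    return INNER_PROD_IMPL[arch & (ARCH_COUNT - 1)](x, y, n);
}
```

Callers pass the arch value detected at startup, so the choice of implementation is made at run time through the table.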
Out of curiosity, what’s the CPU in the Chromebook you’re using to test?
> On Jun 1, 2017, at 6:33 PM, Linfeng Zhang <linfengz at google.com> wrote:
> Attached are 5 patches related to celt_inner_prod() and dual_inner_prod() NEON intrinsics optimization.
> In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the optimization changes the order in which the floating-point inner products are accumulated, which changes the results. I created celt_inner_prod_neon_float_c_simulation() and dual_inner_prod_neon_float_c_simulation() to simulate the order of floating-point operations in the NEON optimization and compare their results. Sorry that I cannot bound the distance between the original C function and the NEON function by any reasonably small number or ratio. It's easy to create an input for which 0 and 1,000 are both correct results, just by manipulating the inner product order.
> The total speed gain is about 1.0% for the fixed-point encoder and 1.8% for the floating-point encoder, at complexity 8, tested on my Chromebook.
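The point about accumulation order can be illustrated with a small sketch. The two functions below are hypothetical and are not the code from the patch: one sums the products strictly in sequence, the other keeps four partial sums, as a 4-lane SIMD loop plausibly would, and combines them at the end. The sketch assumes the compiler does not reorder floating-point operations (no fast-math flags).

```c
#include <assert.h>

/* Strictly sequential accumulation, as in a plain C loop. */
static float inner_prod_sequential(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Four partial sums combined at the end. This is one plausible SIMD-style
 * order, chosen for illustration; the patch may use a different one. */
static float inner_prod_four_lanes(const float *x, const float *y, int n)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float sum;
    int i, k;
    assert(n >= 0);
    for (i = 0; i + 4 <= n; i += 4)
        for (k = 0; k < 4; k++)
            acc[k] += x[i + k] * y[i + k];
    sum = (acc[0] + acc[2]) + (acc[1] + acc[3]);
    for (; i < n; i++)  /* leftover elements */
        sum += x[i] * y[i];
    return sum;
}
```

With x = {1e30f, 1000.0f, -1e30f, 0.0f} and y all ones, the sequential order absorbs the 1000 into 1e30 and then cancels to 0, while the four-lane order cancels the large terms first and returns 1000.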