[opus] celt_inner_prod() and dual_inner_prod() NEON intrinsics

Tue Jun 6 21:47:23 UTC 2017

Thanks, all 5 patches merged in master.

	Jean-Marc

On 06/06/17 05:04 PM, Linfeng Zhang wrote:
> Thanks, Jonathan and Jean-Marc!
> 
> I've attached the updated patch set as inner_prod_5patches_v3.zip.
> 
> The Chromebook I'm using is
> Chromebook 13
> CB5-311 series
> RMN: Z3ENN
> 
> CPU info:
> 
> $ cat /proc/cpuinfo
> processor: 0
> model name: ARMv7 Processor rev 3 (v7l)
> BogoMIPS: 2.31
> Features: swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
> CPU implementer: 0x41
> CPU architecture: 7
> CPU variant: 0x3
> CPU part: 0xc0f
> CPU revision: 3
> 
> Hardware: NVIDIA Tegra SoC (Flattened Device Tree)
> Revision: 0000
> Serial: 0000000000000000
> 
> Thanks,
> Linfeng
> 
> On Tue, Jun 6, 2017 at 1:15 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> 
>     Hi Linfeng,
> 
>     On 06/06/17 04:09 PM, Jonathan Lennox wrote:
>     > Two comments on the various infrastructure for RTCD etc.
>     >
>     > 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions,
>     > but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s
>     > correspondingly.  I suspect the ‘arch’ parameter can just be ignored
>     > by the assembly functions, but at least the comments in that file
>     > should be updated to indicate the register that’s used to pass it in,
>     > and that it’s ignored.
>     >
>     > 2. In the 0003- patch, you shouldn’t use the MAY_HAVE_NEON macro in
>     > your new arm_celt_map tables, for the same reason we didn’t want it
>     > in the arm_silk_map tables.
> 
>     I have no further issues with your patches, so once you address the two
>     issues Jonathan pointed out, I'll be able to merge them.
> 
>     Cheers,
> 
>             Jean-Marc
> 
>     >
>     > Out of curiosity, what’s the CPU in the Chromebook you’re using to
>     > test?
>     >
>     >> On Jun 1, 2017, at 6:33 PM, Linfeng Zhang <linfengz at google.com> wrote:
>     >>
>     >> Hi,
>     >>
>     >> Attached are 5 patches related to celt_inner_prod() and
>     >> dual_inner_prod() NEON intrinsics optimization.
>     >>
>     >> In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch,
>     >> the optimization changes the order of the floating-point operations
>     >> in the inner products, which changes the results. I created
>     >> celt_inner_prod_neon_float_c_simulation() and
>     >> dual_inner_prod_neon_float_c_simulation() to simulate the order of
>     >> floating-point operations in the NEON optimization and compare their
>     >> results. Sorry that I cannot bound the difference between the original
>     >> C function and the NEON function by any reasonably small number or
>     >> ratio. It's easy to construct an input for which 0 and 1,000 are both
>     >> valid results, just by changing the order of the inner product's
>     >> operations.
>     >>
>     >> The total speed gain is about 1.0% for the fixed-point encoder and
>     >> 1.8% for the floating-point encoder at complexity 8, tested on my
>     >> Chromebook.
>     >>
>     >> Thanks, Linfeng
>     >>
>     >> <0005-Clean-celt_pitch_xcorr_float_neon.patch>
>     >> <0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch>
>     >> <0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch>
>     >> <0002-Replace-call-of-celt_inner_prod_c-step-2.patch>
>     >> <0001-Replace-call-of-celt_inner_prod_c-step-1.patch>
>     >>
>     >> _______________________________________________
>     >> opus mailing list
>     >> opus at xiph.org
>     >> http://lists.xiph.org/mailman/listinfo/opus
>     >
> 
>
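
The ordering issue described for patch 0004 can be illustrated with a small
scalar sketch. This is not the code from the patch and not the
*_c_simulation() functions it mentions. The names inner_prod_sequential() and
inner_prod_4lane() are hypothetical, and the 4-lane accumulation pattern is
only an assumption about how a 4-wide SIMD loop would typically sum its
products.

```c
#include <assert.h>
#include <stddef.h>

/* Reference order: one accumulator, products added strictly left to right. */
static float inner_prod_sequential(const float *x, const float *y, size_t n)
{
    /* volatile forces rounding to single precision after every step, so the
     * result does not depend on extended-precision registers. */
    volatile float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = acc + x[i] * y[i];
    return acc;
}

/* Hypothetical scalar model of a 4-lane SIMD loop: lane k accumulates
 * elements k, k+4, k+8, ..., and the four lanes are summed at the end.
 * For brevity, n must be a multiple of 4. */
static float inner_prod_4lane(const float *x, const float *y, size_t n)
{
    volatile float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] = lane[k] + x[i + k] * y[i + k];
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

With x = {1e8f, 1, 1, 1, -1e8f, 1, 1, 1} and y set to all ones, IEEE 754
single precision with round-to-nearest gives 3 for the sequential order and 6
for the 4-lane order. The small terms added to 1e8f are rounded away in the
first case, while in the second the two large terms cancel inside one lane
before any small term is lost. Scaling such an input up makes the gap as large
as desired, which is why no small fixed bound between the two functions exists.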