[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
jmvalin at jmvalin.ca
Tue Jun 6 21:47:23 UTC 2017
Thanks, all 5 patches merged in master.
On 06/06/17 05:04 PM, Linfeng Zhang wrote:
> Thanks, Jonathan and Jean-Marc!
> I attached the new patch set in inner_prod_5patches_v3.zip.
> The Chromebook I'm using is
> Chromebook 13
> CB5-311 series
> RMN: Z3ENN
> CPU info:
> $ cat /proc/cpuinfo
> processor: 0
> model name: ARMv7 Processor rev 3 (v7l)
> BogoMIPS: 2.31
> Features: swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4
> idiva idivt vfpd32 lpae
> CPU implementer: 0x41
> CPU architecture: 7
> CPU variant: 0x3
> CPU part: 0xc0f
> CPU revision: 3
> Hardware: NVIDIA Tegra SoC (Flattened Device Tree)
> Revision: 0000
> Serial: 0000000000000000
> On Tue, Jun 6, 2017 at 1:15 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
> On 06/06/17 04:09 PM, Jonathan Lennox wrote:
> > Two comments on the various infrastructure for RTCD etc.
> > 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions,
> > but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s
> > correspondingly. I suspect the ‘arch’ parameter can just be ignored
> > by the assembly functions, but at least the comments in that file
> > should be updated to indicate the register that’s used to pass it in,
> > and that it’s ignored.
> > 2. In the 0003- patch, you shouldn’t use the MAY_HAVE_NEON macro in
> > your new arm_celt_map tables, for the same reason we didn’t want it
> > in the arm_silk_map tables.
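Jonathan's first point is that once a trailing `arch` argument becomes part of a shared prototype, every implementation behind the run-time dispatch, including hand-written assembly, receives it even if it has no use for it. The C sketch below is a hypothetical illustration of that pattern: `xcorr_fn`, `xcorr_c`, `XCORR_IMPL` and `xcorr_dispatch` are invented names, the kernel is float-only, and none of it is the actual Opus prototype for `celt_pitch_xcorr`. As background, under the 32-bit ARM procedure call standard (AAPCS) only the first four word-sized arguments are passed in r0-r3 and any further ones go on the stack, so where an extra argument lands depends on its position in the prototype.

```c
#include <assert.h>

/* Hypothetical shared prototype for a dispatched kernel. Every
 * implementation takes the trailing `arch` argument, whether or not
 * it uses it. */
typedef void (*xcorr_fn)(const float *x, const float *y, float *xcorr,
                         int len, int max_pitch, int arch);

/* Portable reference: xcorr[i] = sum over j of x[j] * y[i + j]. */
static void xcorr_c(const float *x, const float *y, float *xcorr,
                    int len, int max_pitch, int arch)
{
    (void)arch; /* part of the calling convention, ignored here */
    for (int i = 0; i < max_pitch; i++) {
        float sum = 0.0f;
        for (int j = 0; j < len; j++)
            sum += x[j] * y[i + j];
        xcorr[i] = sum;
    }
}

/* Hypothetical table indexed by the detected arch level. In a real
 * build the later slots would point at optimized (NEON or assembly)
 * versions; here both slots use the C reference. */
static const xcorr_fn XCORR_IMPL[2] = { xcorr_c, xcorr_c };

static void xcorr_dispatch(const float *x, const float *y, float *xcorr,
                           int len, int max_pitch, int arch)
{
    assert(arch >= 0 && arch < 2);
    XCORR_IMPL[arch](x, y, xcorr, len, max_pitch, arch);
}
```

Because the table holds plain function pointers, every entry must match the prototype exactly; an assembly routine that ignores `arch` still satisfies it, which is why documenting where the argument arrives (and that it is unused) is enough.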
> I have no further issues with your patches, so once you address the two
> issues Jonathan pointed out, I'll be able to merge them.
> > Out of curiosity, what’s the CPU in the Chromebook you’re using to
> > test?
> >> On Jun 1, 2017, at 6:33 PM, Linfeng Zhang <linfengz at google.com> wrote:
> >> Hi,
> >> Attached are 5 patches related to celt_inner_prod() and
> >> dual_inner_prod() NEON intrinsics optimization.
> >> In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch,
> >> the optimization changes the order of the floating-point additions
> >> in the inner products, which changes the results. I created
> >> celt_inner_prod_neon_float_c_simulation() and
> >> dual_inner_prod_neon_float_c_simulation() to simulate the order of
> >> floating-point operations in the NEON optimization and compare their
> >> results. Sorry, but I cannot bound the difference between the original C
> >> function and the NEON function by any reasonably small number or
> >> ratio. It's easy to construct an input for which 0 and 1,000 are both
> >> valid results, just by changing the order of the inner product.
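To make the ordering effect concrete, here is a minimal C sketch comparing a plain left-to-right dot product with a generic 4-lane SIMD-style order. Both function names are hypothetical, and the lane layout (lane k accumulates elements k, k+4, k+8, ..., then pairwise lane reduction, then the tail) is an assumed model, not necessarily the exact order used by the merged NEON code or by Linfeng's `*_neon_float_c_simulation()` functions.

```c
#include <assert.h>

/* Straightforward C order: one running sum, left to right. */
static float inner_prod_sequential(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical model of a 4-lane SIMD order: lane k accumulates
 * elements k, k+4, k+8, ...; the lanes are reduced pairwise and any
 * leftover elements (n % 4) are added at the end. */
static float inner_prod_4lane_sim(const float *x, const float *y, int n)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int i;
    assert(n >= 0);
    for (i = 0; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; k++)
            acc[k] += x[i + k] * y[i + k];
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With `x = {1e8f, 1.0f, -1e8f, 1.0f}` and `y` all ones, the exact sum is 2, but in single precision (assuming `FLT_EVAL_METHOD == 0`, as on ARM or x86-64 SSE) the sequential order returns 1 while the 4-lane order returns 0: each `1.0f` is absorbed by whichever large term it happens to meet first. Scaling the same idea up is how an input can make very different answers "correct" depending only on order. `dual_inner_prod()` is the same computation run for two vectors that share `x`, so it is affected the same way.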
> >> The total speed gain is about 1.0% for the fixed-point encoder and
> >> 1.8% for the floating-point encoder at complexity 8, tested on my
> >> Chromebook.
> >> Thanks, Linfeng
> >> opus mailing list
> >> opus at xiph.org