[opus] [RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Viswanath Puttagunta
viswanath.puttagunta at linaro.org
Thu Dec 18 07:38:50 PST 2014
Hi Timothy,
I responded to your feedback before I started on RFCv3.. and took your
silence as approval :).. I guess that email got lost in your inbox sea
some where.. so re-posting the responses.
So, this time, I will wait for your ack before I proceed to RFCv4..
Although.. after all these reviews.. I may just submit it as PATCHv1..
since the patch is almost there and there are no major objections any
more..
Regards,
Vish
On 17 December 2014 at 23:59, Timothy B. Terriberry <tterribe at xiph.org> wrote:
>
> Almost there... just a few nits left.
>
> Viswanath Puttagunta wrote:
> > +if OPUS_ARM_NEON_INTR
> > +CELT_SOURCES += $(CELT_SOURCES_ARM_NEON_INTR)
> > +OPUS_ARM_NEON_INTR_CPPFLAGS = -mfpu=neon -O3
>
> I'll repeat: I don't think you should change the optimization level here.
After you feedback, I moved this part from configure.ac to
Makefile.am.. If it doesn't belong here either, then I need some
guidance. I wouldn't know where else to put this. Without
"-mfpu=neon", the intrinsic code wouldn't even compile. And without
-O3.. the instructions that are generated are just painful. Please
advise.
>
> > + /* Just unroll the rest of the loop */
>
> I saw you decided to keep this unrolled, but you didn't actually answer
> my question.
Copy paste from my previous email.
I got the idea of unrolling this from your earlier comment
"This load is always redundant in the first iteration, which is a bit
unfortunate."
Now, that I unrolled it and tested it to make sure it works, is there
a specific reason you think this is will be any slower than a simple
loop and want to go back? I know it saves one load as compared to a
simple loop. It would be hard to prove that it is measurably better
than the simple loop. But I would sincerely prefer it this way.. It is
fairly straight forward unroll in my opinion.
>
> > -#elif defined(OPUS_ARM_ASM) && defined(FIXED_POINT)
> > +#endif
> > +
> > +#if defined(OPUS_ARM_NEON_INTR)
> > +#include "arm/celt_neon_intr.c"
> > +#endif
> > +
> > +#if ((defined(OPUS_ARM_ASM) && defined(FIXED_POINT)) \
> > + || defined(OPUS_ARM_NEON_INTR))
> > #include "arm/arm_celt_map.c"
> > #endif
>
> You should keep the #elif (the intent was to _guarantee_ that only one
> of x86_celt_map.c or arm_celt_map.c would be included). Instead, just
> move the #if defined(OPUS_ARM_NEON_INTR) block inside that #elif (both
> in test_unit_mathops.c and test_unit_rotation.c).
Will do
> _______________________________________________
> opus mailing list
> opus at xiph.org
> http://lists.xiph.org/mailman/listinfo/opus
More information about the opus
mailing list