[opus] Patch cleaning up Opus x86 intrinsics configury

Wed Mar 4 07:00:26 PST 2015

On 3 March 2015 at 22:17, Jonathan Lennox <jonathan at vidyo.com> wrote:
>
> On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta
> <viswanath.puttagunta at linaro.org> wrote:
>
>
>
> On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com> wrote:
>>
>> Viswenath,
>>
>> My patch should be against the tip, but it’s the very recent tip,
>> including some changes this past Friday (27 Feb).  I mentioned in the IRC
>> room a problem I discovered in creating my patch, and then later improved
>> the fix Tim had made for the problem.  Where do you get conflicts merging it
>> to tip?
>>
>> In terms of merging, you posted your patch before I posted mine, so
>> probably I should be the one on the hook to rebase after your fix goes in.
>> Looking over your patch quickly, I don’t think any of my changes should be
>> that difficult to merge with yours.
>
>>> Yeah.. trivial merge issues.. shouldn't take too long to resolve.
>>
>>
>> I haven’t studied your patch in depth yet.  Does Ne10 use its own RTCD
>> code, or do you use Opus’s?
>
>>> Not using Ne10's RTCD feature at the moment.. using Opus's rtcd. libopus
>>> RTCD for linux/armv7 will not work for aarch64. It needs to be updated (or
>>> just disabled if system built for aarch64 since Neon support is
>>> mandatory)... It is in my todo list..
>
>
> My patch should handle aarch64 properly.  One of the things it does is it
> discovers (in configure) whether a little test program using Neon intrinsics
> can be compiled without -mfpu=neon — if so, it disables RTCD for Neon, and
> sets a #define OPUS_ARM_PRESUME_NEON_INTR.  (This is analogous to the
> existing code for determining whether Neon assembly instructions can be
> assembled by default.)
>
> This thus handles not only aarch64, but also iOS armv7.
This sounds good to me. I wanted to do more thorough code review for
your patch, but am waiting until it applies cleanly on tip.. But now
that we agree that your patch should be rebased on top of my patch,
I'll either wait for Timothy/opus-dev to take in my patch, and your
patch to be rebased? And if this is taking too long for my patch to go
in (Due to opus-dev maintainers time availability), I'll rebase
(retaining your authorship ofcourse) your patch on my submitted
patch... and put in my future work on top of your patch.

But I really hope we can get some cycles from Timothy/Jean to get the
patches into opus. It seems they are really busy for last month or so.

>
> I also have a number of aarch64 patches pending — mine are entirely for
> fixed-point though.  Are yours all floating-point?
Yes, so far they are floating point. Can you post your fixed-point
optimizations? Also, are any of your patches using NE10 library?

So far, what I submitted is to optimize opus_fft using NE10 (float
only). I also modified mdct_forward to take advantage of this
optimization.

My next action plan:
- use opus_ifft to optimize mdct_backwards (float via Ne10)
- Re-work/add opus_fft and opus_ifft for fixed (again via Ne10)
- Enable for Aarch64 and test (I'm hoping your patch already does this)
- float-to-int and int-to-float conversions using neon
- etc.

>
>> On Mar 3, 2015, at 5:18 PM, Viswanath Puttagunta
>> <viswanath.puttagunta at linaro.org> wrote:
>>
>> > Hello Jonathan,
>> >
>> > I am unable to apply your patch cleanly on tip.
>> >
>> > Timothy/opus-dev,
>> >
>> > This patch has some conflicts with my ARM patch that does fft
>> > optimizations
>> > http://lists.xiph.org/pipermail/opus/2015-March/002904.html
>> > http://lists.xiph.org/pipermail/opus/2015-March/002905.html
>> >
>> > One of us probably has to rebase depending on which patch goes into opus
>> > first.
>> >
>> > Regards,
>> > Vish
>> >
>> >
>> > On 1 March 2015 at 20:47, Jonathan Lennox <jonathan at vidyo.com> wrote:
>> >> The attached patch cleans up Opus's x86 intrinsics configury.
>> >>
>> >> It:
>> >> * Makes —enable-intrinsics work with clang and other non-GCC compilers
>> >> * Enables RTCD for the floating-point-mode SSE code in Celt.
>> >> * Disables use of RTCD in cases where the compiler targets an
>> >> instruction
>> >> set by default.
>> >> * Enables the SSE4.1 Silk optimizations that apply to the common parts
>> >> of
>> >> Silk when Opus is built in floating-point mode, not just in fixed-point
>> >> mode.
>> >> * Enables the SSE intrinsics (with RTCD when appropriate) in the Win32
>> >> build.
>> >> * Fixes a case where GCC would compile SSE2 code as SSE4.1, causing a
>> >> crash
>> >> on non-SSE4.1 CPUs.
>> >> * Allows configuration with compilers with non-GCC-flavor flags for
>> >> enabling
>> >> architecture options.
>> >> * Hopefully makes the configuration and ifdef’s easier to follow and
>> >> understand.
>> >>
>> >> This does not yet switch —enable-intrinsics to be enabled by default on
>> >> supported architectures, but I think it’d be ready to do so.
>> >>
>> >> Comments are welcome!
>> >>
>> >>
>> >>
>> >> —
>> >> Jonathan Lennox
>> >> jonathan at vidyo.com
>> >>
>> >>
>> >> _______________________________________________
>> >> opus mailing list
>> >> opus at xiph.org
>> >> http://lists.xiph.org/mailman/listinfo/opus
>> >>
>>
>
>