[opus] Make check failure on clone from 31 January
Jean-Marc Valin
jmvalin at jmvalin.ca
Mon Feb 24 12:11:56 PST 2014
Hi Marcello,
On 24/02/14 11:03 AM, Marcello Caramma (mcaramma) wrote:
> After a few experiments, I found that both alternatives are very similar, and 2~5% slower compared to the following:
>
>
> diff --git a/celt/mdct.c b/celt/mdct.c
> index 1634e8e..e490c3b 100644
> --- a/celt/mdct.c
> +++ b/celt/mdct.c
> @@ -277,7 +277,7 @@ void clt_mdct_backward(const mdct_lookup *l, kiss_fft_scalar *in, kiss_fft_scala
> it in-place. */
> {
> kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1);
> - kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2;
> + kiss_fft_scalar * yp1 = yp0+N2-2;
Actually, I'm not quite sure that this is legal (it may pass only
because some compilers have a different interpretation of restrict) and
I suspect the speedup is mostly due to a random change in the code being
emitted rather than a real improvement to the code. Considering that
having a separate "odd case" wasn't faster, I just checked in the
simplest fix that removes the restrict. Thanks for tracking this down.
Cheers,
Jean-Marc
> const kiss_twiddle_scalar *t = &trig[0];
> /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the
> middle pair will be computed twice. */
>
>
> This should always be safe because by making yp1 directly derived from yp0 the compiler should be able to determine that the addresses could overlap.
> I am not sure why this is faster than removing restrict from both, though. Mysteries of gcc I guess...
>
> I also tried replacing yp1[k] with yp0[k+N2-2-4*i] with k={1,2}, but that seemed to make no measurable difference.
> Also, replacing the loop with a first loop (to N4) for the rotation and a second loop (to N4>>1) for the shuffle made no measurable impact with my compiler (i.e. same as the result after the last suggested patch). It might still be worth considering, as I personally found that easier to read (it also avoids calculating the central value twice for odd N4).
>
> Marcello
>
>
> ________________________________________
> From: Jean-Marc Valin [jmvalin at jmvalin.ca]
> Sent: 22 February 2014 15:56
> To: Marcello Caramma (mcaramma); opus at xiph.org
> Subject: Re: [opus] Make check failure on clone from 31 January
>
> Good catch! I think gcc is just unable to make use of restrict for
> anything useful. Apparently your compiler does, so could you benchmark
> both patches below and see if it makes a difference in terms of speed.
> If it's worth it, I can keep the restrict and add the odd case handling,
> but if not I'll just remove the restrict.
>
> Thanks,
>
> Jean-Marc
>
> On 21/02/14 05:54 PM, Marcello Caramma (mcaramma) wrote:
>> I tracked down the bug to an incorrect use of restrict.
>>
>> I would not consider this a compiler bug: we are lying to the optimizer by
>> telling it that a pointer is restrict when in fact it isn't.
>>
>> This can be fixed like so:
>>
>>
>> diff --git a/celt/mdct.c b/celt/mdct.c
>> index 1634e8e..fa5098c 100644
>> --- a/celt/mdct.c
>> +++ b/celt/mdct.c
>> @@ -276,8 +276,8 @@ void clt_mdct_backward(const mdct_lookup *l,
>> kiss_fft_scalar *in, kiss_fft_scala
>> /* Post-rotate and de-shuffle from both ends of the buffer at once to
>> make
>> it in-place. */
>> {
>> - kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1);
>> - kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2;
>> + kiss_fft_scalar * yp0 = out+(overlap>>1);
>> + kiss_fft_scalar * yp1 = out+(overlap>>1)+N2-2;
>> const kiss_twiddle_scalar *t = &trig[0];
>> /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the
>> middle pair will be computed twice. */
>>
>>
>> Or like so:
>>
>>
>> diff --git a/celt/mdct.c b/celt/mdct.c
>> index 1634e8e..cdd053f 100644
>> --- a/celt/mdct.c
>> +++ b/celt/mdct.c
>> @@ -279,9 +279,8 @@ void clt_mdct_backward(const mdct_lookup *l,
>> kiss_fft_scalar *in, kiss_fft_scala
>> kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1);
>> kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2;
>> const kiss_twiddle_scalar *t = &trig[0];
>> - /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the
>> - middle pair will be computed twice. */
>> - for(i=0;i<(N4+1)>>1;i++)
>> + /* Loop to N4>>1 to make sure pointers never overlap */
>> + for(i=0;i<N4>>1;i++)
>> {
>> kiss_fft_scalar re, im, yr, yi;
>> kiss_twiddle_scalar t0, t1;
>> @@ -309,6 +308,20 @@ void clt_mdct_backward(const mdct_lookup *l,
>> kiss_fft_scalar *in, kiss_fft_scala
>> yp0 += 2;
>> yp1 -= 2;
>> }
>> + // Handle the odd case
>> + if(N4 & 1)
>> + {
>> + kiss_fft_scalar re, im;
>> + kiss_twiddle_scalar t0, t1;
>> + /* We swap real and imag because we're using an FFT instead of
>> an IFFT. */
>> + re = yp0[1];
>> + im = yp0[0];
>> + t0 = t[i];
>> + t1 = t[N4+i];
>> + /* We'd scale up by 2 here, but instead it's done when mixing
>> the windows */
>> + yp0[0] = S_MUL(re,t0) + S_MUL(im,t1);
>> + yp0[1] = S_MUL(re,t1) - S_MUL(im,t0);
>> + }
>> }
>>
>> /* Mirror on both sides for TDAC */
>>
>>
>> Regards,
>>
>> Marcello
>>
>>
>>
>> On 05/02/2014 18:46, "Gregory Maxwell" <gmaxwell at gmail.com> wrote:
>>
>>> On Wed, Feb 5, 2014 at 8:05 AM, Marcello Caramma (mcaramma)
>>> <mcaramma at cisco.com> wrote:
>>>> Hi,
>>>>
>>>> Apologies if this is a known issue, but running make on revision
>>>> e3187444692195957eb66989622c7b1ad8448b06 fails one of the tests when
>>>> using fixed point configuration (floating point is ok) on my linux x86.
>>>> Note that libopus1.1, as extracted from the tar ball, is OK.
>>>
>>> I can't reproduce with Fedora 19, gcc version 4.8.2 20131212 (Red Hat
>>> 4.8.2-7) (GCC) compiled with
>>> CFLAGS='-m32 -O2 -g' ./configure --enable-fixed-point ; make clean ;
>>> make celt/tests/test_unit_mdct
>>>
>>> or with Clang 3.5 on the same system.
>>>
>>> None of the fixed point builds on or ci system are throwing errors on
>>> this either.
>>>
>>> Can you try compiling without optimizations? Smells like a compiler bug.
>>
>> _______________________________________________
>> opus mailing list
>> opus at xiph.org
>> http://lists.xiph.org/mailman/listinfo/opus
>>
More information about the opus
mailing list