[opus] Alleged bug in Silk codec

Fri Jun 20 09:46:39 PDT 2014

Hi Marcello,

Actually, we were careful to avoid the undefined behaviour here. In
fact, we are specifically running a clang test detecting undefined
behaviour. If you look at the silk_SMLABB_ovflw() macro, you will see it
is based on silk_ADD32_ovflw(), which is defined as:

#define silk_ADD32_ovflw(a, b) ((opus_int32)((opus_uint32)(a) +
(opus_uint32)(b)))

By casting to unsigned, all the cases are well defined. The cast back to
int is implementation-defined but we are already assuming two's
complement behaviour every time we are shifting.

As for using the 64-bit version anyway, I'm not sure of the impact since
I'm not the original author of the code. Koen, what are the advantages
of silk_sum_sqr_shift() over the 64-bit version? See any issue with
using one over the other?

Cheers,

	Jean-Marc

On 20/06/14 06:10 AM, Marcello Caramma (mcaramma) wrote:
> Hi Jean-Marc,
> 
> well spotted! The patch provided fixes the issue for me.
> 
> Nevertheless, in my code (and I would suggest doing the same in libopus) I
> am going to replace the function with one that accumulates on 64 bits and
> then calculates the shift, for at least 4 reasons:
>  - It is less and simpler code
>  - The result is likely to be slightly more accurate in case big numbers
> come early in the array.
>  - I don’t like to have branching inside loops
> 
>  - Even though it seems to work on most compilers, assuming that overflow
> in the accumulator will lead to a negative result calls for undefined
> behaviour (NOT implementation defined): if left as is the code should at
> least use an unsigned accumulator and replace the test (nrg < 0) with (nrg
>> MAX_INT32).
> 
> Can you see any issue in doing so?
> If you also think it might be a good idea to do the same in the libopus
> repository, I can send you the suggested patch.
> 
> Cheers,
> 
> Marcello
> 
> 
> On 18/06/2014 08:09, "Jean-Marc Valin" <jmvalin at jmvalin.ca> wrote:
> 
>> Hi Marcello,
>>
>> It turns out that the problem has a much simpler explanation. As far as
>> I can tell, it's actually a bug in silk_sum_sqr_shift() and this trivial
>> patch appears to fix the problem:
>> http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch
>>
>> It would still require some testing to check that the fix doesn't have
>> any bad side effect. Let me know how well the fix works for you. Again,
>> thanks for investigating this.
>>
>> Cheers,
>>
>> 	Jean-Marc
>>
>> On 13/06/14 12:28 PM, Marcello Caramma (mcaramma) wrote:
>>> Hi Jean Marc,
>>>
>>> please find attached the audio file (mono 16khz). I shortened it to
>>> about
>>> 10 seconds. I also add 2 patches that worked for me. Further info that
>>> might help:
>>>
>>> - The problem seems to be related to silk_burg_modified not reaching the
>>> maximum gain, so the actual filter order is 16 rather than 2 (which is
>>> what would be expected with a sine wave).
>>> - The problem seems to happen when rshifts >= 3
>>> - when pre-scaling the signal to be < 16384 the problem goes away (patch
>>> scale_burg_in.diff)
>>> - When calculating C0 and rshifts based on a 64 bits correlation instead
>>> of using silk_sum_sqr_shift the problem also goes away (patch
>>> new_C0_calc.diff)
>>>
>>> I suspect that for very high prediction gain the fixed point
>>> implementation becomes very sensitive to numerical error, but I am not
>>> too
>>> sure why the new versions work better.
>>> I favour the version with the new C0 calculation, as it avoids rescaling
>>> the input.
>>> I also played around with a version (not attached) that prescales the
>>> input by rshifts/2 - this might be considering as it simplifies the
>>> code.
>>>
>>> Regards,
>>>
>>> Marcello
>>>
>>> PS: I am using 1.1 but the same issue is present with the latest code
>>> well.
>>>
>>>
>>> On 13/06/2014 06:05, "Jean-Marc Valin" <jmvalin at jmvalin.ca> wrote:
>>>
>>>> Hi Marcello,
>>>>
>>>> Thanks for the report. It's hard to debug this without the actual file.
>>>> Can you please post the sweep_in.raw file you used?
>>>>
>>>> Cheers,
>>>>
>>>> 	Jean-Marc
>>>>
>>>> On 11/06/14 04:46 AM, Marcello Caramma (mcaramma) wrote:
>>>>> Hi,
>>>>>
>>>>> Apologies if this is a known issues, but I have found what I believe
>>>>> is
>>>>> a bug in the fixed point implementation of the Silk codec and could
>>>>> not
>>>>> find any mention on this in the archives.
>>>>>
>>>>> The bug can be easily reproduced with the fixed point demo program
>>>>> (./configure ‹enable-fixed-point ‹disable-float-api && make) using the
>>>>> following command:
>>>>>
>>>>> ./opus_demo voip 16000 1 23000 sweep_in.raw sweep_out.raw
>>>>>
>>>>> Where sweep_in.raw is a 30 seconds full scale frequency sweep from 0
>>>>> to
>>>>> 8kHz sampled at 16kHz.
>>>>>
>>>>> The first 6 seconds of audio after transcoding sound Ok. After that
>>>>> artefacts are introduced all the way to the end of the file.
>>>>> The floating point version does not have the issue (even though the
>>>>> quality is subjectively worse roughly from the same point).
>>>>>
>>>>> I believe I narrowed down the problem to the file burg_modified_FIX.c
>>>>> 
>>>>> if I make sure the input signal is scaled down to 14 bits before
>>>>> processing the coefficients of the predictor are calculated correctly
>>>>> and no artefact is introduced.
>>>>>
>>>>> Is anyone experiencing the same problem or has a proper fix for this?
>>>>> (I
>>>>> can work around the bug with input scaling for now).
>>>>>
>>>>> Thanks and best regards,
>>>>>
>>>>> Marcello Caramma
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> opus mailing list
>>>>> opus at xiph.org
>>>>> http://lists.xiph.org/mailman/listinfo/opus
>>>>>
>>>
>