<div dir="ltr"><div><div><div><div>Right, there shouldn't be a problem with undefined behavior.<br><br></div>That said, a 64 bit implementation will work very well - in fact that's how it was done originally. <br><br>
</div>The reason for the current implementation is to minimize 64-bit operations in order to improve performance on limited-width architectures. This functions gets used extensively, and I think the current implementation is faster on a 32 or 16 bit processor. If you would find the opposite to be true (ie that a 64 bit implementation is faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider.<br>
<br></div>Btw, thanks for your help tracking down that bug!<br></div>koen.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jun 20, 2014 at 9:46 AM, Jean-Marc Valin <span dir="ltr"><<a href="mailto:jmvalin@jmvalin.ca" target="_blank">jmvalin@jmvalin.ca</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Marcello,<br>
<br>
Actually, we were careful to avoid the undefined behaviour here. In<br>
fact, we are specifically running a clang test detecting undefined<br>
behaviour. If you look at the silk_SMLABB_ovflw() macro, you will see it<br>
is based on silk_ADD32_ovflw(), which is defined as:<br>
<br>
#define silk_ADD32_ovflw(a, b) ((opus_int32)((opus_uint32)(a) +<br>
(opus_uint32)(b)))<br>
<br>
By casting to unsigned, all the cases are well defined. The cast back to<br>
int is implementation-defined but we are already assuming two's<br>
complement behaviour every time we are shifting.<br>
<br>
As for using the 64-bit version anyway, I'm not sure of the impact since<br>
I'm not the original author of the code. Koen, what are the advantages<br>
of silk_sum_sqr_shift() over the 64-bit version? See any issue with<br>
using one over the other?<br>
<br>
Cheers,<br>
<br>
Jean-Marc<br>
<br>
On 20/06/14 06:10 AM, Marcello Caramma (mcaramma) wrote:<br>
> Hi Jean-Marc,<br>
><br>
> well spotted! The patch provided fixes the issue for me.<br>
><br>
> Nevertheless, in my code (and I would suggest doing the same in libopus) I<br>
> am going to replace the function with one that accumulates on 64 bits and<br>
> then calculates the shift, for at least 4 reasons:<br>
> - It is less and simpler code<br>
> - The result is likely to be slightly more accurate in case big numbers<br>
> come early in the array.<br>
> - I don’t like to have branching inside loops<br>
><br>
> - Even though it seems to work on most compilers, assuming that overflow<br>
> in the accumulator will lead to a negative result calls for undefined<br>
> behaviour (NOT implementation defined): if left as is the code should at<br>
> least use an unsigned accumulator and replace the test (nrg < 0) with (nrg<br>
>> MAX_INT32).<br>
><br>
> Can you see any issue in doing so?<br>
> If you also think it might be a good idea to do the same in the libopus<br>
> repository, I can send you the suggested patch.<br>
><br>
> Cheers,<br>
><br>
> Marcello<br>
><br>
><br>
> On 18/06/2014 08:09, "Jean-Marc Valin" <<a href="mailto:jmvalin@jmvalin.ca">jmvalin@jmvalin.ca</a>> wrote:<br>
><br>
>> Hi Marcello,<br>
>><br>
>> It turns out that the problem has a much simpler explanation. As far as<br>
>> I can tell, it's actually a bug in silk_sum_sqr_shift() and this trivial<br>
>> patch appears to fix the problem:<br>
>> <a href="http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch" target="_blank">http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch</a><br>
>><br>
>> It would still require some testing to check that the fix doesn't have<br>
>> any bad side effect. Let me know how well the fix works for you. Again,<br>
>> thanks for investigating this.<br>
>><br>
>> Cheers,<br>
>><br>
>> Jean-Marc<br>
<div><div class="h5">>><br>
>> On 13/06/14 12:28 PM, Marcello Caramma (mcaramma) wrote:<br>
>>> Hi Jean Marc,<br>
>>><br>
>>> please find attached the audio file (mono 16khz). I shortened it to<br>
>>> about<br>
>>> 10 seconds. I also add 2 patches that worked for me. Further info that<br>
>>> might help:<br>
>>><br>
>>> - The problem seems to be related to silk_burg_modified not reaching the<br>
>>> maximum gain, so the actual filter order is 16 rather than 2 (which is<br>
>>> what would be expected with a sine wave).<br>
>>> - The problem seems to happen when rshifts >= 3<br>
>>> - when pre-scaling the signal to be < 16384 the problem goes away (patch<br>
>>> scale_burg_in.diff)<br>
>>> - When calculating C0 and rshifts based on a 64 bits correlation instead<br>
>>> of using silk_sum_sqr_shift the problem also goes away (patch<br>
>>> new_C0_calc.diff)<br>
>>><br>
>>> I suspect that for very high prediction gain the fixed point<br>
>>> implementation becomes very sensitive to numerical error, but I am not<br>
>>> too<br>
>>> sure why the new versions work better.<br>
>>> I favour the version with the new C0 calculation, as it avoids rescaling<br>
>>> the input.<br>
>>> I also played around with a version (not attached) that prescales the<br>
>>> input by rshifts/2 - this might be considering as it simplifies the<br>
>>> code.<br>
>>><br>
>>> Regards,<br>
>>><br>
>>> Marcello<br>
>>><br>
>>> PS: I am using 1.1 but the same issue is present with the latest code<br>
>>> well.<br>
>>><br>
>>><br>
</div></div><div><div class="h5">>>> On 13/06/2014 06:05, "Jean-Marc Valin" <<a href="mailto:jmvalin@jmvalin.ca">jmvalin@jmvalin.ca</a>> wrote:<br>
>>><br>
>>>> Hi Marcello,<br>
>>>><br>
>>>> Thanks for the report. It's hard to debug this without the actual file.<br>
>>>> Can you please post the sweep_in.raw file you used?<br>
>>>><br>
>>>> Cheers,<br>
>>>><br>
>>>> Jean-Marc<br>
>>>><br>
>>>> On 11/06/14 04:46 AM, Marcello Caramma (mcaramma) wrote:<br>
>>>>> Hi,<br>
>>>>><br>
>>>>> Apologies if this is a known issues, but I have found what I believe<br>
>>>>> is<br>
>>>>> a bug in the fixed point implementation of the Silk codec and could<br>
>>>>> not<br>
>>>>> find any mention on this in the archives.<br>
>>>>><br>
>>>>> The bug can be easily reproduced with the fixed point demo program<br>
>>>>> (./configure ‹enable-fixed-point ‹disable-float-api && make) using the<br>
>>>>> following command:<br>
>>>>><br>
>>>>> ./opus_demo voip 16000 1 23000 sweep_in.raw sweep_out.raw<br>
>>>>><br>
>>>>> Where sweep_in.raw is a 30 seconds full scale frequency sweep from 0<br>
>>>>> to<br>
>>>>> 8kHz sampled at 16kHz.<br>
>>>>><br>
>>>>> The first 6 seconds of audio after transcoding sound Ok. After that<br>
>>>>> artefacts are introduced all the way to the end of the file.<br>
>>>>> The floating point version does not have the issue (even though the<br>
>>>>> quality is subjectively worse roughly from the same point).<br>
>>>>><br>
>>>>> I believe I narrowed down the problem to the file burg_modified_FIX.c<br>
>>>>> <br>
>>>>> if I make sure the input signal is scaled down to 14 bits before<br>
>>>>> processing the coefficients of the predictor are calculated correctly<br>
>>>>> and no artefact is introduced.<br>
>>>>><br>
>>>>> Is anyone experiencing the same problem or has a proper fix for this?<br>
>>>>> (I<br>
>>>>> can work around the bug with input scaling for now).<br>
>>>>><br>
>>>>> Thanks and best regards,<br>
>>>>><br>
>>>>> Marcello Caramma<br>
>>>>><br>
>>>>><br>
>>>>> _______________________________________________<br>
>>>>> opus mailing list<br>
>>>>> <a href="mailto:opus@xiph.org">opus@xiph.org</a><br>
</div></div>>>>>> <a href="http://lists.xiph.org/mailman/listinfo/opus" target="_blank">http://lists.xiph.org/mailman/listinfo/opus</a><br>
>>>>><br>
>>><br>
><br>
</blockquote></div><br></div>