<div dir="ltr"><div><div><div><div>Right, there shouldn&#39;t be a problem with undefined behavior.<br><br></div>That said, a 64 bit implementation will work very well - in fact that&#39;s how it was done originally. <br><br>

</div>The reason for the current implementation is to minimize 64-bit operations in order to improve performance on limited-width architectures.  This functions gets used extensively, and I think the current implementation is faster on a 32 or 16 bit processor.  If you would find the opposite to be true (ie that a 64 bit implementation is faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider.<br>

<br></div>Btw, thanks for your help tracking down that bug!<br></div>koen.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jun 20, 2014 at 9:46 AM, Jean-Marc Valin <span dir="ltr">&lt;<a href="mailto:jmvalin@jmvalin.ca" target="_blank">jmvalin@jmvalin.ca</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Marcello,<br>

<br>

Actually, we were careful to avoid the undefined behaviour here. In<br>

fact, we are specifically running a clang test detecting undefined<br>

behaviour. If you look at the silk_SMLABB_ovflw() macro, you will see it<br>

is based on silk_ADD32_ovflw(), which is defined as:<br>

<br>

#define silk_ADD32_ovflw(a, b) ((opus_int32)((opus_uint32)(a) +<br>

(opus_uint32)(b)))<br>

<br>

By casting to unsigned, all the cases are well defined. The cast back to<br>

int is implementation-defined but we are already assuming two&#39;s<br>

complement behaviour every time we are shifting.<br>

<br>

As for using the 64-bit version anyway, I&#39;m not sure of the impact since<br>

I&#39;m not the original author of the code. Koen, what are the advantages<br>

of silk_sum_sqr_shift() over the 64-bit version? See any issue with<br>

using one over the other?<br>

<br>

Cheers,<br>

<br>

        Jean-Marc<br>

<br>

On 20/06/14 06:10 AM, Marcello Caramma (mcaramma) wrote:<br>

&gt; Hi Jean-Marc,<br>

&gt;<br>

&gt; well spotted! The patch provided fixes the issue for me.<br>

&gt;<br>

&gt; Nevertheless, in my code (and I would suggest doing the same in libopus) I<br>

&gt; am going to replace the function with one that accumulates on 64 bits and<br>

&gt; then calculates the shift, for at least 4 reasons:<br>

&gt;  - It is less and simpler code<br>

&gt;  - The result is likely to be slightly more accurate in case big numbers<br>

&gt; come early in the array.<br>

&gt;  - I don’t like to have branching inside loops<br>

&gt;<br>

&gt;  - Even though it seems to work on most compilers, assuming that overflow<br>

&gt; in the accumulator will lead to a negative result calls for undefined<br>

&gt; behaviour (NOT implementation defined): if left as is the code should at<br>

&gt; least use an unsigned accumulator and replace the test (nrg &lt; 0) with (nrg<br>

&gt;&gt; MAX_INT32).<br>

&gt;<br>

&gt; Can you see any issue in doing so?<br>

&gt; If you also think it might be a good idea to do the same in the libopus<br>

&gt; repository, I can send you the suggested patch.<br>

&gt;<br>

&gt; Cheers,<br>

&gt;<br>

&gt; Marcello<br>

&gt;<br>

&gt;<br>

&gt; On 18/06/2014 08:09, &quot;Jean-Marc Valin&quot; &lt;<a href="mailto:jmvalin@jmvalin.ca">jmvalin@jmvalin.ca</a>&gt; wrote:<br>

&gt;<br>

&gt;&gt; Hi Marcello,<br>

&gt;&gt;<br>

&gt;&gt; It turns out that the problem has a much simpler explanation. As far as<br>

&gt;&gt; I can tell, it&#39;s actually a bug in silk_sum_sqr_shift() and this trivial<br>

&gt;&gt; patch appears to fix the problem:<br>

&gt;&gt; <a href="http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch" target="_blank">http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch</a><br>

&gt;&gt;<br>

&gt;&gt; It would still require some testing to check that the fix doesn&#39;t have<br>

&gt;&gt; any bad side effect. Let me know how well the fix works for you. Again,<br>

&gt;&gt; thanks for investigating this.<br>

&gt;&gt;<br>

&gt;&gt; Cheers,<br>

&gt;&gt;<br>

&gt;&gt;      Jean-Marc<br>

<div><div class="h5">&gt;&gt;<br>

&gt;&gt; On 13/06/14 12:28 PM, Marcello Caramma (mcaramma) wrote:<br>

&gt;&gt;&gt; Hi Jean Marc,<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; please find attached the audio file (mono 16khz). I shortened it to<br>

&gt;&gt;&gt; about<br>

&gt;&gt;&gt; 10 seconds. I also add 2 patches that worked for me. Further info that<br>

&gt;&gt;&gt; might help:<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; - The problem seems to be related to silk_burg_modified not reaching the<br>

&gt;&gt;&gt; maximum gain, so the actual filter order is 16 rather than 2 (which is<br>

&gt;&gt;&gt; what would be expected with a sine wave).<br>

&gt;&gt;&gt; - The problem seems to happen when rshifts &gt;= 3<br>

&gt;&gt;&gt; - when pre-scaling the signal to be &lt; 16384 the problem goes away (patch<br>

&gt;&gt;&gt; scale_burg_in.diff)<br>

&gt;&gt;&gt; - When calculating C0 and rshifts based on a 64 bits correlation instead<br>

&gt;&gt;&gt; of using silk_sum_sqr_shift the problem also goes away (patch<br>

&gt;&gt;&gt; new_C0_calc.diff)<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; I suspect that for very high prediction gain the fixed point<br>

&gt;&gt;&gt; implementation becomes very sensitive to numerical error, but I am not<br>

&gt;&gt;&gt; too<br>

&gt;&gt;&gt; sure why the new versions work better.<br>

&gt;&gt;&gt; I favour the version with the new C0 calculation, as it avoids rescaling<br>

&gt;&gt;&gt; the input.<br>

&gt;&gt;&gt; I also played around with a version (not attached) that prescales the<br>

&gt;&gt;&gt; input by rshifts/2 - this might be considering as it simplifies the<br>

&gt;&gt;&gt; code.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Regards,<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Marcello<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; PS: I am using 1.1 but the same issue is present with the latest code<br>

&gt;&gt;&gt; well.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;<br>

</div></div><div><div class="h5">&gt;&gt;&gt; On 13/06/2014 06:05, &quot;Jean-Marc Valin&quot; &lt;<a href="mailto:jmvalin@jmvalin.ca">jmvalin@jmvalin.ca</a>&gt; wrote:<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Hi Marcello,<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Thanks for the report. It&#39;s hard to debug this without the actual file.<br>

&gt;&gt;&gt;&gt; Can you please post the sweep_in.raw file you used?<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Cheers,<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;    Jean-Marc<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; On 11/06/14 04:46 AM, Marcello Caramma (mcaramma) wrote:<br>

&gt;&gt;&gt;&gt;&gt; Hi,<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Apologies if this is a known issues, but I have found what I believe<br>

&gt;&gt;&gt;&gt;&gt; is<br>

&gt;&gt;&gt;&gt;&gt; a bug in the fixed point implementation of the Silk codec and could<br>

&gt;&gt;&gt;&gt;&gt; not<br>

&gt;&gt;&gt;&gt;&gt; find any mention on this in the archives.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; The bug can be easily reproduced with the fixed point demo program<br>

&gt;&gt;&gt;&gt;&gt; (./configure ‹enable-fixed-point ‹disable-float-api &amp;&amp; make) using the<br>

&gt;&gt;&gt;&gt;&gt; following command:<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; ./opus_demo voip 16000 1 23000 sweep_in.raw sweep_out.raw<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Where sweep_in.raw is a 30 seconds full scale frequency sweep from 0<br>

&gt;&gt;&gt;&gt;&gt; to<br>

&gt;&gt;&gt;&gt;&gt; 8kHz sampled at 16kHz.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; The first 6 seconds of audio after transcoding sound Ok. After that<br>

&gt;&gt;&gt;&gt;&gt; artefacts are introduced all the way to the end of the file.<br>

&gt;&gt;&gt;&gt;&gt; The floating point version does not have the issue (even though the<br>

&gt;&gt;&gt;&gt;&gt; quality is subjectively worse roughly from the same point).<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; I believe I narrowed down the problem to the file burg_modified_FIX.c<br>

&gt;&gt;&gt;&gt;&gt; <br>

&gt;&gt;&gt;&gt;&gt; if I make sure the input signal is scaled down to 14 bits before<br>

&gt;&gt;&gt;&gt;&gt; processing the coefficients of the predictor are calculated correctly<br>

&gt;&gt;&gt;&gt;&gt; and no artefact is introduced.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Is anyone experiencing the same problem or has a proper fix for this?<br>

&gt;&gt;&gt;&gt;&gt; (I<br>

&gt;&gt;&gt;&gt;&gt; can work around the bug with input scaling for now).<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Thanks and best regards,<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Marcello Caramma<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; _______________________________________________<br>

&gt;&gt;&gt;&gt;&gt; opus mailing list<br>

&gt;&gt;&gt;&gt;&gt; <a href="mailto:opus@xiph.org">opus@xiph.org</a><br>

</div></div>&gt;&gt;&gt;&gt;&gt; <a href="http://lists.xiph.org/mailman/listinfo/opus" target="_blank">http://lists.xiph.org/mailman/listinfo/opus</a><br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;<br>

&gt;<br>

</blockquote></div><br></div>