[opus] Reg an issue with smoothing factor in VAD implementation

Logan Stromberg loganstromberg at gmail.com
Wed Nov 22 06:42:39 UTC 2017


Yes, yes, I can reproduce it now, but only on platforms that define a
16-bit int by default (SA_Q15 is an opus_int rather than opus_int32). What
system are you compiling this for?

On Tue, Nov 21, 2017 at 8:34 PM, Chandrakala Madhira <
chandrakala.madhira at soctronics.com> wrote:

> Hi Logan,
>
> Please find attached the input stream we are using testing.
>
> Thank you,
> Chandrakala
>
> ------------------------------
> *From: *"Chandrakala Madhira" <chandrakala.madhira at soctronics.com>
> *To: *"Logan Stromberg" <loganstromberg at gmail.com>
> *Cc: *opus at xiph.org
> *Sent: *Wednesday, November 22, 2017 9:53:55 AM
>
> *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> implementation
>
> Hi Logan,
>
> Below is the configuration I am using. The remaining parameters have
> defaults. The stream is a 16KHz stream. The bitrate selected is 16KHz. As
> the thread have size limitations, I will try to attach the stream in next
> mail. At 125th frame of this stream, the probability (SA_Q15) value is
> 0x0000859c. This value is treated as a negative number in step-3.
>
> -e voip 16000 1 16000  -framesize 10  -cvbr Input/dg105_16k.wav
> Output/dg105_16k_16000Fs_mono_16000bps_10ms_vbr_voip.bit
>
> Thank you,
> Chandrakala
>
> Thank you,
> Chandrakala
>
> ------------------------------
> *From: *"Logan Stromberg" <loganstromberg at gmail.com>
> *To: *opus at xiph.org
> *Sent: *Tuesday, November 21, 2017 2:38:16 AM
> *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> implementation
>
> Just for fun, I tried to reproduce such an overflow. I turned on all debug
> macros, assertions, and checked arithmetic and then encoded 2 hours of
> mixed speech/audio with these parameters:
>
> Sample rate = 48000
> Channels = 1
> Application = OPUS_APPLICATION_AUDIO
> Bitrate = 24 KB/s
> Force Mode = MODE_SILK_ONLY
> Signal Type = OPUS_SIGNAL_AUTO
> Complexity = 10
> Frame size = 480 samples (10ms)
>
> No errors came up in encoding. Chandrakala, are these the encoding
> parameters that you believe should trigger the error?
>
> - Logan
>
> Hi,
>>
>> We are looking at the VAD implementation used in opus. We are looking at
>> the code where speech probability is calculated based on which SNR is
>> estimated. Below is the part of the code I am talking about.
>>
>> /*********************************/
>> /* Speech Probability Estimation */
>> /*********************************/
>> SA_Q15 = silk_sigm_Q15( silk_SMULWB( VAD_SNR_FACTOR_Q16, pSNR_dB_Q7 ) -
>> VAD_NEGATIVE_OFFSET_Q5 ); // step1: Calculate speech probability : comment
>> by me
>>
>> /* Power scaling */
>> if( speech_nrg <= 0 ) { // step2: update speech probability based on
>> speech energy : comment by me
>> SA_Q15 = silk_RSHIFT( SA_Q15, 1 );
>> } else if( speech_nrg < 32768 ) {
>> if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
>> speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 16 ); // Energy is doubled
>> here : comment by me
>> } else {
>> speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 15 );
>> }
>>
>> /* square-root */
>> speech_nrg = silk_SQRT_APPROX( speech_nrg );
>> SA_Q15 = silk_SMULWB( 32768 + speech_nrg, SA_Q15 );
>> }
>>
>> /* Smoothing coefficient */
>> smooth_coef_Q16 = silk_SMULWB( VAD_SNR_SMOOTH_COEF_Q18, silk_SMULWB(
>> (opus_int32)SA_Q15, SA_Q15 ) ); // step3: Update the smoothing factor based
>> on speech probability : comment by me
>>
>> if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
>> smooth_coef_Q16 >>= 1;
>> }
>>
>> Here, in step1, Speech probability is calculated whose value is expected
>> to be within [0, 1) in Q15 format. Then based on the speech energy levels,
>> in Step2, the probability is updated whose value shall also lie between [0,
>> 1). Later in Step3, the smooth coeff is calculated. This code do not have
>> any issue when the frame size is more than or equal to 20msec. But, if the
>> frame size is 10ms, then in step2, the energy is doubled (this may be done
>> because the original Silk code is for 20ms. To convert the energy for 20ms,
>> it could have been doubled). When this is done the probability which is
>> updated in step2 becomes more than 1. When this is used in multiplication
>> in Step3, the value is treated as a negative number because its a 32x16
>> multiplication. This is will result in a negative smooth coefficient.
>> Please let me know if this is a bug.
>>
>>
>> Thank you,
>> Chandrakala
>
>
> _______________________________________________
> opus mailing list
> opus at xiph.org
> http://lists.xiph.org/mailman/listinfo/opus
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xiph.org/pipermail/opus/attachments/20171121/ba47653d/attachment-0001.html>


More information about the opus mailing list