[opus] Reg an issue with smoothing factor in VAD implementation
Jean-Marc Valin
jmvalin at jmvalin.ca
Mon Nov 27 18:23:44 UTC 2017
On 11/26/2017 11:07 PM, Chandrakala Madhira wrote:
> Can anyone let me know if this is a bug?
It's indeed a bug and we (well, mostly Koen) are looking into it. Might
take some time though.
Cheers,
Jean-Marc
> Thank you,
> Chandrakala
>
> ------------------------------------------------------------------------
> *From: *"Logan Stromberg" <loganstromberg at gmail.com>
> *To: *"Chandrakala Madhira" <chandrakala.madhira at soctronics.com>
> *Cc: *opus at xiph.org
> *Sent: *Wednesday, November 22, 2017 12:12:39 PM
> *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> implementation
>
> Yes, yes, I can reproduce it now, but only on platforms that define a
> 16-bit int by default (SA_Q15 is an opus_int rather than opus_int32).
> What system are you compiling this for?
>
> On Tue, Nov 21, 2017 at 8:34 PM, Chandrakala Madhira
> <chandrakala.madhira at soctronics.com
> <mailto:chandrakala.madhira at soctronics.com>> wrote:
>
> Hi Logan,
>
> Please find attached the input stream we are using testing.
>
> Thank you,
> Chandrakala
>
> ------------------------------------------------------------------------
> *From: *"Chandrakala Madhira" <chandrakala.madhira at soctronics.com
> <mailto:chandrakala.madhira at soctronics.com>>
> *To: *"Logan Stromberg" <loganstromberg at gmail.com
> <mailto:loganstromberg at gmail.com>>
> *Cc: *opus at xiph.org <mailto:opus at xiph.org>
> *Sent: *Wednesday, November 22, 2017 9:53:55 AM
>
> *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> implementation
>
> Hi Logan,
>
> Below is the configuration I am using. The remaining parameters have
> defaults. The stream is a 16KHz stream. The bitrate selected is
> 16KHz. As the thread have size limitations, I will try to attach the
> stream in next mail. At 125th frame of this stream, the probability
> (SA_Q15) value is 0x0000859c. This value is treated as a negative
> number in step-3.
>
> -e voip 16000 1 16000 -framesize 10 -cvbr Input/dg105_16k.wav
> Output/dg105_16k_16000Fs_mono_16000bps_10ms_vbr_voip.bit
>
> Thank you,
> Chandrakala
>
> Thank you,
> Chandrakala
>
> ------------------------------------------------------------------------
> *From: *"Logan Stromberg" <loganstromberg at gmail.com
> <mailto:loganstromberg at gmail.com>>
> *To: *opus at xiph.org <mailto:opus at xiph.org>
> *Sent: *Tuesday, November 21, 2017 2:38:16 AM
> *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> implementation
>
> Just for fun, I tried to reproduce such an overflow. I turned on all
> debug macros, assertions, and checked arithmetic and then encoded 2
> hours of mixed speech/audio with these parameters:
>
> Sample rate = 48000
> Channels = 1
> Application = OPUS_APPLICATION_AUDIO
> Bitrate = 24 KB/s
> Force Mode = MODE_SILK_ONLY
> Signal Type = OPUS_SIGNAL_AUTO
> Complexity = 10
> Frame size = 480 samples (10ms)
>
> No errors came up in encoding. Chandrakala, are these the encoding
> parameters that you believe should trigger the error?
>
> - Logan
>
> Hi,
>
> We are looking at the VAD implementation used in opus. We are
> looking at the code where speech probability is calculated based
> on which SNR is estimated. Below is the part of the code I am
> talking about.
>
> /*********************************/
> /* Speech Probability Estimation */
> /*********************************/
> SA_Q15 = silk_sigm_Q15( silk_SMULWB( VAD_SNR_FACTOR_Q16,
> pSNR_dB_Q7 ) - VAD_NEGATIVE_OFFSET_Q5 ); // step1: Calculate
> speech probability : comment by me
>
> /* Power scaling */
> if( speech_nrg <= 0 ) { // step2: update speech probability
> based on speech energy : comment by me
> SA_Q15 = silk_RSHIFT( SA_Q15, 1 );
> } else if( speech_nrg < 32768 ) {
> if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
> speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 16 ); // Energy is
> doubled here : comment by me
> } else {
> speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 15 );
> }
>
> /* square-root */
> speech_nrg = silk_SQRT_APPROX( speech_nrg );
> SA_Q15 = silk_SMULWB( 32768 + speech_nrg, SA_Q15 );
> }
>
> /* Smoothing coefficient */
> smooth_coef_Q16 = silk_SMULWB( VAD_SNR_SMOOTH_COEF_Q18,
> silk_SMULWB( (opus_int32)SA_Q15, SA_Q15 ) ); // step3: Update
> the smoothing factor based on speech probability : comment by me
>
> if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
> smooth_coef_Q16 >>= 1;
> }
>
> Here, in step1, Speech probability is calculated whose value is
> expected to be within [0, 1) in Q15 format. Then based on the
> speech energy levels, in Step2, the probability is updated whose
> value shall also lie between [0, 1). Later in Step3, the smooth
> coeff is calculated. This code do not have any issue when the
> frame size is more than or equal to 20msec. But, if the frame
> size is 10ms, then in step2, the energy is doubled (this may be
> done because the original Silk code is for 20ms. To convert the
> energy for 20ms, it could have been doubled). When this is done
> the probability which is updated in step2 becomes more than 1.
> When this is used in multiplication in Step3, the value is
> treated as a negative number because its a 32x16 multiplication.
> This is will result in a negative smooth coefficient. Please let
> me know if this is a bug.
>
>
> Thank you,
> Chandrakala
>
>
> _______________________________________________
> opus mailing list
> opus at xiph.org <mailto:opus at xiph.org>
> http://lists.xiph.org/mailman/listinfo/opus
>
>
>
>
>
>
> _______________________________________________
> opus mailing list
> opus at xiph.org
> http://lists.xiph.org/mailman/listinfo/opus
>
More information about the opus
mailing list