[opus] Reg an issue with smoothing factor in VAD implementation

Mon Nov 20 21:08:16 UTC 2017

Just for fun, I tried to reproduce such an overflow. I turned on all debug
macros, assertions, and checked arithmetic and then encoded 2 hours of
mixed speech/audio with these parameters:

Sample rate = 48000
Channels = 1
Application = OPUS_APPLICATION_AUDIO
Bitrate = 24 KB/s
Force Mode = MODE_SILK_ONLY
Signal Type = OPUS_SIGNAL_AUTO
Complexity = 10
Frame size = 480 samples (10ms)

No errors came up in encoding. Chandrakala, are these the encoding
parameters that you believe should trigger the error?

- Logan

> Hi,
>
> We are looking at the VAD implementation used in opus. We are looking at
> the code where speech probability is calculated based on which SNR is
> estimated. Below is the part of the code I am talking about.
>
> /*********************************/
> /* Speech Probability Estimation */
> /*********************************/
> /* step 1: calculate speech probability (comment by me) */
> SA_Q15 = silk_sigm_Q15( silk_SMULWB( VAD_SNR_FACTOR_Q16, pSNR_dB_Q7 ) - VAD_NEGATIVE_OFFSET_Q5 );
>
> /* Power scaling */
> /* step 2: update speech probability based on speech energy (comment by me) */
> if( speech_nrg <= 0 ) {
>     SA_Q15 = silk_RSHIFT( SA_Q15, 1 );
> } else if( speech_nrg < 32768 ) {
>     if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
>         speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 16 ); /* energy is doubled here (comment by me) */
>     } else {
>         speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 15 );
>     }
>
>     /* square-root */
>     speech_nrg = silk_SQRT_APPROX( speech_nrg );
>     SA_Q15 = silk_SMULWB( 32768 + speech_nrg, SA_Q15 );
> }
>
> /* Smoothing coefficient */
> /* step 3: update the smoothing factor based on speech probability (comment by me) */
> smooth_coef_Q16 = silk_SMULWB( VAD_SNR_SMOOTH_COEF_Q18, silk_SMULWB( (opus_int32)SA_Q15, SA_Q15 ) );
>
> if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
>     smooth_coef_Q16 >>= 1;
> }
>
> In step 1, the speech probability is calculated; its value is expected to
> lie within [0, 1) in Q15 format. In step 2, the probability is updated
> based on the speech energy level, and the result should also lie within
> [0, 1). In step 3, the smoothing coefficient is calculated. This code has
> no issue when the frame size is 20 ms or more. But if the frame size is
> 10 ms, the energy is doubled in step 2 (perhaps because the original SILK
> code was written for 20 ms frames, so the energy is doubled to convert it
> to the 20 ms equivalent). When this happens, the probability updated in
> step 2 can exceed 1. When that value is then used in the multiplication in
> step 3, it is treated as a negative number because it's a 32x16
> multiplication. This results in a negative smoothing coefficient.
> Please let me know if this is a bug.
>
>
> Thank you,
> Chandrakala
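
To make the arithmetic in the quoted report concrete, here is a minimal C
sketch of steps 2 and 3. It does not use the Opus sources: smulwb() is a
portable stand-in written to mimic the 32x16 behavior described above (only
the low 16 bits of the second operand are used, as a signed value). The
function names, the value 4096 for the smoothing constant, and the sample
inputs are assumptions chosen for illustration, and the square root of the
scaled energy is passed in directly rather than computed.

```c
#include <stdint.h>

/* Assumed value of the Q18 smoothing constant, for illustration only. */
#define SMOOTH_COEF_Q18 4096

/* Stand-in for a 32x16 multiply: (a32 * (int16)b32) >> 16.
   Only the low 16 bits of b32 are used and they are read as signed,
   so any b32 above 32767 turns into a negative factor. */
static int32_t smulwb(int32_t a32, int32_t b32)
{
    int32_t b16 = b32 & 0xFFFF;
    int64_t prod, q;

    if (b16 >= 0x8000) {
        b16 -= 0x10000;          /* reinterpret the low 16 bits as signed */
    }
    prod = (int64_t)a32 * b16;
    q = prod / 65536;            /* truncating division ... */
    if (prod % 65536 < 0) {
        q -= 1;                  /* ... adjusted to floor, like an arithmetic shift */
    }
    return (int32_t)q;
}

/* Step 2: scale the Q15 probability by (1 + sqrt_nrg / 32768). */
static int32_t power_scale(int32_t sa_q15, int32_t sqrt_nrg)
{
    return smulwb(32768 + sqrt_nrg, sa_q15);
}

/* Step 3, before the 10 ms halving: constant * sa_q15^2. */
static int32_t smooth_coef_before_halving(int32_t sa_q15)
{
    return smulwb(SMOOTH_COEF_Q18, smulwb(sa_q15, sa_q15));
}
```

With sa_q15 = 32767 and sqrt_nrg = 32000 (roughly the range reachable when
the energy is shifted by 15), power_scale() stays below 32768 and
smooth_coef_before_halving() of that result is positive. With
sqrt_nrg = 46000 (only reachable when the energy is shifted by 16, as in the
10 ms branch), power_scale() returns a value above 32767 and the step 3
product comes out negative, which matches what the report describes. Whether
real input can drive the encoder into that range is a separate question that
this sketch does not answer.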