[opus] Reg an issue with smoothing factor in VAD implementation

Koen Vos koenvos74 at gmail.com
Sat Feb 17 07:34:37 UTC 2018


Thanks Jean-Marc,

Yes this makes sense: the scaling is preserved for both framelengths, and
the maximum value for speech_nrg going into the square-root is now correct.

It may be more elegant to use
   speech_nrg = silk_LSHIFT( speech_nrg, 16 );
instead of
   speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 16 );
as the saturation will never be used (because input speech_nrg is < 16384).

best,
koen.



On Fri, Feb 16, 2018 at 3:42 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:

> Hi Chandrakala, Logan,
>
> Can you confirm that the attached patch fixes the overflow problem?
>
> Koen, can you confirm the fix makes sense?
>
> Cheers,
>
>         Jean-Marc
>
> On 11/27/2017 12:10 PM, Logan Stromberg wrote:
> > Sorry, long holiday weekend in America.
> > I can say with pretty high certainty that there is an overflow occurring
> > and it is flipping smooth_coef_Q16 to be negative when it probably
> > shouldn't be. I had originally thought it was only an issue where it was
> > overflowing the 15th bit but not the 16th, which might still preserve
> > the intended value for operations that ignore the sign bit (in cases
> > where the 16-bit value is stored in a 32-bit int without sign-extension
> > e.g. 0x00008b3b), but that doesn't seem to be the case here.
> > The problem is even worse when opus_int is defined as 16-bit in the
> > platform - SA_Q15 overflows to negative right here with a similar effect
> > speech_nrg = silk_SQRT_APPROX(speech_nrg);
> > SA_Q15 = silk_SMULWB(32768 + speech_nrg, SA_Q15);
> >
> > I can't speak for the logic where the speech energy gets doubled. It
> > obviously seems intentional but I don't know why. Maybe so that
> > smoothing is performed at a constant rate regardless of whether
> > framesize is 10 or 20ms?
> >
> > On Sun, Nov 26, 2017 at 8:07 PM, Chandrakala Madhira
> > <chandrakala.madhira at soctronics.com
> > <mailto:chandrakala.madhira at soctronics.com>> wrote:
> >
> >     Hi,
> >
> >     Can anyone let me know if this is a bug?
> >
> >     Thank you,
> >     Chandrakala
> >
> >     ------------------------------------------------------------
> ------------
> >     *From: *"Logan Stromberg" <loganstromberg at gmail.com
> >     <mailto:loganstromberg at gmail.com>>
> >     *To: *"Chandrakala Madhira" <chandrakala.madhira at soctronics.com
> >     <mailto:chandrakala.madhira at soctronics.com>>
> >     *Cc: *opus at xiph.org <mailto:opus at xiph.org>
> >     *Sent: *Wednesday, November 22, 2017 12:12:39 PM
> >
> >     *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> >     implementation
> >
> >     Yes, yes, I can reproduce it now, but only on platforms that define
> >     a 16-bit int by default (SA_Q15 is an opus_int rather than
> >     opus_int32). What system are you compiling this for?
> >
> >     On Tue, Nov 21, 2017 at 8:34 PM, Chandrakala Madhira
> >     <chandrakala.madhira at soctronics.com
> >     <mailto:chandrakala.madhira at soctronics.com>> wrote:
> >
> >         Hi Logan,
> >
> >         Please find attached the input stream we are using testing.
> >
> >         Thank you,
> >         Chandrakala
> >
> >         ------------------------------------------------------------
> ------------
> >         *From: *"Chandrakala Madhira"
> >         <chandrakala.madhira at soctronics.com
> >         <mailto:chandrakala.madhira at soctronics.com>>
> >         *To: *"Logan Stromberg" <loganstromberg at gmail.com
> >         <mailto:loganstromberg at gmail.com>>
> >         *Cc: *opus at xiph.org <mailto:opus at xiph.org>
> >         *Sent: *Wednesday, November 22, 2017 9:53:55 AM
> >
> >         *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> >         implementation
> >
> >         Hi Logan,
> >
> >         Below is the configuration I am using. The remaining parameters
> >         have defaults. The stream is a 16KHz stream. The bitrate
> >         selected is 16KHz. As the thread have size limitations, I will
> >         try to attach the stream in next mail. At 125th frame of this
> >         stream, the probability (SA_Q15) value is 0x0000859c. This value
> >         is treated as a negative number in step-3.
> >
> >         -e voip 16000 1 16000  -framesize 10  -cvbr Input/dg105_16k.wav
> >         Output/dg105_16k_16000Fs_mono_16000bps_10ms_vbr_voip.bit
> >
> >         Thank you,
> >         Chandrakala
> >
> >         Thank you,
> >         Chandrakala
> >
> >         ------------------------------------------------------------
> ------------
> >         *From: *"Logan Stromberg" <loganstromberg at gmail.com
> >         <mailto:loganstromberg at gmail.com>>
> >         *To: *opus at xiph.org <mailto:opus at xiph.org>
> >         *Sent: *Tuesday, November 21, 2017 2:38:16 AM
> >         *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> >         implementation
> >
> >         Just for fun, I tried to reproduce such an overflow. I turned on
> >         all debug macros, assertions, and checked arithmetic and then
> >         encoded 2 hours of mixed speech/audio with these parameters:
> >
> >         Sample rate = 48000
> >         Channels = 1
> >         Application = OPUS_APPLICATION_AUDIO
> >         Bitrate = 24 KB/s
> >         Force Mode = MODE_SILK_ONLY
> >         Signal Type = OPUS_SIGNAL_AUTO
> >         Complexity = 10
> >         Frame size = 480 samples (10ms)
> >
> >         No errors came up in encoding. Chandrakala, are these the
> >         encoding parameters that you believe should trigger the error?
> >
> >         - Logan
> >
> >             Hi,
> >
> >             We are looking at the VAD implementation used in opus. We
> >             are looking at the code where speech probability is
> >             calculated based on which SNR is estimated. Below is the
> >             part of the code I am talking about.
> >
> >             /*********************************/
> >             /* Speech Probability Estimation */
> >             /*********************************/
> >             SA_Q15 = silk_sigm_Q15( silk_SMULWB( VAD_SNR_FACTOR_Q16,
> >             pSNR_dB_Q7 ) - VAD_NEGATIVE_OFFSET_Q5 ); // step1: Calculate
> >             speech probability : comment by me
> >
> >             /* Power scaling */
> >             if( speech_nrg <= 0 ) { // step2: update speech probability
> >             based on speech energy : comment by me
> >             SA_Q15 = silk_RSHIFT( SA_Q15, 1 );
> >             } else if( speech_nrg < 32768 ) {
> >             if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
> >             speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 16 ); // Energy
> >             is doubled here : comment by me
> >             } else {
> >             speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 15 );
> >             }
> >
> >             /* square-root */
> >             speech_nrg = silk_SQRT_APPROX( speech_nrg );
> >             SA_Q15 = silk_SMULWB( 32768 + speech_nrg, SA_Q15 );
> >             }
> >
> >             /* Smoothing coefficient */
> >             smooth_coef_Q16 = silk_SMULWB( VAD_SNR_SMOOTH_COEF_Q18,
> >             silk_SMULWB( (opus_int32)SA_Q15, SA_Q15 ) ); // step3:
> >             Update the smoothing factor based on speech probability :
> >             comment by me
> >
> >             if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
> >             smooth_coef_Q16 >>= 1;
> >             }
> >
> >             Here, in step1, Speech probability is calculated whose value
> >             is expected to be within [0, 1) in Q15 format. Then based on
> >             the speech energy levels, in Step2, the probability is
> >             updated whose value shall also lie between [0, 1). Later in
> >             Step3, the smooth coeff is calculated. This code do not have
> >             any issue when the frame size is more than or equal to
> >             20msec. But, if the frame size is 10ms, then in step2, the
> >             energy is doubled (this may be done because the original
> >             Silk code is for 20ms. To convert the energy for 20ms, it
> >             could have been doubled). When this is done the probability
> >             which is updated in step2 becomes more than 1. When this is
> >             used in multiplication in Step3, the value is treated as a
> >             negative number because its a 32x16 multiplication. This is
> >             will result in a negative smooth coefficient. Please let me
> >             know if this is a bug.
> >
> >
> >             Thank you,
> >             Chandrakala
> >
> >
> >         _______________________________________________
> >         opus mailing list
> >         opus at xiph.org <mailto:opus at xiph.org>
> >         http://lists.xiph.org/mailman/listinfo/opus
> >         <http://lists.xiph.org/mailman/listinfo/opus>
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > opus mailing list
> > opus at xiph.org
> > http://lists.xiph.org/mailman/listinfo/opus
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xiph.org/pipermail/opus/attachments/20180217/14cce879/attachment.html>


More information about the opus mailing list