[opus] Reg an issue with smoothing factor in VAD implementation
Koen Vos
koenvos74 at gmail.com
Sat Feb 17 07:34:37 UTC 2018
Thanks Jean-Marc,
Yes this makes sense: the scaling is preserved for both framelengths, and
the maximum value for speech_nrg going into the square-root is now correct.
It may be more elegant to use
speech_nrg = silk_LSHIFT( speech_nrg, 16 );
instead of
speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 16 );
as the saturation will never be used (because input speech_nrg is < 16384).
best,
koen.
On Fri, Feb 16, 2018 at 3:42 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Chandrakala, Logan,
>
> Can you confirm that the attached patch fixes the overflow problem?
>
> Koen, can you confirm the fix makes sense?
>
> Cheers,
>
> Jean-Marc
>
> On 11/27/2017 12:10 PM, Logan Stromberg wrote:
> > Sorry, long holiday weekend in America.
> > I can say with pretty high certainty that there is an overflow occurring
> > and it is flipping smooth_coef_Q16 to be negative when it probably
> > shouldn't be. I had originally thought it was only an issue where it was
> > overflowing the 15th bit but not the 16th, which might still preserve
> > the intended value for operations that ignore the sign bit (in cases
> > where the 16-bit value is stored in a 32-bit int without sign-extension
> > e.g. 0x00008b3b), but that doesn't seem to be the case here.
> > The problem is even worse when opus_int is defined as 16-bit in the
> > platform - SA_Q15 overflows to negative right here with a similar effect
> > speech_nrg = silk_SQRT_APPROX(speech_nrg);
> > SA_Q15 = silk_SMULWB(32768 + speech_nrg, SA_Q15);
> >
> > I can't speak for the logic where the speech energy gets doubled. It
> > obviously seems intentional but I don't know why. Maybe so that
> > smoothing is performed at a constant rate regardless of whether
> > framesize is 10 or 20ms?
> >
> > On Sun, Nov 26, 2017 at 8:07 PM, Chandrakala Madhira
> > <chandrakala.madhira at soctronics.com
> > <mailto:chandrakala.madhira at soctronics.com>> wrote:
> >
> > Hi,
> >
> > Can anyone let me know if this is a bug?
> >
> > Thank you,
> > Chandrakala
> >
> > ------------------------------------------------------------
> ------------
> > *From: *"Logan Stromberg" <loganstromberg at gmail.com
> > <mailto:loganstromberg at gmail.com>>
> > *To: *"Chandrakala Madhira" <chandrakala.madhira at soctronics.com
> > <mailto:chandrakala.madhira at soctronics.com>>
> > *Cc: *opus at xiph.org <mailto:opus at xiph.org>
> > *Sent: *Wednesday, November 22, 2017 12:12:39 PM
> >
> > *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> > implementation
> >
> > Yes, yes, I can reproduce it now, but only on platforms that define
> > a 16-bit int by default (SA_Q15 is an opus_int rather than
> > opus_int32). What system are you compiling this for?
> >
> > On Tue, Nov 21, 2017 at 8:34 PM, Chandrakala Madhira
> > <chandrakala.madhira at soctronics.com
> > <mailto:chandrakala.madhira at soctronics.com>> wrote:
> >
> > Hi Logan,
> >
> > Please find attached the input stream we are using testing.
> >
> > Thank you,
> > Chandrakala
> >
> > ------------------------------------------------------------
> ------------
> > *From: *"Chandrakala Madhira"
> > <chandrakala.madhira at soctronics.com
> > <mailto:chandrakala.madhira at soctronics.com>>
> > *To: *"Logan Stromberg" <loganstromberg at gmail.com
> > <mailto:loganstromberg at gmail.com>>
> > *Cc: *opus at xiph.org <mailto:opus at xiph.org>
> > *Sent: *Wednesday, November 22, 2017 9:53:55 AM
> >
> > *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> > implementation
> >
> > Hi Logan,
> >
> > Below is the configuration I am using. The remaining parameters
> > have defaults. The stream is a 16KHz stream. The bitrate
> > selected is 16KHz. As the thread have size limitations, I will
> > try to attach the stream in next mail. At 125th frame of this
> > stream, the probability (SA_Q15) value is 0x0000859c. This value
> > is treated as a negative number in step-3.
> >
> > -e voip 16000 1 16000 -framesize 10 -cvbr Input/dg105_16k.wav
> > Output/dg105_16k_16000Fs_mono_16000bps_10ms_vbr_voip.bit
> >
> > Thank you,
> > Chandrakala
> >
> > Thank you,
> > Chandrakala
> >
> > ------------------------------------------------------------
> ------------
> > *From: *"Logan Stromberg" <loganstromberg at gmail.com
> > <mailto:loganstromberg at gmail.com>>
> > *To: *opus at xiph.org <mailto:opus at xiph.org>
> > *Sent: *Tuesday, November 21, 2017 2:38:16 AM
> > *Subject: *Re: [opus] Reg an issue with smoothing factor in VAD
> > implementation
> >
> > Just for fun, I tried to reproduce such an overflow. I turned on
> > all debug macros, assertions, and checked arithmetic and then
> > encoded 2 hours of mixed speech/audio with these parameters:
> >
> > Sample rate = 48000
> > Channels = 1
> > Application = OPUS_APPLICATION_AUDIO
> > Bitrate = 24 KB/s
> > Force Mode = MODE_SILK_ONLY
> > Signal Type = OPUS_SIGNAL_AUTO
> > Complexity = 10
> > Frame size = 480 samples (10ms)
> >
> > No errors came up in encoding. Chandrakala, are these the
> > encoding parameters that you believe should trigger the error?
> >
> > - Logan
> >
> > Hi,
> >
> > We are looking at the VAD implementation used in opus. We
> > are looking at the code where speech probability is
> > calculated based on which SNR is estimated. Below is the
> > part of the code I am talking about.
> >
> > /*********************************/
> > /* Speech Probability Estimation */
> > /*********************************/
> > SA_Q15 = silk_sigm_Q15( silk_SMULWB( VAD_SNR_FACTOR_Q16,
> > pSNR_dB_Q7 ) - VAD_NEGATIVE_OFFSET_Q5 ); // step1: Calculate
> > speech probability : comment by me
> >
> > /* Power scaling */
> > if( speech_nrg <= 0 ) { // step2: update speech probability
> > based on speech energy : comment by me
> > SA_Q15 = silk_RSHIFT( SA_Q15, 1 );
> > } else if( speech_nrg < 32768 ) {
> > if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
> > speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 16 ); // Energy
> > is doubled here : comment by me
> > } else {
> > speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 15 );
> > }
> >
> > /* square-root */
> > speech_nrg = silk_SQRT_APPROX( speech_nrg );
> > SA_Q15 = silk_SMULWB( 32768 + speech_nrg, SA_Q15 );
> > }
> >
> > /* Smoothing coefficient */
> > smooth_coef_Q16 = silk_SMULWB( VAD_SNR_SMOOTH_COEF_Q18,
> > silk_SMULWB( (opus_int32)SA_Q15, SA_Q15 ) ); // step3:
> > Update the smoothing factor based on speech probability :
> > comment by me
> >
> > if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
> > smooth_coef_Q16 >>= 1;
> > }
> >
> > Here, in step1, Speech probability is calculated whose value
> > is expected to be within [0, 1) in Q15 format. Then based on
> > the speech energy levels, in Step2, the probability is
> > updated whose value shall also lie between [0, 1). Later in
> > Step3, the smooth coeff is calculated. This code do not have
> > any issue when the frame size is more than or equal to
> > 20msec. But, if the frame size is 10ms, then in step2, the
> > energy is doubled (this may be done because the original
> > Silk code is for 20ms. To convert the energy for 20ms, it
> > could have been doubled). When this is done the probability
> > which is updated in step2 becomes more than 1. When this is
> > used in multiplication in Step3, the value is treated as a
> > negative number because its a 32x16 multiplication. This is
> > will result in a negative smooth coefficient. Please let me
> > know if this is a bug.
> >
> >
> > Thank you,
> > Chandrakala
> >
> >
> > _______________________________________________
> > opus mailing list
> > opus at xiph.org <mailto:opus at xiph.org>
> > http://lists.xiph.org/mailman/listinfo/opus
> > <http://lists.xiph.org/mailman/listinfo/opus>
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > opus mailing list
> > opus at xiph.org
> > http://lists.xiph.org/mailman/listinfo/opus
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xiph.org/pipermail/opus/attachments/20180217/14cce879/attachment.html>
More information about the opus
mailing list