[Speex-dev] Speech detection in preprocessor with echo

Jean-Marc Valin Jean-Marc.Valin at USherbrooke.ca
Wed Jun 22 16:13:48 PDT 2005


The main advantage I see in freezing st->loudness2 is that if you
unfreeze it, then the transition will be gradual, whereas if you
unfreeze agc_gain, then it will jump to the new value directly. I have
no idea what the freezing will do to the VAD though.

	Jean-Marc

On Wednesday, June 22, 2005 at 10:46 -0400, Tom Grandgent wrote:
> agc_gain seemed to fit with the idea of what I wanted to do, it was 
> easy to understand its units and behavior, and freezing it produced 
> the desired results.  Also I wanted to cap it, so that's done at the 
> same place, and that definitely works.
> 
> All I want to do is be able to freeze AGC adaptation and put an 
> upper bound on the AGC (for example, 2x amplification).  Both of 
> these things seem necessary in a real-world app because:
> 
> 1) AGC gain should not increase when speech is not detected.  If it 
> does, then it will inevitably rise during periods of inactivity on 
> the part of the speaker, and then background sounds will end up 
> being amplified too much and detected as speech.  This is a problem 
> regardless of echo.
> 
> 2) The upper bound is necessary in some situations when VAD is not 
> sufficient to distinguish between desired and undesired sounds.  
> For example, consider a person using a headset and communicating 
> infrequently while constantly using a nearby and noisy peripheral 
> such as a force-feedback steering wheel.  Noises from the wheel are 
> going to get picked up and detected as speech, but they usually 
> won't be as loud as speech.  By capping the AGC at the right level, 
> it's possible to prevent the AGC from amplifying the wheel noises 
> too much while still allowing it to do its job for the speech.
> 
> I see now that st->loudness2 is also used in the VAD.  Maybe this 
> explains some problems I was having. :)  I'll have to give the 
> preprocessor's VAD another try now that I'm aware of this.
> 
> So, do you think it's better to use st->loudness2 for both freezing 
> and capping the AGC?
> 
> Tom
> 
> Jean-Marc Valin <Jean-Marc.Valin at USherbrooke.ca> wrote:
> > 
> > Just curious, why are you freezing agc_gain instead of freezing
> > st->loudness2 ?
> > 
> > Jean-Marc
> > 
> > 
> > On Monday, June 20, 2005 at 14:40 -0400, Tom Grandgent wrote: 
> > > I think you'll have to modify Speex to get the functionality you're 
> > > looking for.  I've made a few simple modifications to the AGC to 1) prevent 
> > > it from exceeding a specified level of amplification and 2) enable 
> > > and disable adaptation, so I can freeze it at a certain level while 
> > > speech is not detected.  It's mostly just a matter of doing this at the 
> > > end of speex_compute_agc():
> > > 
> > >    if (!st->agc_frozen)
> > >    {
> > >       /* adapting: gain = target level / loudness estimate */
> > >       agc_gain = st->agc_level/st->loudness2;
> > >       /*fprintf (stderr, "%f %f %f %f\n", active_bands, st->loudness, st->loudness2, agc_gain);*/
> > >       /* cap the amplification (previously a fixed cap of 200) */
> > >       if (agc_gain>st->agc_max_gain)
> > >          agc_gain = st->agc_max_gain;
> > >    }
> > >    else
> > >       agc_gain = st->agc_gain;   /* frozen: keep the previous gain */
> > >    st->agc_gain = agc_gain;
> > > 
> > > and adding a few items to speex_preprocess_ctl() and the state struct.  
> > > (I control these things at the application level; you may wish to 
> > > control them from within the preprocessor if you're using the 
> > > preprocessor's VAD.)
> > > 
> > > Anyway, if you can figure out what's going on with the variables you 
> > > named, I'm sure you can make the necessary modifications to do what 
> > > you've asked for.  I think the preprocessor in general needs a little 
> > > tweaking like this to work well in various real-world situations, but 
> > > I'm not sure how much of this Jean-Marc wants to incorporate into 
> > > Speex vs. leave to application developers.
> > > 
> > > Tom
> > > 
> > > Thorvald Natvig <speex at natvig.com> wrote:
> > > > 
> > > > 
> > > > Echo cancellation works like a charm, but it seems to confuse the 
> > > > preprocessor a bit.
> > > > 
> > > > If I listen to background music (properly fed through the echo 
> > > > canceller), the music is removed, but the result is still detected as 
> > > > speech even though almost nothing but silence remains in the signal.
> > > > 
> > > > Also, the AGC keeps adjusting to the tiny residue left in the signal, 
> > > > so sooner or later it amplifies that residue enough that it's clearly 
> > > > audible on the other side. If I cough or say a word, the AGC readjusts 
> > > > and all is fine.
> > > > 
> > > > Looking at the members of the speex_preprocess structure, I see that 
> > > > during these long periods of "silence" (only the background music or 
> > > > only the other end talking while I shut up):
> > > > 
> > > > - Zlast (which looks like an SNR variable) is at 0.05-0.2, but jumps up
> > > >    above 1.0 if I actually say something.
> > > > - loudness2 keeps decreasing from the "normal" of ~6000 to 1000 or so, at
> > > >    which point the residual echo is amplified enough that it's clearly
> > > >    audible at the other end. If I say something, it adjusts.
> > > > - speech_prob is at 0.999 or 1.000 as long as the other end talks.
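Suppose the gain is computed as $g = L_{\mathrm{agc}}/\ell_2$, with $L_{\mathrm{agc}}$ standing for agc_level and $\ell_2$ for loudness2, as in the patch quoted above. With the target level fixed and the cap not reached, the drift reported here translates directly into extra amplification of the residue:

```latex
\frac{g_{\text{silence}}}{g_{\text{normal}}}
  = \frac{\ell_{2,\text{normal}}}{\ell_{2,\text{silence}}}
  \approx \frac{6000}{1000} = 6,
\qquad
20\log_{10} 6 \approx 15.6\ \mathrm{dB}
```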
> > > > 
> > > > This is all with the up-to-date SVN version of Speex, in a fairly noisy 
> > > > environment (it's hot, so I have the window open; passing cars on the 
> > > > nearby road are quite audible, as is my air cleaner).
> > > > 
> > > > Is there something I can do to tune this away: a way to tell the AGC 
> > > > never to go that low, and a way to tell the speech detector that echo 
> > > > residue is not speech?
> > > > 
> > > > _______________________________________________
> > > > Speex-dev mailing list
> > > > Speex-dev at xiph.org
> > > > http://lists.xiph.org/mailman/listinfo/speex-dev
> > > 
> > 
> > 
> > 
> 



