[Speex-dev] Speech detection in preprocessor with echo
Jean-Marc Valin
Jean-Marc.Valin at USherbrooke.ca
Wed Jun 22 16:13:48 PDT 2005
The main advantage I see in freezing st->loudness2 is that if you
unfreeze it, then the transition will be gradual, whereas if you
unfreeze agc_gain, then it will jump to the new value directly. I have
no idea what the freezing will do to the VAD though.
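The difference between the two freezing strategies can be made concrete with a small simulation in C. It is a sketch under stated assumptions, not Speex code. It assumes loudness2 is a leaky average of the per-frame loudness with a made-up rate of 0.05 per frame. It also assumes the gain is agc_level / loudness2, the formula used in Tom's snippet in this thread. AGC_LEVEL, RATE, FreezeMode, max_gain_step and the scripted loudness values are all hypothetical. Freezing agc_gain lets loudness2 keep drifting while frozen, so the gain jumps on unfreeze. Freezing loudness2 makes the gain resume from where it stopped.

```c
#include <assert.h>

/* Hypothetical model (not the real Speex update rule):
   loudness2 is a leaky average of the frame loudness,
   and the gain is AGC_LEVEL / loudness2. */
#define RATE 0.05f        /* assumed smoothing rate per frame */
#define AGC_LEVEL 8000.0f /* assumed target level */

typedef enum { FREEZE_GAIN, FREEZE_LOUDNESS2 } FreezeMode;

/* Scripted run: loud talker (frames 0-99), residual echo while frozen
   (frames 100-199), then a quieter talker after unfreezing (200-299).
   Returns the largest frame-to-frame change in gain. */
float max_gain_step(FreezeMode mode)
{
    assert(mode == FREEZE_GAIN || mode == FREEZE_LOUDNESS2);
    float loudness2 = 6000.0f;
    float gain = AGC_LEVEL / loudness2;
    float max_step = 0.0f;

    for (int frame = 0; frame < 300; frame++) {
        int frozen = (frame >= 100 && frame < 200);
        float loudness = frame < 100 ? 6000.0f : (frozen ? 1000.0f : 3000.0f);

        /* Freezing loudness2: skip its update while frozen. */
        if (!(mode == FREEZE_LOUDNESS2 && frozen))
            loudness2 = (1.0f - RATE) * loudness2 + RATE * loudness;

        /* Freezing agc_gain: hold the gain, but loudness2 keeps drifting. */
        float new_gain = (mode == FREEZE_GAIN && frozen) ? gain
                                                         : AGC_LEVEL / loudness2;

        float step = new_gain > gain ? new_gain - gain : gain - new_gain;
        if (step > max_step)
            max_step = step;
        gain = new_gain;
    }
    return max_step;
}
```

With these assumed numbers, max_gain_step(FREEZE_GAIN) reports a jump of several units at the first unfrozen frame. In contrast, max_gain_step(FREEZE_LOUDNESS2) stays at a few hundredths.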
Jean-Marc
On Wednesday, June 22, 2005 at 10:46 -0400, Tom Grandgent wrote:
> agc_gain seemed to fit with the idea of what I wanted to do, it was
> easy to understand its units and behavior, and freezing it produced
> the desired results. Also I wanted to cap it, so that's done at the
> same place, and that definitely works.
>
> All I want to do is be able to freeze AGC adaptation and put an
> upper bound on the AGC (for example, 2x amplification). Both of
> these things seem necessary in a real-world app because:
>
> 1) AGC gain should not increase when speech is not detected. If it
> does, then it will inevitably rise during periods of inactivity on
> the part of the speaker, and then background sounds will end up
> being amplified too much and detected as speech. This is a problem
> regardless of echo.
>
> 2) The upper bound is necessary in some situations when VAD is not
> sufficient to distinguish between desired and undesired sounds.
> For example, consider a person using a headset and communicating
> infrequently while constantly using a nearby and noisy peripheral
> such as a force-feedback steering wheel. Noises from the wheel are
> going to get picked up and detected as speech, but they usually
> won't be as loud as speech. By capping the AGC at the right level,
> it's possible to prevent the AGC from amplifying the wheel noises
> too much while still allowing it to do its job for the speech.
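Tom's two rules (adapt only while speech is detected, never exceed a maximum gain) can be sketched as a per-frame controller. This is not Speex code. AgcController and agc_update are hypothetical names, and speech_detected stands for whatever VAD decision the application uses. The gain formula agc_level / loudness2 is the one from Tom's snippet quoted further down in this thread.

```c
#include <assert.h>

/* Hypothetical per-frame AGC controller, not part of Speex. */
typedef struct {
    float agc_level; /* target level (units as in the preprocessor) */
    float max_gain;  /* upper bound, e.g. 2.0 for "2x amplification" */
    float gain;      /* current gain, held while no speech is detected */
} AgcController;

/* Rule 1: only adapt while speech is detected.
   Rule 2: never let the gain exceed max_gain. */
float agc_update(AgcController *c, int speech_detected, float loudness2)
{
    assert(loudness2 > 0.0f);
    if (speech_detected) {
        float g = c->agc_level / loudness2;
        if (g > c->max_gain)
            g = c->max_gain;
        c->gain = g;
    }
    /* No speech: adaptation frozen, keep the previous gain. */
    return c->gain;
}
```

With agc_level = 8000 and max_gain = 2, a speech frame whose loudness2 is 1000 would ask for a gain of 8 but is held at 2. Frames without speech leave the gain untouched, so background noise between utterances cannot pull it upward.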
>
> I see now that st->loudness2 is also used in the VAD. Maybe this
> explains some problems I was having. :) I'll have to give the
> preprocessor's VAD another try now that I'm aware of this.
>
> So, do you think it's better to use st->loudness2 for both freezing
> and capping the AGC?
>
> Tom
>
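On Tom's question, agc_gain = agc_level / loudness2 with both quantities positive. A cap agc_gain <= max_gain is therefore equivalent to a floor loudness2 >= agc_level / max_gain, so both freezing and capping can be expressed on loudness2. Below is a minimal sketch. It assumes a leaky-average update of loudness2 with a caller-supplied rate, and the real Speex update may differ. update_loudness2 and its parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not Speex code: freeze and cap via loudness2.
   Because agc_gain = agc_level / loudness2 (both positive),
   agc_gain <= max_gain  <=>  loudness2 >= agc_level / max_gain. */
float update_loudness2(float loudness2, float loudness, float rate,
                       int frozen, float agc_level, float max_gain)
{
    assert(agc_level > 0.0f && max_gain > 0.0f);

    /* Freezing: simply skip the adaptation step. */
    if (!frozen)
        loudness2 = (1.0f - rate) * loudness2 + rate * loudness;

    /* Capping the gain is the same as flooring loudness2. */
    float min_loudness2 = agc_level / max_gain;
    if (loudness2 < min_loudness2)
        loudness2 = min_loudness2;
    return loudness2;
}
```

For example, with agc_level = 8000 and max_gain = 2 the floor is 4000. An update that would bring loudness2 down to 3500 is clamped to 4000, giving a gain of exactly 2. Because loudness2 then moves smoothly, unfreezing keeps the gradual transition Jean-Marc describes.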
> Jean-Marc Valin <Jean-Marc.Valin at USherbrooke.ca> wrote:
> >
> > Just curious, why are you freezing agc_gain instead of freezing
> > st->loudness2 ?
> >
> > Jean-Marc
> >
> >
> > On Monday, June 20, 2005 at 14:40 -0400, Tom Grandgent wrote:
> > > I think you'll have to modify Speex to get the functionality you're
> > > looking for. I've made a few simple modifications to the AGC to 1) prevent
> > > it from exceeding a specified level of amplification and 2) enable
> > > and disable adaptation, so I can freeze it at a certain level while
> > > speech is not detected. It's mostly just a matter of doing this at the
> > > end of speex_compute_agc():
> > >
> > > if (!st->agc_frozen) /* adaptation enabled */
> > > {
> > >    agc_gain = st->agc_level/st->loudness2;
> > >    /*fprintf (stderr, "%f %f %f %f\n", active_bands, st->loudness, st->loudness2, agc_gain);*/
> > >    if (agc_gain>st->agc_max_gain) /* was 200 */
> > >       agc_gain = st->agc_max_gain; /* was 200*/
> > > }
> > > else
> > >    agc_gain = st->agc_gain; /* adaptation frozen: keep the previous gain */
> > > st->agc_gain = agc_gain;
> > >
> > > and adding a few items to speex_preprocess_ctl() and the state struct.
> > > (I control these things at the application level; you may wish to
> > > control them from within the preprocessor if you're using the
> > > preprocessor's VAD.)
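The state-struct and ctl additions Tom mentions might look roughly like the following. Everything here is hypothetical. HypPreprocessState stands in for the real preprocessor state, and HYP_SET_AGC_FROZEN and HYP_SET_AGC_MAX_GAIN are invented request codes, not Speex constants. hyp_preprocess_ctl only mimics the general request-plus-pointer style of speex_preprocess_ctl(). hyp_compute_agc_tail is Tom's snippet lifted into a standalone function so it can be compiled and exercised.

```c
#include <assert.h>

/* Invented request codes for this sketch; NOT real Speex constants. */
#define HYP_SET_AGC_FROZEN   1
#define HYP_SET_AGC_MAX_GAIN 2

/* Minimal stand-in for the preprocessor state: only the fields the
   snippet reads, plus the two new ones Tom added. */
typedef struct {
    float agc_level;
    float loudness2;
    float agc_gain;
    int   agc_frozen;   /* new: nonzero freezes AGC adaptation */
    float agc_max_gain; /* new: upper bound on agc_gain */
} HypPreprocessState;

/* ctl-style setter: returns 0 on success, -1 for an unknown request. */
int hyp_preprocess_ctl(HypPreprocessState *st, int request, void *ptr)
{
    switch (request) {
    case HYP_SET_AGC_FROZEN:
        st->agc_frozen = *(int *)ptr;
        return 0;
    case HYP_SET_AGC_MAX_GAIN:
        st->agc_max_gain = *(float *)ptr;
        return 0;
    default:
        return -1;
    }
}

/* Tom's end-of-speex_compute_agc() logic as a standalone function. */
float hyp_compute_agc_tail(HypPreprocessState *st)
{
    float agc_gain;
    assert(st->loudness2 > 0.0f);
    if (!st->agc_frozen) {
        agc_gain = st->agc_level / st->loudness2;
        if (agc_gain > st->agc_max_gain)
            agc_gain = st->agc_max_gain;
    } else {
        agc_gain = st->agc_gain; /* frozen: keep the previous gain */
    }
    st->agc_gain = agc_gain;
    return agc_gain;
}
```

An application could set the cap once, then toggle the frozen flag every frame from its own VAD decision before running the preprocessor.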
> > >
> > > Anyway, if you can figure out what's going on with the variables you
> > > named, I'm sure you can make the necessary modifications to do what
> > > you've asked for. I think the preprocessor in general needs a little
> > > tweaking like this to work well in various real-world situations, but
> > > I'm not sure how much of this Jean-Marc wants to incorporate into
> > > Speex vs. leave to application developers.
> > >
> > > Tom
> > >
> > > Thorvald Natvig <speex at natvig.com> wrote:
> > > >
> > > >
> > > > Echo cancellation works like a charm, but it seems to confuse the
> > > > preprocessor a bit.
> > > >
> > > > If I listen to background music (properly fed through the echo
> > > > canceller), the music is removed, but the result is still detected as
> > > > speech even though almost nothing but silence remains in the signal.
> > > >
> > > > Also, the AGC keeps adjusting to the minute remains in the signal, meaning
> > > > that sooner or later it will amplify the remains enough that they are clearly
> > > > audible on the other side. If I cough or say a word, the AGC readjusts and
> > > > all is fine.
> > > >
> > > > Looking at the members of the speex_preprocess structure, I see that
> > > > during these long periods of "silence" (only the background music or
> > > > only the other end talking while I shut up):
> > > >
> > > > - Zlast (which looks like an SNR variable) is at 0.05-0.2, but jumps up
> > > > above 1.0 if I actually say something.
> > > > - loudness2 keeps decreasing from the "normal" of ~6000 to 1000 or so, at
> > > > which point the residual echo is amplified enough that it's clearly
> > > > audible at the other end. If I say something, it adjusts.
> > > > - speech_prob is at 0.999 or 1.000 as long as the other end talks.
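These numbers explain why the residual echo becomes audible. With agc_gain = agc_level / loudness2 (the formula in Tom's snippet in this thread), the target agc_level cancels out of the ratio between two gains. A drop of loudness2 from about 6000 to about 1000 therefore raises the gain by a factor of about 6. A trivial helper, with a hypothetical name, makes the arithmetic explicit.

```c
#include <assert.h>

/* Hypothetical helper: factor by which agc_gain grows when loudness2
   drops, given agc_gain = agc_level / loudness2.
   (agc_level / after) / (agc_level / before) = before / after. */
float gain_increase_factor(float loudness2_before, float loudness2_after)
{
    assert(loudness2_before > 0.0f && loudness2_after > 0.0f);
    return loudness2_before / loudness2_after;
}
```

gain_increase_factor(6000.0f, 1000.0f) gives 6.0, so the leftover echo is amplified six times more than it would be at the "normal" loudness2.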
> > > >
> > > > This is all with the up-to-date SVN version of Speex, and in a fairly noisy
> > > > environment (it's hot, so I have the window open, so passing cars on the
> > > > nearby road are quite audible, as is my air cleaner).
> > > >
> > > > Is there something I can do to tune this away, a way to tell the AGC to
> > > > never go that low, and a way to tell the speech detector that echo remains
> > > > are not speech?
> > > >
> > > > _______________________________________________
> > > > Speex-dev mailing list
> > > > Speex-dev at xiph.org
> > > > http://lists.xiph.org/mailman/listinfo/speex-dev
> > >
> >
> >
> >
>