[Speex-dev] Leaking audio and AGC/VAD
tgrand at canvaslink.com
Fri Feb 3 07:55:42 PST 2006
The leakage problem you describe is very, very common and you will need
to do something to address it. I modified the version of Speex I use to
implement an adjustable max gain. If you look at speex_compute_agc in
preprocess.c, you will see:
agc_gain = 200;
A max gain of 200 is usually more than enough to amplify leakage,
whether it occurs in the sound hardware or in a headset, to the point
where it is as loud as normal speech.
You can replace this max of 200 with a configurable max gain, which is
what I've done. But then you lose the 'zero user configuration'. And
getting people to set a correct max gain can be problematic.
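A minimal sketch of the configurable ceiling, assuming the gain has
already been computed the way speex_compute_agc does it. The helper name
clamp_agc_gain and the max_gain parameter are hypothetical and are not
part of Speex; this only illustrates the idea and is not the actual
patch.

```c
/* Hypothetical helper, not part of Speex: clamp a computed AGC gain to a
 * caller-supplied ceiling instead of the hard-coded 200. */
static float clamp_agc_gain(float agc_gain, float max_gain)
{
    /* Treat a ceiling below unity as unity so that a bad setting
     * cannot force the ceiling itself to attenuate the signal. */
    if (max_gain < 1.0f)
        max_gain = 1.0f;

    if (agc_gain > max_gain)
        agc_gain = max_gain;

    return agc_gain;
}
```

With a ceiling of 30, for example, a computed gain of 150 is limited to
30 while a computed gain of 5 passes through unchanged.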
It might be better to instead adjust the adaptation rate (?) of the AGC.
For example, by default, it will happily jump between 0.9x and 50x given
the right input signal. And this is desirable for some situations, such
as people talking in a meeting room where one person is shouting right
next to the mic and another person is speaking normally from across the
table. But it's not desirable when the high amplification is picking up
leakage or background sounds like typing. I don't know how to adjust
the Speex AGC's adaptation rate, or if it's practical to do so.
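To make the adaptation-rate idea concrete, here is a rough sketch of a
per-frame rate limit on the gain. It is not how the Speex AGC works
internally; step_agc_gain, max_up and max_down are hypothetical names,
and the choice of ratios is an assumption that would need tuning.

```c
/* Hypothetical helper, not part of Speex: move the current gain toward
 * a target gain, but by no more than a bounded ratio per frame.
 * Assumes current > 0, max_up >= 1 and max_down >= 1. */
static float step_agc_gain(float current, float target,
                           float max_up, float max_down)
{
    float upper = current * max_up;    /* largest gain allowed this frame  */
    float lower = current / max_down;  /* smallest gain allowed this frame */

    if (target > upper)
        return upper;
    if (target < lower)
        return lower;
    return target;
}
```

Using a small max_up and a larger max_down lets the gain drop quickly
when someone shouts but climb only slowly during silence, so a short gap
in speech is less likely to pull leakage or typing up to speech level.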
Zero (or at least minimal) user configuration AGC that covers all common
cases is a difficult but important problem to solve. In my opinion,
Speex AGC is a nice starting point but you have to do some work to get
the results you're looking for.
Of course, AGC is only part of the picture when you're using VAD as well.
But, in my experience, no VAD system can reliably detect desirable sound
when AGC is over-amplifying undesirable sound.
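One way to keep the AGC from over-amplifying quiet, undesirable sound is
to freeze the gain whenever the input frame is below an energy floor. The
sketch below assumes 16-bit PCM frames; frame_mean_square,
gated_target_gain and the floor_ms threshold are hypothetical and are not
part of Speex, and choosing the floor is itself a tuning problem.

```c
#include <stddef.h>

/* Hypothetical helper: mean square energy of a 16-bit PCM frame. */
static float frame_mean_square(const short *pcm, size_t n)
{
    double sum = 0.0;
    size_t i;

    if (n == 0)
        return 0.0f;
    for (i = 0; i < n; i++)
        sum += (double)pcm[i] * (double)pcm[i];
    return (float)(sum / (double)n);
}

/* Hypothetical helper: keep the current gain when the frame is quieter
 * than floor_ms, otherwise accept the gain the AGC asked for. */
static float gated_target_gain(const short *pcm, size_t n, float floor_ms,
                               float current_gain, float desired_gain)
{
    if (frame_mean_square(pcm, n) < floor_ms)
        return current_gain;  /* too quiet: do not adapt on this frame */
    return desired_gain;
}
```

The gain is held rather than reset, so normal speech that resumes after
a pause is still amplified at the level the AGC had last settled on.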
njt at home.se wrote:
> I am working on a VoIP implementation where one of the key design goals
> is zero user configuration, similar to Skype.
> What I've come to notice is that my soundcard (NForce4 based) leaks
> audio from the playback path to the recording path.
> This is probably not unique to my hardware at all and will affect
> some percentage of all users of my software.
> (All of this is Win32)
> What happens is that when no one is talking, I think the Speex
> preprocessor starts to treat the very quiet leak from my playback path
> as someone actually talking.
> I am using AGC and VAD and my guess would be that AGC amplifies the leak
> to such a level that VAD starts triggering.
> This is rather annoying for the user if the software is used while
> playing MP3s (loud) or playing computer games.
> Would it be possible to implement some sort of a threshold to limit what
> Speex classifies as "usable sound" or something?
> I would also like to thank the entire Speex effort for this great piece
> of software, and especially Jean-Marc Valin.
> //Regards Jonas Tärnström, Sweden.