[Speex-dev] Voice activity detection
zmorris at mac.com
Fri Feb 15 07:49:04 PST 2008
Hey, sorry to hijack this thread, but I just remembered a request I
wanted to make to the Speex devs. I tried using the activity
detector, but I just couldn't get it working well. I ended up using
my own, which I think just considered voice to be on whenever the
level passed a certain threshold (I know, pretty primitive). I also
tried one that checked for an actual signal, e.g. whether the
strongest frequency component was above a threshold. I don't remember
exactly which function it used. It was very simple, not an FFT but
something like an autocorrelation, and it didn't work any better than
loudness detection. So I would like to use Speex's.
Anyway, my request is: can you build a pre- and post-buffer into the
VAD? In mine, if I detect voice anywhere between now and, say, a
quarter second later, I start sending from now (so the audio is
buffered a little). Then I keep sending for half a second or so after
I stop detecting voice. You pretty much have to have this, or people
get anxious talking over an internet stream. They have to
over-enunciate expressions like "ya, probably" because the "ya" isn't
detected, only the "probably". Sending a bit of padding around the
detection also keeps the detector from dropping out mid-sentence. It
turns a screaming contest over a walkie-talkie into a normal
telephone conversation.
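The padding described above can be wrapped around any per-frame voice/no-voice decision. The following is a minimal C sketch under that assumption. `VadGate` and its functions are hypothetical names, not a Speex API. The pre-buffer is a delay line of `pre_frames` frames, because a frame can only be sent once the detector has looked `pre_frames` frames past it. The post-buffer is a hangover counter.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Speex: pre-roll plus hangover around a
   raw per-frame VAD flag. */
typedef struct {
    int pre_frames;   /* frames sent before detected speech */
    int post_frames;  /* frames kept after detected speech ends */
    int frame_size;   /* samples per frame */
    short *delay;     /* ring buffer of pre_frames + 1 frames */
    int head;         /* slot the next frame is written to */
    int frames_in;    /* frames pushed, counted only up to pre_frames */
    int since_voice;  /* frames since the last voiced frame (capped) */
} VadGate;

VadGate *vad_gate_new(int frame_size, int pre_frames, int post_frames)
{
    VadGate *g = calloc(1, sizeof *g);
    if (!g)
        return NULL;
    g->delay = calloc((size_t)(pre_frames + 1) * frame_size, sizeof(short));
    if (!g->delay) {
        free(g);
        return NULL;
    }
    g->pre_frames = pre_frames;
    g->post_frames = post_frames;
    g->frame_size = frame_size;
    g->since_voice = pre_frames + post_frames + 1; /* start out silent */
    return g;
}

void vad_gate_free(VadGate *g)
{
    if (g) {
        free(g->delay);
        free(g);
    }
}

/* Push one input frame and its raw VAD flag. Copies the frame that leaves
   the delay line (the one pushed pre_frames calls ago) into out and returns
   1 if it should be sent, 0 if it should be dropped. */
int vad_gate_push(VadGate *g, const short *in, int is_voice, short *out)
{
    int slots = g->pre_frames + 1;
    size_t bytes = (size_t)g->frame_size * sizeof(short);
    memcpy(g->delay + (size_t)g->head * g->frame_size, in, bytes);

    /* The oldest frame sits in the slot right after the one just written. */
    int oldest = (g->head + 1) % slots;
    memcpy(out, g->delay + (size_t)oldest * g->frame_size, bytes);
    g->head = oldest;

    if (is_voice)
        g->since_voice = 0;
    else if (g->since_voice <= g->pre_frames + g->post_frames)
        g->since_voice++;

    /* Until pre_frames frames have arrived, only zero padding comes out. */
    int warming = g->frames_in < g->pre_frames;
    if (warming)
        g->frames_in++;

    /* The outgoing frame is sent if voice occurred up to pre_frames after it
       or up to post_frames before it. Both cases reduce to "voice within the
       last pre_frames + post_frames input frames". */
    return warming ? 0 : g->since_voice <= g->pre_frames + g->post_frames;
}
```

With `pre_frames = 2` and `post_frames = 3`, a single voiced input frame at index 5 causes frames 3 through 8 to be sent. Each of those frames comes out two calls later than it went in. That two-frame lag is the delay cost of the pre-buffer.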
You might be reluctant to do this, because you have to keep some
state instead of just looking at the current buffer, but the quality
improvement is enormous. I'd just like to be able to pass pre and post
values to the VAD in milliseconds, defaulting either to 0 or to values
similar to the ones I quoted above. I realize the pre-buffer adds some
delay (roughly its own length), but even detecting a single extra
syllable makes a world of difference.
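Millisecond parameters like the ones requested above would need to be converted into whole frames, and the pre-buffer length then sets the added delay. The following minimal C sketch shows that conversion. The helper names are hypothetical and are not an existing Speex control.

```c
/* Hypothetical helpers, not a Speex API: convert pre/post padding given in
   milliseconds into whole frames, and report the delay the pre-buffer adds. */
static int padding_ms_to_frames(int ms, int frame_size, int sample_rate)
{
    /* Round up at both steps so the padding is never shorter than asked. */
    long samples = ((long)ms * sample_rate + 999) / 1000;
    return (int)((samples + frame_size - 1) / frame_size);
}

/* Only the pre-buffer delays the audio: each frame waits until the detector
   has seen pre_frames more frames. The post-buffer adds no delay. */
static int pre_buffer_delay_ms(int pre_frames, int frame_size, int sample_rate)
{
    return (int)((long)pre_frames * frame_size * 1000 / sample_rate);
}
```

Defaults of 0 and 0 reduce to a plain VAD. Consider 8 kHz audio with 160-sample (20 ms) frames. A 250 ms pre-buffer rounds up to 13 frames, which is about 260 ms of extra one-way delay. A 500 ms post-buffer becomes 25 frames at no extra delay.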
Well, thanks for your time,
--Zack