[Speex-dev] Voice activity detection

Jean-Marc Valin jean-marc.valin at usherbrooke.ca
Fri Feb 15 13:09:41 PST 2008


> Anyway, my request is, can you build in a pre and post buffer into the
> VAD?  In mine, if I detect voice any time between now and say a quarter
> second later, I start sending, and then I wait a half second or whatever
> after I stop detecting.  You pretty much have to have this, or people
> start getting anxious talking over an internet stream.  They have to
> enunciate expressions like "ya probably" because the ya isn't detected,
> only the probably.  By sending a bit of padding around the detection, it
> also prevents the detector from dropping out mid-sentence.  It takes it
> from being a screaming contest over a walkie talkie, to a normal
> telephone conversation.
> 
> You might be reluctant to do this, because you have to add in some state
> information instead of just focusing on the current buffer, but the
> quality improvement is enormous.  I'd just like to be able to pass a pre
> and post value to the VAD in milliseconds, defaulting to either 0 or
> values similar to what I quoted above.  And I realize this can add some
> delay, but even detecting a single extra syllable makes a world of
> difference.

If you want to buffer speech, just do it. There's no reason the buffer
needs to be in the VAD itself.

	Jean-Marc
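
For illustration, here is a minimal sketch of doing the pre and post
buffering in the application, outside the VAD, as suggested above. It is
not part of Speex. The names gate_frames, is_voice, pre_frames and
post_frames are hypothetical, and the defaults assume 20 ms frames, so 12
frames is roughly the quarter second of pre-roll and 25 frames the half
second of hold time mentioned in the request. is_voice stands in for
whatever per-frame decision your VAD gives you.

```python
from collections import deque


def gate_frames(frames, is_voice, pre_frames=12, post_frames=25):
    """Yield the frames to transmit, padding detected speech on both sides.

    frames      -- iterable of audio frames (any object)
    is_voice    -- hypothetical callback: frame -> bool, the VAD decision
    pre_frames  -- how many frames before a detection are sent as pre-roll
    post_frames -- how many frames are still sent after the last detection
    """
    # Most recent frames that have not been sent; old ones fall off the left.
    history = deque(maxlen=pre_frames)
    hangover = 0  # frames left to send after the last voiced frame

    for frame in frames:
        if is_voice(frame):
            # Flush the pre-roll first so the onset of the word is not lost.
            while history:
                yield history.popleft()
            yield frame
            hangover = post_frames
        elif hangover > 0:
            # Keep sending for a while so short pauses do not cut the stream.
            yield frame
            hangover -= 1
        else:
            # Silence: remember the frame in case speech starts right after.
            history.append(frame)


# Example: ten frames, the VAD fires only on frame 4.
sent = list(gate_frames(range(10), lambda f: f == 4,
                        pre_frames=2, post_frames=3))
```

With these inputs, frames 2 and 3 are sent as pre-roll, frame 4 is the
detection, and frames 5 to 7 are the hold time. The pre-roll frames go out
later than they were captured, which is the extra delay the request
mentions; how the receiver absorbs that burst is left to the application.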

