[Speex-dev] VAD Questions

Fri Jun 8 10:01:54 PDT 2007

Hello Jean-Marc:

On 08/06/07, Jean-Marc Valin <jean-marc.valin at usherbrooke.ca> wrote:
> > Either one. The question is: If we treat the software like a black
> > box, and we feed in PCM audio, we get Speex encoded data out. Where is
> > the information that indicates whether the encoded data contains
> > speech or not? The API has a "get VAD status", but it seems like that
> > might only indicate whether VAD is currently enabled. Perhaps the VAD
> > status is contained somewhere in the data frames?
>
> Look at the return value of either speex_encode() or speex_preprocess_run().

OK. Thanks.

>
> > Okay. What I was trying to determine was whether or not the speech
> > detection was done with something more sophisticated than frame
> > energy. As you said above, I'll have to look at the sources. For many
> > systems, sonorant energy rate detection is used to detect voice, even
> > under very poor SNR conditions.
>
> I *do* use more than the frame energy. I use the pitch and (IIRC) one or
> two other things. However, it's still *very* hard to do any sort of good
> detection based only on 20 ms. Give me 1 second of latency and it would
> be *much* easier -- though completely useless.

While I agree with this for real-time, full-duplex links, my application
is non-real-time and half-duplex, so added latency has no effect at all.
Do you know of anyone who has implemented post-processing software that
provides more "exotic" speech detection, even at the expense of
increased latency?
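
To make the idea concrete, here is a minimal sketch of the kind of
offline pass I have in mind. It assumes the per-frame speech/no-speech
flags have already been collected (for example from the return value of
speex_preprocess_run() with VAD enabled, one flag per 20 ms frame) and
applies a centered majority vote over roughly one second of context.
The function name smooth_vad and its parameters are hypothetical, and
the majority vote is only a stand-in for something more sophisticated;
none of this is part of Speex.

```python
def smooth_vad(flags, window=51, threshold=0.5):
    """Smooth per-frame VAD flags with a centered majority vote.

    flags     -- sequence of truthy/falsy per-frame decisions
    window    -- number of frames considered around each frame
                 (51 frames of 20 ms is about one second; assumed value)
    threshold -- fraction of frames in the window that must be active
                 for the centre frame to be marked as speech
    """
    if window < 1:
        raise ValueError("window must be at least 1 frame")

    n = len(flags)
    half = window // 2

    # Prefix sums let us count active frames in any window in O(1).
    prefix = [0]
    for f in flags:
        prefix.append(prefix[-1] + (1 if f else 0))

    smoothed = []
    for i in range(n):
        lo = max(0, i - half)          # window is clipped at the edges
        hi = min(n, i + half + 1)
        active = prefix[hi] - prefix[lo]
        smoothed.append(active > threshold * (hi - lo))
    return smoothed
```

Because every decision looks about half a second into the future, this
only makes sense for recorded or store-and-forward audio. Isolated
one-frame detections are dropped and one-frame dropouts inside a speech
run are filled in, at the cost of eroding the edges of each run a little.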

Cheers,
-- 
Larry Gadallah, VE6VQ/W7                          lgadallah AT gmail DOT com
PGP Sig: 616D 4E52 CF1F 3FEC FFFB  F11B 7DB9 C79A EA7E B25B