[Speex-dev] VAD Questions

Fri Jun 8 08:47:27 PDT 2007

> Either one. The question is: If we treat the software like a black
> box, and we feed in PCM audio, we get Speex encoded data out. Where is
> the information that indicates whether the encoded data contains
> speech or not? The API has a "get VAD status", but it seems like that
> might only indicate whether VAD is currently enabled. Perhaps the VAD
> status is contained somewhere in the data frames?

Look at the return value of either speex_encode() or speex_preprocess_run().

> Okay. What I was trying to determine was whether or not the speech
> detection was done with something more sophisticated than frame
> energy. As you said above, I'll have to look at the sources. For many
> systems, sonorant energy rate detection is used to detect voice, even
> under very poor SNR conditions.

I *do* use more than the frame energy. I use the pitch and (IIRC) one or
two other things. However, it's still *very* hard to do any sort of good
detection based only on 20 ms. Give me 1 second of latency and it would
be *much* easier -- though completely useless.
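Below is a minimal sketch of the kind of per-frame decision described above. It assumes 8 kHz narrowband audio and 20 ms (160-sample) frames. It is not the Speex VAD. The function names, the energy and periodicity thresholds, and the 100 to 400 Hz pitch-lag range are all hypothetical. Each frame is labeled from its energy plus a pitch cue: whether any lag in the range gives a normalized autocorrelation above the threshold. A majority vote over several frames then shows where the extra latency comes from.

```c
/* Hypothetical parameters: 8 kHz narrowband audio, 20 ms frames. */
#define FRAME_SIZE 160
#define MIN_LAG 20   /* 400 Hz pitch at 8 kHz */
#define MAX_LAG 80   /* 100 Hz pitch at 8 kHz */

/* Mean energy per sample of one frame. */
static double frame_energy(const short *x, int n)
{
    double e = 0.0;
    for (int i = 0; i < n; i++)
        e += (double)x[i] * x[i];
    return e / n;
}

/* Returns 1 if some lag in [MIN_LAG, MAX_LAG] has a positive normalized
   autocorrelation r = xy / sqrt(xx * yy) above `threshold`. The test is
   done on r squared (xy^2 > threshold^2 * xx * yy) to avoid sqrt(). */
static int is_periodic(const short *x, int n, double threshold)
{
    for (int lag = MIN_LAG; lag <= MAX_LAG && lag < n; lag++) {
        double xy = 0.0, xx = 0.0, yy = 0.0;
        for (int i = lag; i < n; i++) {
            xy += (double)x[i] * x[i - lag];
            xx += (double)x[i] * x[i];
            yy += (double)x[i - lag] * x[i - lag];
        }
        if (xy > 0.0 && xy * xy > threshold * threshold * xx * yy)
            return 1;
    }
    return 0;
}

/* Per-frame decision: 1 = speech, 0 = non-speech.
   Both thresholds are arbitrary illustrative values, not tuned ones. */
int frame_is_speech(const short *x, int n)
{
    const double energy_threshold = 1.0e4;   /* about RMS 100 */
    const double periodicity_threshold = 0.6;
    return frame_energy(x, n) > energy_threshold &&
           is_periodic(x, n, periodicity_threshold);
}

/* Majority vote over `count` per-frame decisions (each 0 or 1).
   Returns 1 if more than half of the frames were labeled speech. */
int smoothed_decision(const int *decisions, int count)
{
    int votes = 0;
    for (int i = 0; i < count; i++)
        votes += decisions[i];
    return 2 * votes > count;
}
```

At 100 Hz a 160-sample frame holds only two pitch periods, which is one reason a single 20 ms frame gives such a weak cue. If smoothed_decision() is given frames that come after the one being labeled, that label is delayed by the same number of frames; a 50-frame window at 20 ms per frame is the 1 second of latency mentioned above.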

	Jean-Marc