[Speex-dev] VAD with speex_preprocess()

Tue Mar 8 08:42:53 PST 2005

As I understand it, there are two separate ways to get VAD information 
from Speex: 1) Using the encoder.  2) Using speex_preprocess().  I 
present the following observations from an application developer's 
perspective.  They may be wrong, in which case I would appreciate 
corrections.

- The two VAD systems are implemented differently.

- speex_preprocess()'s VAD provides more accurate detection than the 
encoder's VAD at the cost of more CPU usage.

- speex_preprocess()'s VAD is affected by the AGC and/or denoise state 
more directly than the encoder's VAD.

- Possibly as a result of the previous point, speex_preprocess()'s VAD 
can get into a bad state when the input varies drastically in amplitude 
or behavior.  After that point its accuracy is ruined, and the only 
solution is to destroy and recreate the preprocess state.
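
To make the last point concrete, here is a rough sketch of the kind of 
watchdog an application could wrap around the VAD result.  It is only an 
illustration: VadWatchdog and its two functions are hypothetical names, 
not part of Speex, and "the decision has not changed for max_run frames" 
is an assumed heuristic for spotting a bad state, not something Speex 
itself provides.

```c
/* Hypothetical helper, NOT part of Speex: flags a VAD whose decision
 * has been stuck on one value for an implausibly long time. */
typedef struct {
    int last_decision;  /* last VAD result seen (0 or 1), -1 if none yet */
    int run_length;     /* consecutive frames with that same result */
    int max_run;        /* assumed threshold: run length that triggers a reset */
} VadWatchdog;

static void vad_watchdog_init(VadWatchdog *w, int max_run)
{
    w->last_decision = -1;
    w->run_length = 0;
    w->max_run = max_run;
}

/* Feed one per-frame VAD result.  Returns 1 when the caller should
 * destroy and recreate its preprocess state, 0 otherwise. */
static int vad_watchdog_update(VadWatchdog *w, int is_speech)
{
    is_speech = is_speech ? 1 : 0;

    if (is_speech == w->last_decision) {
        w->run_length++;
    } else {
        w->last_decision = is_speech;
        w->run_length = 1;
    }

    if (w->run_length >= w->max_run) {
        /* Start counting from scratch after asking for a reset. */
        w->last_decision = -1;
        w->run_length = 0;
        return 1;
    }
    return 0;
}
```

In the application, the value returned by speex_preprocess() would be 
passed to vad_watchdog_update() once per frame.  When it returns 1, the 
application would call speex_preprocess_state_destroy(), create a new 
state with speex_preprocess_state_init(), and reapply its settings with 
speex_preprocess_ctl().  A sensible max_run depends entirely on the 
application, since a long run of silence or of speech can be legitimate.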

Tom

"Paul Gryting" <paul.gryting at teligy.com> wrote:
> 
> In speexenc.c, speex_preprocess() is not called unless AGC or denoise is
> enabled.
> If only VAD is enabled, it does not get called.
> 
> speex_preprocess() has vad_enabled-specific code to detect voice activity:
> speex_preprocess()
> {
>    ...
>    ...
>    if (st->vad_enabled)
>       is_speech = speex_compute_vad(st, ps, mean_prior, mean_post);
> 
>    ...
>    ...
>    return is_speech;
> }
> 
> Some questions for the knowledgeable:
> Is speex_preprocess() needed to use VAD?
> 
> Can speex_preprocess() be used to detect silent frames if VAD is enabled,
> but not AGC or denoise?
> What does Speex do differently internally for silent frames when VAD is
> enabled?
> 
> 
> Paul