[Speex-dev] Voice Activation Level (speex 1.1.11.1)

Tom Grandgent tgrand at canvaslink.com
Thu Mar 2 07:56:34 PST 2006


Lis,

I suggest you try tweaking Speex's VAD probabilities as Steve suggested.  
But consider a simple threshold-based approach as a backup option.  
Personally, I struggled with Speex's VAD algorithms (both encoder and 
preprocessor) for a long time, tweaked the probabilities, wrote special 
case code to work around the mistakes, and was still never satisfied 
with the results.  In times of really obvious silence, it would detect 
speech.  Often, it would detect many brief background noises as speech, 
such as clicks or typing.  Sometimes at the beginning or end of speech, 
it would detect silence.  (It seemed to vary based on the frequency 
content of the speech.)  And, there were issues with VAD and AGC 
together.

I finally switched to a very simple power threshold check and all of 
my problems went away.  It worked far better than I ever expected.  
Background noise is not a problem if you just use the Speex denoiser 
(which is VERY effective) and calculate the power of the signal after 
that.  This is the function I use to calculate the power:

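// Returns the mean power of a signal, normalized to the range [0, 1]
// (sample_t is signed 16-bit int)
float getPower(const sample_t *signal, int numSamples)
{
	// An empty frame has no power; this also avoids dividing by zero
	if (numSamples <= 0)
		return 0.0f;

	float powerSum = 0.0f;
	for (int i = 0; i < numSamples; i++)
	{
		// Squaring discards the sign, so abs() is not needed
		float amp = (float) signal[i];
		powerSum += amp * amp;
	}
	return powerSum / (32768.0f * 32768.0f * (float) numSamples);
}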

I can't say that this is optimal or even correct, but it works very 
well for me.  And users rarely have to adjust the threshold as long 
as they're using AGC to bring the signal up to a proper range.
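
For illustration, a threshold check built on this kind of power 
calculation might look like the sketch below.  The names framePower, 
isSpeech and VAD_THRESHOLD, and the threshold value itself, are 
hypothetical and only there to make the example complete; a usable 
threshold depends on the microphone, the AGC settings and the denoiser.

```c
#include <stdint.h>

typedef int16_t sample_t;

/* Hypothetical threshold on normalized power (0..1); tune per application. */
#define VAD_THRESHOLD 0.0001f

/* Mean power of a frame, normalized to [0, 1] (same formula as getPower). */
static float framePower(const sample_t *signal, int numSamples)
{
	float powerSum = 0.0f;
	if (numSamples <= 0)
		return 0.0f;
	for (int i = 0; i < numSamples; i++)
	{
		float amp = (float) signal[i];
		powerSum += amp * amp;
	}
	return powerSum / (32768.0f * 32768.0f * (float) numSamples);
}

/* Returns 1 if the frame's power is at or above the threshold, else 0.
   The frame is assumed to have been denoised already. */
static int isSpeech(const sample_t *frame, int numSamples, float threshold)
{
	return framePower(frame, numSamples) >= threshold;
}
```

Each captured frame would be passed to isSpeech() after denoising, and 
only frames that return 1 would be encoded and sent.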

I don't mean to bash Speex VAD.  I really wanted it to work.  But 
for me, in a wideband PC-based VoIP application that relies heavily 
on VAD, a simple power threshold based approach ended up working 
much more reliably.

(By the way, I'm curious about power vs. energy here.  Doesn't it 
make more sense to use power instead of energy for VAD?  Or, maybe, 
the terms are sometimes used interchangeably?)

Tom

Steve Kann <stevek at stevek.com> wrote:
> 
> 
> Lis,
> 
>     The Voice Activity Detection (VAD) algorithm in the speex 
> preprocessor does not work simply by detecting the energy level (volume 
> or loudness) in the audio frames, but it uses a more complex algorithm 
> which (a) tries to ignore background noise, and (b) tries to detect 
> speech, in particular, and not just energy.
> 
>     If you need to adjust the sensitivity of this, you can use these 
> settings:
> 
> #define SPEEX_PREPROCESS_SET_PROB_START 14
> #define SPEEX_PREPROCESS_GET_PROB_START 15
> 
> #define SPEEX_PREPROCESS_SET_PROB_CONTINUE 16
> #define SPEEX_PREPROCESS_GET_PROB_CONTINUE 17
> 
> which adjust the 'probabilities' that are used to define speech and 
> non-speech, both for the start of speech and for continuing speech.
> 
> -SteveK
> 
> 
> Lis wrote:
> 
> > Sorry.
> >
> > I had forgotten the words "volume" and "loudness".
> > I think it is also known as microphone stroke.
> > If someone can tell me something about that
> > procedure, I would be very pleased.
> > To recap,
> > I only wanted to know whether I can change a
> > variable that holds the sound intensity (loudness)
> > needed to start "encoding >> sending" when the Speex codec
> > is in voice activation mode.
> > If that isn't implemented yet, I would be glad
> > to get information about preprocess->loudness2,
> > for example, or about a function (if the lib contains one) that returns a 
> > value equal to the overall
> > loudness of a frame.
> >
> > Then I can do some simple interaction with users
> > who don't want to yell into their microphone
> > just to say something.
> > Others have headsets that pick up their breathing,
> > which everyone can hear.
> > That is no fun all day long...
> >
> > Greets Lis
> >
> > ----- Original Message ----- From: "Jean-Marc Valin" 
> > <jean-marc.valin at usherbrooke.ca>
> > To: "Lis" <lis at 1234567890qwertzuiopasdfghjklyxcvbnm.de>
> > Cc: <speex-dev at xiph.org>
> > Sent: Thursday, March 02, 2006 2:35 AM
> > Subject: Re: [Speex-dev] Voice Activation Level (speex 1.1.11.1)
> >
> >
> >> Please define what you mean by "voice activation level".
> >>
> >> Jean-Marc
> >>
> >> On Thu, 2006-03-02 at 02:22 +0100, Lis wrote:
> >>
> >>> I haven't found anything in the documentation about voice
> >>> activation levels.
> >>> Can I change a variable to adjust the accuracy of the activation?
> >>>  
> >>> If not, does the Speex lib already implement a function to read out
> >>> the sound level of a frame?
> >>>  
> >>> Thanks in advance.
> >>>  
> >>> Lis (Louis Hoefler)
> >>> _______________________________________________
> >>> Speex-dev mailing list
> >>> Speex-dev at xiph.org
> >>> http://lists.xiph.org/mailman/listinfo/speex-dev
> >>
> >>
> >
> 