[Speex-dev] Voice Activation Level (speex 1.1.11.1)

Thu Mar 2 21:55:52 PST 2006

Hi Tom,

How do I use the code you wrote?
Can you show me an example?

Thanks,

-----Original Message-----
From: speex-dev-bounces at xiph.org [mailto:speex-dev-bounces at xiph.org] On Behalf Of Tom Grandgent
Sent: Friday, March 03, 2006 12:57 AM
To: Steve Kann; Lis
Cc: speex-dev at xiph.org
Subject: Re: [Speex-dev] Voice Activation Level (speex 1.1.11.1)

Lis,

I suggest you try tweaking Speex's VAD probabilities as Steve suggested.
But consider a simple threshold-based approach as a backup option.
Personally, I struggled with Speex's VAD algorithms (both encoder and
preprocessor) for a long time, tweaked the probabilities, wrote special
case code to work around the mistakes, and was still never satisfied
with the results.  In times of really obvious silence, it would detect
speech.  Often, it would detect many brief background noises as speech,
such as clicks or typing.  Sometimes at the beginning or end of speech,
it would detect silence.  (It seemed to vary based on the frequency
content of the speech.)  And, there were issues with VAD and AGC
together.

I finally switched to a very simple power threshold check and all of my
problems went away.  It worked far better than I ever expected.
Background noise is not a problem if you just use the Speex denoiser
(which is VERY effective) and calculate the power of the signal after
that.  This is the function I use to calculate the power:

// Returns the power of a signal (sample_t is signed 16-bit int)
float getPower(sample_t *signal, int numSamples)
{
	float powerSum = 0.0f;
	for (int i = 0; i < numSamples; i++)
	{
		float amp = (float) abs(signal[i]);
		powerSum += amp * amp;
	}
	return powerSum / (32768.0f * 32768.0f * (float) numSamples);
}

I can't say that this is optimal or even correct, but it works very well
for me.  And users rarely have to adjust the threshold as long as
they're using AGC to bring the signal up to a proper range.
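
A minimal sketch of how such a check might be applied frame by frame, not
a definitive implementation.  The threshold value, the hangover counter,
and the names VadState, vadUpdate, POWER_THRESHOLD and HANGOVER_FRAMES are
assumptions made up for illustration; none of them come from Speex.  The
frame is assumed to have already gone through the denoiser and AGC.
getPower is restated with an added guard for empty frames.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int16_t sample_t;

/* Hypothetical tuning values; adjust for your application. */
#define POWER_THRESHOLD 0.0005f /* normalized power, 1.0 = full scale */
#define HANGOVER_FRAMES 10      /* frames still sent after power drops */

/* Mean normalized power of one frame of signed 16-bit samples. */
static float getPower(const sample_t *signal, int numSamples)
{
	float powerSum = 0.0f;
	if (numSamples <= 0)
		return 0.0f; /* avoid dividing by zero on an empty frame */
	for (int i = 0; i < numSamples; i++)
	{
		float amp = (float) abs(signal[i]);
		powerSum += amp * amp;
	}
	return powerSum / (32768.0f * 32768.0f * (float) numSamples);
}

typedef struct
{
	int hangover; /* remaining frames to keep treating as speech */
} VadState;

/* Returns 1 if the frame should be sent, 0 if it counts as silence. */
static int vadUpdate(VadState *st, const sample_t *frame, int numSamples)
{
	if (getPower(frame, numSamples) >= POWER_THRESHOLD)
	{
		st->hangover = HANGOVER_FRAMES; /* re-arm on every loud frame */
		return 1;
	}
	if (st->hangover > 0)
	{
		st->hangover--; /* bridge short pauses between words */
		return 1;
	}
	return 0;
}
```

Start with a zero-initialized VadState and call vadUpdate once per frame.
The hangover is an addition of this sketch, meant to avoid clipping the
tail of a word; set HANGOVER_FRAMES to 0 for a pure threshold check.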

I don't mean to bash Speex VAD.  I really wanted it to work.  But for
me, in a wideband PC-based VoIP application that relies heavily on VAD,
a simple power threshold based approach ended up working much more
reliably.

(By the way, I'm curious about power vs. energy here.  Doesn't it make
more sense to use power instead of energy for VAD?  Or, maybe, the terms
are sometimes used interchangeably?)
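
For what it's worth, under the usual discrete-time definitions the two
differ only by the frame length: energy is the sum of the squared samples,
and power is that sum divided by the number of samples.  A small sketch,
with the function names frameEnergy and framePower made up for
illustration:

```c
#include <stdint.h>

/* Energy: sum of squared samples over the frame. */
static double frameEnergy(const int16_t *x, int n)
{
	double e = 0.0;
	for (int i = 0; i < n; i++)
		e += (double) x[i] * (double) x[i];
	return e;
}

/* Power: energy averaged over the frame length. */
static double framePower(const int16_t *x, int n)
{
	return n > 0 ? frameEnergy(x, n) / (double) n : 0.0;
}
```

With a fixed frame size the two differ by a constant factor, so a
threshold on one is equivalent to a scaled threshold on the other.  Power
has the advantage that the same threshold still applies if the frame size
changes.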

Tom

Steve Kann <stevek at stevek.com> wrote:
>
> Lis,
>
>     The Voice Activity Detection (VAD) algorithm in the speex
> preprocessor does not work simply by detecting the energy level
> (volume or loudness) in the audio frames, but it uses a more complex
> algorithm which (a) tries to ignore background noise, and (b) tries to
> detect speech, in particular, and not just energy.
>
>     If you need to adjust the sensitivity of this, you can use these
> settings:
>
> #define SPEEX_PREPROCESS_SET_PROB_START 14
> #define SPEEX_PREPROCESS_GET_PROB_START 15
>
> #define SPEEX_PREPROCESS_SET_PROB_CONTINUE 16
> #define SPEEX_PREPROCESS_GET_PROB_CONTINUE 17
>
> which adjust the 'probabilities' that are used to define speech and
> non-speech, for the start of speech, and to continue speech.
>
> -SteveK
>
>
> Lis wrote:
>
> > Sorry.
> >
> > I had forgotten the words "volume" and "loudness".
> > But I think it is also known as microphone stroke.
> > If someone could tell me something about that procedure, I would be
> > very pleased.
> > To recap,
> > I only wanted to know whether I can change a variable that holds
> > the sound intensity (loudness) needed to start "encoding >> sending"
> > when the speex codec is in voice activation mode.
> > If that isn't implemented yet, I would be glad to get information
> > about preprocess->loudness2, for example, or about a function (if the
> > lib contains one) that returns a value equal to the overall
> > loudness of a frame.
> >
> > So I can do some simple interactions with users who don't want
> > to yell into their microphone to say something.
> > Others have headsets that record their breathing, and everyone can
> > hear it.
> > This is no fun the whole day...
> >
> > Greets Lis
> >
> > ----- Original Message -----
> > From: "Jean-Marc Valin" <jean-marc.valin at usherbrooke.ca>
> > To: "Lis" <lis at 1234567890qwertzuiopasdfghjklyxcvbnm.de>
> > Cc: <speex-dev at xiph.org>
> > Sent: Thursday, March 02, 2006 2:35 AM
> > Subject: Re: [Speex-dev] Voice Activation Level (speex 1.1.11.1)
> >
> >
> >> Please define what you mean by "voice activation level".
> >>
> >> Jean-Marc
> >>
> >> On Thu, 2006-03-02 at 02:22 +0100, Lis wrote:
> >>
> > >>> I haven't found anything in the documentation about voice
> > >>> activation levels.
> > >>> Can I change a variable to change the accuracy for activations?
> > >>>
> > >>> If not, does the speex lib already implement a function to read
> > >>> out the sound level of a frame?
> > >>>
> > >>> Thanks in advance.
> >>> =20
> >>> Lis (Louis Hoefler)
> >>> _______________________________________________
> >>> Speex-dev mailing list
> >>> Speex-dev at xiph.org
> >>> http://lists.xiph.org/mailman/listinfo/speex-dev
> >>
> >>
