Fw: [Speex-dev] Voice Activation Level (speex

Lis lis at 1234567890qwertzuiopasdfghjklyxcvbnm.de
Fri Mar 3 03:47:28 PST 2006

I implemented calcPower().
It works perfectly.
I will post an example in about 6 hours.
I can't paste the whole source here, and I need to
meet someone now.

Thanks, all (particularly Tom).
I'll try to figure out which problem exists with the
these days

----- Original Message ----- 
From: "손승원" <ssw0725 at ncsoft.net>
To: "Tom Grandgent" <tgrand at canvaslink.com>
Cc: <speex-dev at xiph.org>
Sent: Friday, March 03, 2006 6:55 AM
Subject: RE: [Speex-dev] Voice Activation Level (speex


How do I use the code you wrote?
Can you show me an example?


-----Original Message-----
From: speex-dev-bounces at xiph.org [mailto:speex-dev-bounces at xiph.org] On 
Behalf Of Tom Grandgent
Sent: Friday, March 03, 2006 12:57 AM
To: Steve Kann; Lis
Cc: speex-dev at xiph.org
Subject: Re: [Speex-dev] Voice Activation Level (speex


I suggest you try tweaking Speex's VAD probabilities as Steve suggested.
But consider a simple threshold-based approach as a backup option.
Personally, I struggled with Speex's VAD algorithms (both encoder and
preprocessor) for a long time, tweaked the probabilities, wrote special case 
code to work around the mistakes, and was still never satisfied with the 
results.  In times of really obvious silence, it would detect speech. 
Often, it would detect many brief background noises as speech, such as 
clicks or typing.  Sometimes at the beginning or end of speech, it would 
detect silence.  (It seemed to vary based on the frequency content of the 
speech.)  And, there were issues with VAD and AGC together.

I finally switched to a very simple power threshold check and all of my 
problems went away.  It worked far better than I ever expected.
Background noise is not a problem if you just use the Speex denoiser (which 
is VERY effective) and calculate the power of the signal after that.  This 
is the function I use to calculate the power:

// Returns the power of a signal (sample_t is signed 16-bit int)
float getPower(sample_t *signal, int numSamples)
{
    float powerSum = 0.0f;
    for (int i = 0; i < numSamples; i++)
    {
        float amp = (float) abs(signal[i]);
        powerSum += amp * amp;
    }
    return powerSum / (32768.0f * 32768.0f * (float) numSamples);
}

I can't say that this is optimal or even correct, but it works very well for 
me.  And users rarely have to adjust the threshold as long as they're using 
AGC to bring the signal up to a proper range.
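
A minimal sketch of how the threshold check described above might be wired up, assuming 16-bit samples that have already been denoised. The helper name isSpeechFrame() and the threshold parameter are hypothetical, introduced only for illustration; they do not come from Speex or from the code above.

```c
#include <stdlib.h>

typedef short sample_t;  /* signed 16-bit sample, as assumed by getPower() */

/* Same calculation as getPower() above: mean squared amplitude,
   normalized so that a full-scale signal gives roughly 1.0. */
static float getPower(const sample_t *signal, int numSamples)
{
    float powerSum = 0.0f;
    for (int i = 0; i < numSamples; i++) {
        float amp = (float) abs(signal[i]);
        powerSum += amp * amp;
    }
    return powerSum / (32768.0f * 32768.0f * (float) numSamples);
}

/* Hypothetical helper: returns 1 if the frame's power is above the
   threshold (treat as speech), 0 otherwise (treat as silence). */
static int isSpeechFrame(const sample_t *frame, int numSamples, float threshold)
{
    return getPower(frame, numSamples) > threshold;
}
```

A caller would run this once per captured frame and only encode and send the frame when it returns 1. The threshold has to be tuned by ear for a given setup; no particular value is implied by the discussion above.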

I don't mean to bash Speex VAD.  I really wanted it to work.  But for me, in 
a wideband PC-based VoIP application that relies heavily on VAD, a simple 
power threshold based approach ended up working much more reliably.

(By the way, I'm curious about power vs. energy here.  Doesn't it make more 
sense to use power instead of energy for VAD?  Or, maybe, the terms are 
sometimes used interchangeably?)
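
For reference, a sketch of the usual discrete-time definitions. The symbols x[n], N, E and P are introduced here for illustration and are not taken from the thread. For one frame of N samples:

```latex
E = \sum_{n=0}^{N-1} x[n]^2,
\qquad
P = \frac{E}{N} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]^2
```

With a fixed frame size, P and E differ only by the constant factor N, so a threshold on one is equivalent to a rescaled threshold on the other. The getPower() function above computes P with each sample scaled by 1/32768.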


Steve Kann <stevek at stevek.com> wrote:
> Lis,
>     The Voice Activity Detection (VAD) algorithm in the speex
> preprocessor does not work simply by detecting the energy level
> (volume or loudness) in the audio frames, but it uses a more complex
> algorithm which (a) tries to ignore background noise, and (b) tries to
> detect speech, in particular, and not just energy.
>     If you need to adjust the sensitivity of this, you can use these
> settings:
> which adjusts the 'probabilities' that are used to define speech and
> non-speech, for the start of speech, and to continue speech.
> -SteveK
> Lis wrote:
> > Sorry.
> >
> > I had forgotten the words volume or loudness.
> > But it is also known as microphone stroke, I think.
> > If someone can tell me something about that procedure, I would
> > be very pleased.
> > To recap,
> > I only wanted to know whether I can change a variable that holds
> > the sound intensity (loudness) needed to start "encoding >> sending"
> > if the speex codec is in voice activation mode.
> > If that isn't implemented yet, I would be glad to get information
> > about preprocess->loudness2, for example, or a function (if the
> > lib contains one) that returns a value which equals the overall
> > loudness of a frame.
> >
> > Then I can do some simple interactions with users who don't want
> > to yell into their microphone to say something.
> > Others have headsets that record their breathing, which everyone
> > can hear.
> > This is not fun all day long...
> >
> > Greets Lis
> >
> > ----- Original Message ----- From: "Jean-Marc Valin"
> > <jean-marc.valin at usherbrooke.ca>
> > To: "Lis" <lis at 1234567890qwertzuiopasdfghjklyxcvbnm.de>
> > Cc: <speex-dev at xiph.org>
> > Sent: Thursday, March 02, 2006 2:35 AM
> > Subject: Re: [Speex-dev] Voice Activation Level (speex
> >
> >
> >> Please define what you mean by "voice activation level".
> >>
> >> Jean-Marc
> >>
> >> On Thu, 2006-03-02 at 02:22 +0100, Lis wrote:
> >>
> >>> I haven't found anything in the documentation about voice
> >>> activation levels.
> >>> Can I change a variable to adjust the accuracy of the activation?
> >>>
> >>> If not, does the speex lib already implement a function to read
> >>> out the sound level of a frame?
> >>>
> >>> Thanks in advance.
> >>>
> >>> Lis (Louis Hoefler)
> >>> _______________________________________________
> >>> Speex-dev mailing list
> >>> Speex-dev at xiph.org
> >>> http://lists.xiph.org/mailman/listinfo/speex-dev
> >>
> >>
