[Speex-dev] Voice Activation Level (speex 1.1.11.1)
Tom Grandgent
tgrand at canvaslink.com
Thu Mar 2 07:56:34 PST 2006
Lis,
I suggest you try tweaking Speex's VAD probabilities as Steve suggested.
But consider a simple threshold-based approach as a backup option.
Personally, I struggled with Speex's VAD algorithms (both encoder and
preprocessor) for a long time, tweaked the probabilities, wrote special
case code to work around the mistakes, and was still never satisfied
with the results. In times of really obvious silence, it would detect
speech. Often, it would detect many brief background noises as speech,
such as clicks or typing. Sometimes at the beginning or end of speech,
it would detect silence. (It seemed to vary based on the frequency
content of the speech.) And, there were issues with VAD and AGC
together.
I finally switched to a very simple power threshold check and all of
my problems went away. It worked far better than I ever expected.
Background noise is not a problem if you just use the Speex denoiser
(which is VERY effective) and calculate the power of the signal after
that. This is the function I use to calculate the power:
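// Returns the mean power of a signal, normalized so that a full-scale
// signal gives about 1.0 (sample_t is signed 16-bit int)
float getPower(const sample_t *signal, int numSamples)
{
    if (numSamples <= 0)
        return 0.0f;  // avoid dividing by zero on an empty frame

    float powerSum = 0.0f;
    for (int i = 0; i < numSamples; i++)
    {
        // Squaring removes the sign, so no abs() is needed
        float amp = (float) signal[i];
        powerSum += amp * amp;
    }
    return powerSum / (32768.0f * 32768.0f * (float) numSamples);
}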
I can't say that this is optimal or even correct, but it works very
well for me. And users rarely have to adjust the threshold as long
as they're using AGC to bring the signal up to a proper range.
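
A minimal sketch of such a threshold check, assuming signed 16-bit PCM
frames that have already been denoised. The names frame_power and
frame_is_speech and the threshold parameter are illustrative
assumptions; they are not part of Speex or of the application described
above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t sample_t;  /* assumption: signed 16-bit PCM samples */

/* Mean power of one frame, normalized so full scale is about 1.0. */
static float frame_power(const sample_t *signal, size_t num_samples)
{
    if (num_samples == 0)
        return 0.0f;  /* an empty frame carries no power */

    float power_sum = 0.0f;
    for (size_t i = 0; i < num_samples; i++) {
        float amp = (float) signal[i];
        power_sum += amp * amp;
    }
    return power_sum / (32768.0f * 32768.0f * (float) num_samples);
}

/* Hypothetical helper: a frame counts as speech when its power
 * reaches the caller-supplied threshold. */
static bool frame_is_speech(const sample_t *signal, size_t num_samples,
                            float threshold)
{
    return frame_power(signal, num_samples) >= threshold;
}
```

The threshold is left as a parameter because a suitable value depends
on the microphone, the denoiser and the AGC settings; it would have to
be tuned by ear or exposed to the user.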
I don't mean to bash Speex VAD. I really wanted it to work. But
for me, in a wideband PC-based VoIP application that relies heavily
on VAD, a simple power threshold based approach ended up working
much more reliably.
(By the way, I'm curious about power vs. energy here. Doesn't it
make more sense to use power instead of energy for VAD? Or, maybe,
the terms are sometimes used interchangeably?)
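
For reference, under the usual discrete-time definitions (a standard
convention, not something taken from the Speex source), the energy E of
a frame of N samples x[n] and its mean power P differ only by the
factor 1/N:

```latex
E = \sum_{n=0}^{N-1} x[n]^2,
\qquad
P = \frac{E}{N} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]^2
```

With a fixed frame size the two are proportional, so a threshold on one
is equivalent to a rescaled threshold on the other. Power has the
advantage that the threshold does not change with the frame length.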
Tom
Steve Kann <stevek at stevek.com> wrote:
>
>
> Lis,
>
> The Voice Activity Detection (VAD) algorithm in the speex
> preprocessor does not work simply by detecting the energy level (volume
> or loudness) in the audio frames, but it uses a more complex algorithm
> which (a) tries to ignore background noise, and (b) tries to detect
> speech, in particular, and not just energy.
>
> If you need to adjust the sensitivity of this, you can use these
> settings:
>
> #define SPEEX_PREPROCESS_SET_PROB_START 14
> #define SPEEX_PREPROCESS_GET_PROB_START 15
>
> #define SPEEX_PREPROCESS_SET_PROB_CONTINUE 16
> #define SPEEX_PREPROCESS_GET_PROB_CONTINUE 17
>
> which adjust the 'probabilities' used to classify speech and
> non-speech, both for the start of speech and for continuing speech.
>
> -SteveK
>
>
> Lis wrote:
>
> > Sorry.
> >
> > I had forgotten the words "volume" and "loudness".
> > But it is also known as microphone stroke, I think.
> > If someone can tell me something about that
> > procedure, I would be very pleased.
> > To recap,
> > I only wanted to know whether I can change a
> > variable that holds the sound intensity (loudness)
> > needed to start "encoding >> sending" if the Speex codec
> > is in voice activation mode.
> > If that isn't implemented yet, I would be glad
> > to get information about preprocess->loudness2,
> > for example, or a function (if the lib contains one) that returns a
> > value equal to the overall
> > loudness of a frame.
> >
> > That way I can offer some simple adjustments to users
> > who don't want to yell into their microphone
> > just to say something.
> > Others have headsets that record their breathing,
> > and everyone can hear it.
> > That is not fun all day long...
> >
> > Greets Lis
> >
> > ----- Original Message ----- From: "Jean-Marc Valin"
> > <jean-marc.valin at usherbrooke.ca>
> > To: "Lis" <lis at 1234567890qwertzuiopasdfghjklyxcvbnm.de>
> > Cc: <speex-dev at xiph.org>
> > Sent: Thursday, March 02, 2006 2:35 AM
> > Subject: Re: [Speex-dev] Voice Activation Level (speex 1.1.11.1)
> >
> >
> >> Please define what you mean by "voice activation level".
> >>
> >> Jean-Marc
> >>
> >> On Thu, 2006-03-02 at 02:22 +0100, Lis wrote:
> >>
> > >>> I haven't found anything in the documentation about voice
> > >>> activation levels.
> > >>> Can I change a variable to adjust the accuracy of the activation?
> > >>>
> > >>> If not, does the Speex lib already implement a function to read out
> > >>> the sound level of a frame?
> > >>>
> > >>> Thanks in advance.
> >>>
> >>> Lis (Louis Hoefler)
> >>> _______________________________________________
> >>> Speex-dev mailing list
> >>> Speex-dev at xiph.org
> >>> http://lists.xiph.org/mailman/listinfo/speex-dev
> >>
> >>