[speex-dev] Memory leak in denoiser + a few questions

Tue Mar 30 15:05:05 PST 2004

Jean-Marc Valin wrote:

>>Reverberation suppression?
>>    
>>
>
>Basically, it means that if you are in a room with lots of echo (long
>decay), I can reduce it a bit.
>
>  
>
>>I guess this would help reduce local source echoes?  I've never 
>>_noticed_ that to be a problem in my use, but I would imagine that 
>>using a notebook's built-in microphone, you'd get some echo off of the 
>>screen and stuff [also from the whole room]..
>>
>>Most of these echoes aren't so bad, but I guess they might make the 
>>encoding job harder.  I'd sure rather see the echo cancellation 
>>finished [not that I have any say on what you work on!!!].
>>    
>>
>
>Well, I'm still looking for help :)
>
>  
>
>>Here are the numbers I got running VAD on 655 seconds of audio (about 
>>half is speech, half is absolute silence [0's]).
>>P3-600: 25 seconds
>>Athlon XP 1700+ (1.45 GHz): 5 seconds
>>P4 2.8 GHz: 8.8 seconds
>>    
>>
>
>These numbers sound like a problem I had a while ago with the decoder.
>The VAD shouldn't take much CPU so I suspect there might be floating
>point underflows in some part, slowing down the Intel CPUs a lot (for
>some reason, the AMD CPUs seem to handle underflows faster).
>  
>

Hmm, how can I find that out?  How much CPU would you expect it to take?

I've been playing with oprofile, but it doesn't seem to get that 
fine-grained.
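
One rough way to check for underflows, short of a profiler, would be to scan the preprocessor's float buffers after each frame and count subnormal (denormal) values. The sketch below assumes you can reach those buffers from your own code; the helper name count_subnormals is hypothetical and not part of Speex. It relies only on the C99 fpclassify macro.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of Speex): count the samples in buf that
 * are subnormal floats, i.e. nonzero values too small to be stored with a
 * normalized mantissa. Those are the values that trigger slow underflow
 * handling on some CPUs. */
size_t count_subnormals(const float *buf, size_t n)
{
    size_t count = 0;
    size_t i;

    assert(buf != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (fpclassify(buf[i]) == FP_SUBNORMAL)
            count++;
    }
    return count;
}
```

If the count climbs during the all-zero half of the input, that would point toward the underflow explanation; if it stays at zero, the time is going somewhere else.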

>>Anyway, I think I might need to find a less computationally intensive 
>>VAD solution for the conference.  VAD is currently only used when 
>>people connect via the PSTN, so they presumably have a decent SNR, and 
>>I may be able to get away with an energy envelope type of thing, 
>>without needing frequency domain analysis.  But before I go and start 
>>coding this, are there any simple optimizations that can be done to the 
>>preprocessor when it is being used only for the VAD decision?
>>    
>>
>
>Have you tried using the (less accurate) VAD that's in the codec itself
>(SPEEX_SET_VAD)?
>  
>
I'll take a look at that.  In this case [in the conferencing 
application], I'm not actually using speex encoding [these are PSTN 
callers, I do VAD in clients when I control them], so I'd need to see if 
I could rip it out of speex to use it.
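
For reference, the energy-envelope idea mentioned above could look roughly like this. It is only a sketch under assumptions: 16-bit PCM input, a fixed energy ratio over a slowly adapting noise floor, and a minimum floor so that absolute silence doesn't drive the estimate to zero. The names (EnergyVad, energy_vad_frame) and the smoothing constants are made up for illustration and untuned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for a time-domain, energy-based VAD. */
typedef struct {
    float noise_floor; /* running estimate of background energy */
    float min_floor;   /* lower bound so all-zero input can't pin it at 0 */
    float ratio;       /* speech if frame energy > ratio * noise_floor */
} EnergyVad;

/* Mean squared amplitude of one frame of 16-bit PCM. */
float frame_energy(const short *pcm, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(pcm != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += (double)pcm[i] * (double)pcm[i];
    return (float)(sum / (double)n);
}

/* Returns 1 if the frame looks like speech, 0 otherwise. The noise floor
 * is only updated on non-speech frames. */
int energy_vad_frame(EnergyVad *v, const short *pcm, size_t n)
{
    float e = frame_energy(pcm, n);
    int is_speech = e > v->ratio * v->noise_floor;

    if (!is_speech) {
        v->noise_floor = 0.95f * v->noise_floor + 0.05f * e;
        if (v->noise_floor < v->min_floor)
            v->noise_floor = v->min_floor;
    }
    return is_speech;
}
```

This does no frequency-domain work at all, so it should be cheap per frame, at the cost of being easily fooled by loud non-speech noise.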

Also, I have a couple of patches to the preprocessor to send along; 
basically they turn the start and continue probabilities into 
parameters that callers can set.  We're currently using very low 
probabilities, much lower than your defaults: VAD_START=0.05, 
VAD_CONTINUE=0.02.  We also have a 20-frame (2/5 sec) "tail" outside 
the preprocessor, which continues treating frames as speech after the 
detector has dropped out.
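
To make the combination concrete, here is a small standalone sketch of the decision logic: the start/continue test has the same shape as the one in the patch below, and the external tail is layered on top. The struct and function names are hypothetical, the initial value of last_speech is an assumption, and the counter saturates at 20 instead of growing without bound.

```c
#include <assert.h>

#define HANGOVER_FRAMES 20 /* the "last_speech < 20" test in the patch */

/* Hypothetical wrapper state; not part of the preprocessor. */
typedef struct {
    float prob_start;    /* threshold to enter speech */
    float prob_continue; /* lower threshold to stay in speech */
    int   last_speech;   /* frames since a threshold was last exceeded */
    int   tail_frames;   /* length of the external tail, in frames */
    int   tail_left;     /* tail frames still to be reported as speech */
} VadGate;

void vad_gate_init(VadGate *g, float start, float cont, int tail_frames)
{
    assert(g != 0);
    g->prob_start = start;
    g->prob_continue = cont;
    g->last_speech = HANGOVER_FRAMES; /* assumption: start with no recent speech */
    g->tail_frames = tail_frames;
    g->tail_left = 0;
}

/* Detector decision for one frame, given its speech probability. */
static int detector(VadGate *g, float prob)
{
    if (prob > g->prob_start ||
        (g->last_speech < HANGOVER_FRAMES && prob > g->prob_continue)) {
        g->last_speech = 0;
        return 1;
    }
    if (g->last_speech < HANGOVER_FRAMES)
        g->last_speech++;
    return g->last_speech < HANGOVER_FRAMES;
}

/* Detector plus external tail: keep reporting speech for tail_frames
 * frames after the detector itself has dropped out. */
int vad_gate_frame(VadGate *g, float prob)
{
    if (detector(g, prob)) {
        g->tail_left = g->tail_frames;
        return 1;
    }
    if (g->tail_left > 0) {
        g->tail_left--;
        return 1;
    }
    return 0;
}
```

With our settings this would be initialised as vad_gate_init(&g, 0.05f, 0.02f, 20).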

Here's a patch:

==================================================

Diff for file preprocess.c, 1.2 -> 1.3
Index: preprocess.c
===================================================================
RCS file: /home/UniServ/dls/CVS/hms/app_conference/libspeex/preprocess.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -w -r1.2 -r1.3

--- preprocess.c	2003/11/07 23:40:23	1.2
+++ preprocess.c	2004/02/06 17:10:24	1.3
@@ -145,6 +145,9 @@
    st->agc_level = 8000;
    st->vad_enabled = 0;
 
+   st->speech_prob_start = SPEEX_PROB_START ;
+   st->speech_prob_continue = SPEEX_PROB_CONTINUE ;
+   
    st->frame = (float*)speex_alloc(2*N*sizeof(float));
    st->ps = (float*)speex_alloc(N*sizeof(float));
    st->gain2 = (float*)speex_alloc(N*sizeof(float));
@@ -435,12 +438,19 @@
       st->speech_prob = p0/(1e-25+p1+p0);
       /*fprintf (stderr, "%f %f %f ", tot_loudness, st->loudness2, st->speech_prob);*/
 
+	/* decide if frame is speech using speech probability settings */
+
 /*      if (st->speech_prob> .35 || (st->last_speech < 20 && st->speech_prob>.1)) */
-      if (st->speech_prob> .20 || (st->last_speech < 20 && st->speech_prob>.05))
+	if (
+		st->speech_prob > st->speech_prob_start
+		|| ( st->last_speech < 20 && st->speech_prob > st->speech_prob_continue ) 
+	)
       {
          is_speech = 1;
          st->last_speech = 0;
-      } else {
+	} 
+	else 
+	{
          st->last_speech++;
          if (st->last_speech<20)
            is_speech = 1;
@@ -985,6 +995,30 @@
    case SPEEX_PREPROCESS_GET_VAD:
       (*(int*)ptr) = st->vad_enabled;
       break;
+      
+	case SPEEX_PREPROCESS_SET_PROB_START:
+		st->speech_prob_start = (*(float*)ptr) ;
+		if ( st->speech_prob_start > 1 )
+			st->speech_prob_start = st->speech_prob_start / 100 ;
+		if ( st->speech_prob_start > 1 || st->speech_prob_start < 0 )
+			st->speech_prob_start = SPEEX_PROB_START ;
+		break ;
+	case SPEEX_PREPROCESS_GET_PROB_START:
+		(*(float*)ptr) = st->speech_prob_start ;
+		break ;
+      
+	case SPEEX_PREPROCESS_SET_PROB_CONTINUE:
+		st->speech_prob_continue = (*(float*)ptr) ;
+		if ( st->speech_prob_continue > 1 )
+			st->speech_prob_continue = st->speech_prob_continue / 100 ;
+		if ( st->speech_prob_continue > 1 || st->speech_prob_continue < 0 )
+			st->speech_prob_continue = SPEEX_PROB_CONTINUE ;
+		break ;
+		break ;
+	case SPEEX_PREPROCESS_GET_PROB_CONTINUE:
+		(*(float*)ptr) = st->speech_prob_continue ;
+		break ;
+      
    default:
       speex_warning_int("Unknown speex_preprocess_ctl request: ", request);
       return -1;

Diff for file speex_preprocess.h, 1.1 -> 1.2
Index: speex_preprocess.h
===================================================================
RCS file: /home/UniServ/dls/CVS/hms/app_conference/libspeex/speex_preprocess.h,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -w -r1.1 -r1.2
--- speex_preprocess.h	2003/11/06 21:57:59	1.1
+++ speex_preprocess.h	2004/02/06 17:10:24	1.2
@@ -49,6 +49,10 @@
    float  agc_level;
    int    vad_enabled;
 
+	// probabilities to check speech_prob against
+	float speech_prob_start ;
+	float speech_prob_continue ;
+
    float *frame;             /**< Processing frame (2*ps_size) */
    float *ps;                /**< Current power spectrum */
    float *gain2;             /**< Adjusted gains */
@@ -108,8 +112,9 @@
 
 /** Used like the ioctl function to control the preprocessor parameters */
 int speex_preprocess_ctl(SpeexPreprocessState *st, int request, void *ptr);
-
 
+#define SPEEX_PROB_START 0.35 
+#define SPEEX_PROB_CONTINUE 0.1
 
 #define SPEEX_PREPROCESS_SET_DENOISE 0
 #define SPEEX_PREPROCESS_GET_DENOISE 1
@@ -122,6 +127,12 @@
 
 #define SPEEX_PREPROCESS_SET_AGC_LEVEL 6
 #define SPEEX_PREPROCESS_GET_AGC_LEVEL 7
+
+#define SPEEX_PREPROCESS_SET_PROB_START 8
+#define SPEEX_PREPROCESS_GET_PROB_START 9
+
+#define SPEEX_PREPROCESS_SET_PROB_CONTINUE 10
+#define SPEEX_PREPROCESS_GET_PROB_CONTINUE 11
 
 #ifdef __cplusplus
 }

==================================================
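
A note on the SET requests in the patch: they accept either a fraction (0..1) or a percentage (anything above 1 is divided by 100), and fall back to the compile-time default if the result is still out of range. A standalone sketch of that rule, using a hypothetical helper name normalize_prob that does not appear in the patch:

```c
#include <assert.h>

/* Hypothetical helper mirroring the SET_PROB_* cases in the patch:
 * values above 1 are treated as percentages, and anything still outside
 * [0, 1] is replaced by the fallback default. */
float normalize_prob(float value, float fallback)
{
    assert(fallback >= 0.0f && fallback <= 1.0f);
    if (value > 1)
        value = value / 100;
    if (value > 1 || value < 0)
        value = fallback;
    return value;
}
```

With the patch applied, a caller would set the threshold along the lines of `float p = 0.05f; speex_preprocess_ctl(st, SPEEX_PREPROCESS_SET_PROB_START, &p);`. Note that ptr must point to a float, and that passing 5 ends up meaning 0.05.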

>	Jean-Marc
>
>  
>
