[Speex-dev] Echo Canceller Memory Usage, Frame Size

Thu May 11 08:02:26 PDT 2006

Note to everyone here, I'll be traveling for the next two weeks, so I'll
be a bit less responsive.

> The overall allocated memory usage of the echo canceler is, in bytes:
> 4*frame_size *(27 + 5*ceiling(filter_length/frame_size)) + C
> 
> Where C = 420 on a TI C55 DSP (16 bit machine), and C = 760 on a TI C64 DSP 
> (32 bit machine).
> 
> Where the tail length is an integer multiple of the frame size, this reduces 
> to:
> 108*frame_size + 20*filter_length.

Hmm, I hadn't realized the coefficient in front of the frame size was
that big. I guess some of that could probably be reduced a bit. In
general, there are a lot of things that can be adapted. For example, it
would be easy to reduce the 20*filter_length to 12*filter_length at the
cost of slightly more noise during the adaptation phase.
8*filter_length would be possible at the cost of less steady-state
cancellation and/or shorter tail lengths. I'll have to look into the
108*frame_size term, but I suspect there could be a bit of waste there.
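
For reference, the formula quoted above can be written as a small C
helper. This is only a sketch of the arithmetic as reported in the
quoted message, not code from Speex; the function name echo_mem_bytes
and its parameters are made up for illustration, and platform_const
stands for the C term (reported as 420 on the C55 and 760 on the C64).

```c
#include <assert.h>
#include <stddef.h>

/* Allocated bytes according to the formula in the quoted message:
 *   4*frame_size*(27 + 5*ceil(filter_length/frame_size)) + C
 * echo_mem_bytes is a hypothetical helper, not part of the Speex API. */
size_t echo_mem_bytes(size_t frame_size, size_t filter_length,
                      size_t platform_const)
{
    assert(frame_size > 0);

    /* Integer ceiling of filter_length / frame_size. */
    size_t blocks = (filter_length + frame_size - 1) / frame_size;

    return 4 * frame_size * (27 + 5 * blocks) + platform_const;
}
```

With frame_size=128 and filter_length=1024, this gives 34304 bytes plus
C, which matches 108*128 + 20*1024 from the reduced form.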

> So the memory usage is a much stronger function of the frame length than the 
> tail length, but both factors are pretty large.  

Well, considering that filter_length is usually 5-20 times larger than
frame_size, both terms are usually of the same order of magnitude.
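
To make that comparison concrete, here is a minimal sketch that splits
the reduced formula into its two terms. The names echo_mem_terms and
echo_mem_split are hypothetical, and bytes_per_tap is an assumed
parameter so that the 20, 12 and 8 bytes per tap mentioned earlier can
be compared; none of this is taken from the Speex source.

```c
#include <assert.h>
#include <stddef.h>

/* The two terms of the reduced formula 108*frame_size + 20*filter_length.
 * Hypothetical helper type, for illustration only. */
typedef struct {
    size_t frame_bytes;   /* 108 * frame_size */
    size_t filter_bytes;  /* bytes_per_tap * filter_length */
} echo_mem_terms;

/* bytes_per_tap is 20 in the quoted formula; 12 or 8 correspond to the
 * possible reductions discussed in this thread. */
echo_mem_terms echo_mem_split(size_t frame_size, size_t filter_length,
                              size_t bytes_per_tap)
{
    assert(frame_size > 0 && bytes_per_tap > 0);

    echo_mem_terms t;
    t.frame_bytes = 108 * frame_size;
    t.filter_bytes = bytes_per_tap * filter_length;
    return t;
}
```

For frame_size=80, a filter 5 times longer (400) gives 8640 vs. 8000
bytes, and one 20 times longer (1600) gives 8640 vs. 32000 bytes, so
the two terms stay within a factor of about 4 of each other.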

> I recall seeing in an 
> earlier thread that performance is degraded when the filter length is too 
> long because of added noise (but I cannot find that thread at the moment). 
> Of course, if the filter length is too short, then it will not be able to 
> cancel all of the echo.  

The problem with long tail lengths is not only the noise, but the fact
that the adaptation time is more or less proportional to the tail
length.

> For the frame length, however, the tradeoffs are 
> not so obvious.  You said the following last week in the thread "Re: speex 
> echo cancellation limitations":
> 
> "I recommend using frame sizes of about 5-20 ms (samples depend on
> sampling rate) and tail lengths of 100-200 ms for acoustic echo. Of
> course, line echo would require less than that, but I've focused mainly
> on acoustic echo, which is a harder problem (but line echo should work
> as well)."

I've mainly observed that 5-20 ms seems to work well, and I've designed
the fixed-point version according to that. In general, the longer the
frame size, the less CPU it takes (for equal tail size). Otherwise,
there's a tradeoff. Large frame sizes provide better decorrelation of
the data (good), but the adaptation is done less often (bad). That's
about all I can say here. You just have to test and see what works best.
In many cases, however, the application dictates the frame size.
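
Since the recommendations are given in milliseconds, here is a small
sketch of the conversion to samples. The helper name ms_to_samples is
made up for illustration and is not a Speex function.

```c
#include <assert.h>

/* Convert a duration in milliseconds to a sample count at the given
 * sampling rate (hypothetical helper, truncates toward zero). */
int ms_to_samples(int ms, int sample_rate_hz)
{
    assert(ms >= 0 && sample_rate_hz > 0);

    /* Widen before multiplying so large values cannot overflow int. */
    return (int)(((long long)ms * sample_rate_hz) / 1000);
}
```

At 8000 Hz, 5-20 ms corresponds to frames of 40-160 samples, and a tail
of 100-200 ms corresponds to 800-1600 samples.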

> Could you elaborate a bit on the effect of changing the frame size, other 
> than memory usage?  In the small test case I have been using, there is a 
> 20ms delay between the speaker and microphone signals. 

You should probably delay the speaker signal so you don't waste
CPU/memory/efficiency because of that delay.
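
One way to do that is a fixed delay line on the speaker samples before
they are handed to the echo canceller. The following is a minimal
sketch under my own assumptions: the type delay_line, the functions
delay_line_init and delay_line_process, and the MAX_DELAY bound are all
hypothetical and not part of Speex. It is probably safer to delay by a
bit less than the measured value, so that the echo never arrives
before its reference.

```c
#include <assert.h>
#include <string.h>

#define MAX_DELAY 1024  /* assumed upper bound on the delay, in samples */

/* Fixed delay line for 16-bit samples (hypothetical helper type). */
typedef struct {
    short buf[MAX_DELAY];
    int delay;  /* delay in samples, 0 <= delay <= MAX_DELAY */
    int pos;    /* index of the oldest stored sample */
} delay_line;

void delay_line_init(delay_line *d, int delay_samples)
{
    assert(delay_samples >= 0 && delay_samples <= MAX_DELAY);
    memset(d->buf, 0, sizeof d->buf);
    d->delay = delay_samples;
    d->pos = 0;
}

/* Push n samples in and get the samples from `delay` samples ago out.
 * The first `delay` output samples are zero. in and out may alias. */
void delay_line_process(delay_line *d, const short *in, short *out, int n)
{
    for (int i = 0; i < n; i++) {
        if (d->delay == 0) {
            out[i] = in[i];  /* no delay configured: pass through */
            continue;
        }
        short oldest = d->buf[d->pos];  /* sample written `delay` calls ago */
        d->buf[d->pos] = in[i];
        out[i] = oldest;
        d->pos = (d->pos + 1) % d->delay;
    }
}
```

For the 20 ms delay in the test case at 8000 Hz, delay_samples would be
at most 160. The delayed output is what would be passed as the speaker
(reference) signal, which in turn allows a shorter filter_length.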

>  Test-echo.c has 
> defaults frame_len=128, filter_len=1024.  I ran this case, and also 
> frame_len=80, filter_len=320 (10 and 40ms at 8000 Hz).  The second case 
> attenuated the echo better, probably because the first filter length is much 
> longer than the echo path delay.  

Exactly.

> How low a frame length would you recommend 
> for 8000Hz sample rate?

5-20 ms, as noted above.

	Jean-Marc