[Speex-dev] Re: speex echo cancellation limitations

Mon May 1 21:33:59 PDT 2006

> I am writing to gain a better understanding of the limitations of speex echo 
> cancellation, esp. with respect to the fixed point implementation.
> If these limitations have been documented elsewhere already, please let me 
> know!

Nothing officially documented, sorry.

> I observe experimentally that when one or both of the echo or ref data for 
> speex_echo_cancel() have values outside of the range +/- 2^13 (and especially 
> +/- 2^14) that overflows occur leading to the obvious symptom of infinite 
> looping and probably less obvious results.

Where does the infinite loop happen?

> What was your intended domain for the input data?

I was targeting roughly that range (a few samples outside of
+-2^14), so I'd be interested in samples that cause incorrect behaviour.
Because of the limited resolution I've chosen (16 bits for most things),
I accepted that I could not *guarantee* there wouldn't be overflows, but
I thought they wouldn't happen in practice. This is why I'm interested
in any help fixing that (especially if you can check where the overflows
happen). In any case, an implementation on a "real DSP" (i.e. not ARM)
should saturate the additions.
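To illustrate why inputs near +-2^14 are the edge: two 16-bit values of 2^14 sum to 2^15, which no longer fits in a signed 16-bit word, so a plain add wraps to a large negative number while a saturating add clamps to 32767. The sketch below assumes C99 `<stdint.h>` types; `sat_add16` and `count_out_of_range` are hypothetical helpers, not part of Speex. The second one is just a way to flag frames worth reporting as overflow cases.

```c
#include <stdint.h>

/* Hypothetical helper: saturating 16-bit addition. It clamps to the
   int16 range instead of wrapping, which is what saturating DSP
   arithmetic does in hardware. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;  /* cannot overflow in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Hypothetical diagnostic: count samples whose magnitude exceeds
   'limit' (e.g. 1 << 14), so frames likely to trigger overflows can be
   logged and sent along as test cases. */
static int count_out_of_range(const int16_t *x, int n, int32_t limit)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        int32_t v = x[i];
        if (v > limit || v < -limit)
            count++;
    }
    return count;
}
```

On a target without hardware saturation (such as the MIPS4K mentioned below), something like `sat_add16` would have to be provided in software, at some cycle cost.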

> I observe experimentally that under some pathological conditions of double 
> talk where the echo and ref have the same frequency, the output is attenuated 
> to near zero.

Well, if the ref and echo are perfectly correlated for a sufficient
amount of time, it's simply not possible to distinguish one from the
other, so this behaviour would be expected. But do you really have an
example of that happening in the real world?

> What assurance is there that in real life telephony we will not run into such 
> problems?

No guarantee, but I'm targeting normal use of telephony. If I missed
anything, let me know.

> Non-power of two frame sizes and tail lengths seem to work just fine.  What 
> should I know about acceptable or optimal frame sizes and tail lengths?

I recommend using frame sizes of about 5-20 ms (the number of samples
depends on the sampling rate) and tail lengths of 100-200 ms for acoustic
echo. Of course, line echo would require a shorter tail than that, but
I've focused mainly on acoustic echo, which is a harder problem (line
echo should work as well, though).
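Since the recommendation is given in milliseconds, the sample counts come from a simple conversion. The sketch below uses a hypothetical helper `ms_to_samples`; the resulting values would be the `frame_size` and `filter_length` (tail) arguments of `speex_echo_state_init()` from `speex_echo.h`.

```c
/* Hypothetical helper: convert a duration in milliseconds to a number
   of samples at the given sampling rate (Hz). */
static int ms_to_samples(int ms, int sampling_rate)
{
    return ms * sampling_rate / 1000;
}

/* Example: 20 ms frames with a 200 ms tail at 8 kHz give
   frame_size = 160 and filter_length = 1600, which would then be passed
   as speex_echo_state_init(frame_size, filter_length). */
```

At 16 kHz, the same durations double the sample counts, so a 10 ms frame there is also 160 samples.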

> I'm looking for ways to further reduce the cycle count besides enabling fixed 
> point and possibly providing some inline assembler for a fixed*.h file (my 
> target processor is mips4k series... no floating point).

I think most of the gain would come from using an FFT optimised for your
CPU. I've made it (relatively) easy to swap FFTs through my fftwrap.c
wrapper.

> I observe experimentally that the computation time on a per sample basis is 
> not heavily dependent on the tail length and is almost independent of the 
> frame size except for smaller frame sizes... I think... does this seem 
> correct?

The complexity has many different components: some depend on the frame
size, some on the tail length, and some are constant.

> Is there a reasonable way to e.g. perform certain calculations only every 
> other frame or something of that manner?

You could maybe do that for a few things, but the speed improvement
probably wouldn't be worth the performance degradation. There are a few
other tricks you could use. For example, there are two "#if 1" blocks
that could be replaced by "#if 0" to reduce the complexity at the price
of a bit of noise when adaptation is fast. I guess you could also do
without the re-normalization of the weights and save a bit there as well.

	Jean-Marc