[Speex-dev] Re: speex echo cancellation limitations

Tue May 2 11:28:10 PDT 2006

Thanks for your prompt and helpful reply.

In FLOAT_DIVU() it hangs at the following:
   while (a.m >= b.m)
   {
      e++;
      a.m >>= 1;
   }
for the case where a and b are both zero (yes, division by zero).
This happens from mdf.c:
leak_estimate = FLOAT_EXTRACT16(FLOAT_SHL(FLOAT_DIVU(st->Pey, st->Pyy),14));
where st->Pey and st->Pyy are both zero, which happens for the following input 
data to testecho program:
-- first arg is file containing sine wave of magnitude +/- 32767
-- 2nd arg is file containing all zeroes

The division by zero appears to be caused by the calculation:
See = inner_prod(st->e+st->frame_size, st->e+st->frame_size, st->frame_size)
which returns negative due to overflow occuring in mdf.c:inner_prod() :
      part = MAC16_16(part,*x++,*y++);
      part = MAC16_16(part,*x++,*y++);
      part = MAC16_16(part,*x++,*y++);
      part = MAC16_16(part,*x++,*y++);
     sum = ADD32(sum,SHR32(part,6));
This overflow can be avoided by rewriting this as:
      part = part + ((*x++ * *y++)>>1);
      part = part + ((*x++ * *y++)>>1);
      part = part + ((*x++ * *y++)>>1);
      part = part + ((*x++ * *y++)>>1);
      sum += part>>5;
I am assuming that the value 0x8000 is avoided in the input.

Even with this fix there is definitely some bad stuff going on; the output 
data is corrupted looking.
I put assertions into FLOAT_MUL32U(), FLOAT_DIV32_FLOAT() and FLOAT_DIV32() to 
assert that the "a" arguments were non-negative. Using some real life data, i 
found that i had to shift the real life data right by two (i shifted both 
inputs by same amount) to avoid asserting; shifting by just one almost worked 
but failed for some case (i don't remember what that case was... not the one 
i mentioned above).
[Alternately i could just clip the input to some threshold].
To be on the safe side with the above functions, i removed the asserts and 
modified the tests for == 0 to <= 0 ...

Clearly, some sort of automated testing scheme could help here...

-Ted

On Monday 01 May 2006 21:33, Jean-Marc Valin wrote:
> > I am writing to gain a better understanding of the limitations of speex
> > echo cancellation, esp. with respect to the fixed point implementation.
> > If these limitations have been documented elsewhere already, please let
> > me know!
>
> Nothing officially documented, sorry.
>
> > I observe experimentally that when one or both of the echo or ref data
> > for speex_echo_cancel() have values outside of the range +/- 2^13 (and
> > especially +/- 2^14) that overflows occur leading to the obvious symptom
> > of infinite looping and probably less obvious results.
>
> Where does the infinite loop happen?
>
> > What was your intended domain for the input data?
>
> I was targeting about that type of range (a few samples outside of
> +-2^14), so I'd be interested in samples that cause incorrect behaviour.
> Because of the limited resolution (16 bits for most of the things) I've
> chosen, I sort of accepted I could not *guarantee* there wouldn't be
> overflows, but I thought they wouldn't happen in practice. This is why
> I'm interested in any help fixing that (especially if you can check
> where the overflows happen. In any case, an implementation on a "real
> DSP" (i.e. not ARM) should saturate the additions.
>
> > I observe experimentally that under some pathological conditions of
> > double talk where the echo and ref have the same frequency, the output is
> > attenuated to near zero.
>
> Well, if the ref and echo are perfectly correlated for a sufficient
> amount of time, it's simply not possible to distinguish one from the
> other, so this behaviour would be expected. But do you really have an
> example of that happening in the real world?
>
> > What assurance is there that in real life telephony we will not run into
> > such problems?
>
> No guarantee, but I'm targeting normal use of telephony. If I missed
> anything, let me know.
>
> > Non-power of two frame sizes and tail lengths seem to work just fine. 
> > What should i know about acceptable or optimal frame sizes and tail
> > lengths?
>
> I recommend using frame sizes of about 5-20 ms (samples depend on
> sampling rate) and tail lengths of 100-200 ms for acoustic echo. Of
> course, line echo would require less than that, but I've focused mainly
> on acoustic echo, which is a harder problem (but line echo should work
> as well).
>
> > I'm looking for ways to further reduce the cycle count besides enabling
> > fixed point and possibly providing some inline assembler for a fixed*.h
> > file (my target processor is mips4k series... no floating point).
>
> I think most of the gain would come from using an FFT optimised for your
> CPU. I've made it (relatively) easy to swap FFTs through my fftwrap.c
> wrapper.
>
> > I observe experimentally that the computation time on a per sample basis
> > is not heavily dependent on the tail length and is almost independent of
> > the frame size except for smaller frame sizes... i think... does this
> > seem correct?
>
> The complexity has many different components that depend on frame size,
> tail size and constant terms.
>
> > Is there a reasonable way to e.g. perform certain calculations only every
> > other frame or something of that manner?
>
> You could maybe do that for a few things, but the speed improvement
> probably would be worth the performance degradation. There are a few
> tricks you could use. For example, two "#if 1" that could be replaced by
> "#if 0" and reduce the complexity at the price of a bit of noise when
> adaptation is fast. I guess you could also do without the
> re-normalization of the weights and save a bit there as well.
>
> 	Jean-Marc