[Speex-dev] Speex echo canceller on TI C55 DSP

Mon May 8 15:48:53 PDT 2006

Hi Jim,

I've just been made aware of these problems (look for the thread "speex
echo cancellation limitations"). It's on my short-term TODO list.

> In fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float routines, 
> because they were not used and required smallft.c (which is not so small at 
> all) to be added to the build.

Right, I need to clean up that part...

> With these changes, the link was successful, using testecho.c with some 
> modifications for the C55 environment.  The code and data memory 
> requirements were a lot more than I had hoped (>20kbytes of dynamic data 
> memory for block size=128, tail length = 1024), and I will probably not be 
> able to fit it in the production build without some trimming.

Yes, some memory reduction may be possible here. Of course, decreasing
the tail length is also a rather easy way to cut memory use.

> When I run the build, it goes into an infinite loop in FLOAT_DIV32 (mdf.c 
> line 660), which occurs because adapt_rate is < 0, which happens when 
> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The rounding is causing 
> the result to go negative.  I worked around this by changing

I think that was mentioned in the previous thread...

>       return (a.m+(1<<(-a.e-1)))>>-a.e;
> to
>       return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e;

Is that sufficient to remove all the overflows at this place?
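
To make the failure concrete, here is a minimal sketch that emulates the rounding step in portable C, assuming a target where int is 16 bits wide (as on the C55). The names pf16_t, wrap_s16, asr32, extract16_wrapping and extract16_cast are hypothetical and exist only for this illustration; this is not the pseudofloat.h code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pseudo-float type: 16-bit mantissa and exponent. */
typedef struct { int16_t m; int16_t e; } pf16_t;

/* Wrap a value into the signed 16-bit two's-complement range,
 * which is what a 16-bit int addition effectively does on overflow. */
static int32_t wrap_s16(int32_t v)
{
    v &= 0xFFFF;
    return (v >= 0x8000) ? v - 0x10000 : v;
}

/* Portable arithmetic (flooring) right shift, valid for negative values too. */
static int32_t asr32(int32_t v, int shift)
{
    if (v >= 0)
        return v >> shift;
    return -((-v + (((int32_t)1 << shift) - 1)) >> shift);
}

/* Emulates (a.m + (1 << (-a.e - 1))) >> -a.e evaluated in signed 16-bit int. */
static int32_t extract16_wrapping(pf16_t a)
{
    int shift = -a.e;
    int32_t sum;
    assert(shift >= 1 && shift <= 15);
    sum = wrap_s16((int32_t)a.m + ((int32_t)1 << (shift - 1)));
    return asr32(sum, shift);
}

/* Emulates the workaround: the mantissa is cast to unsigned 16 bits first,
 * so the rounded sum stays in unsigned arithmetic and cannot go negative. */
static int32_t extract16_cast(pf16_t a)
{
    int shift = -a.e;
    uint16_t sum;
    assert(shift >= 1 && shift <= 15);
    sum = (uint16_t)((uint16_t)a.m + (uint16_t)(1u << (shift - 1)));
    return (int32_t)(sum >> shift);
}
```

With the reported input (mantissa 0x7ff0, exponent -5), the rounded sum is 0x8000, which the wrapping version reads as -32768 and shifts to a negative result, while the cast version yields 1024. Note that the cast version treats the mantissa as unsigned, so it is only meaningful when the mantissa is non-negative.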

> I have not had time to trace this, but it looks like a similar problem, 
> where the result of MULT16_32_Q15(M_1,r) is negative, and FLOAT_DIV32_FLOAT 
> bombs.  Maybe the best thing to do next is to instrument the routines in 
> pseudofloat.h which have loops, but I will not get to that for a day or two.

Yeah, r is never supposed to be negative and the float routines assume
that.
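
As one possible way to instrument such loops, here is a hedged sketch of a normalisation routine that rejects non-positive input and caps its iteration count instead of spinning. The function normalize_checked and its return codes are hypothetical; the actual loops in pseudofloat.h may be structured differently.

```c
#include <stdint.h>

/* Hypothetical instrumented normalisation: scales a positive 32-bit value
 * into a mantissa in [16384, 32767] plus a power-of-two exponent, so that
 * x == mant * 2^exp up to truncation.
 * Returns 0 on success, -1 if x is not strictly positive, and -2 if the
 * loop guard trips (which should never happen for valid input). */
static int normalize_checked(int32_t x, int16_t *mant, int16_t *exp)
{
    int16_t e = 0;
    int guard = 0;

    if (x <= 0)
        return -1;              /* the unchecked loops would misbehave here */

    while (x >= 32768) {        /* too large: shift down */
        x >>= 1;
        e++;
        if (++guard > 32)
            return -2;
    }
    while (x < 16384) {         /* too small: shift up */
        x <<= 1;
        e--;
        if (++guard > 32)
            return -2;
    }

    *mant = (int16_t)x;
    *exp = e;
    return 0;
}
```

A caller can log or break on a non-zero return to find which input first violates the non-negative assumption.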

> 1.  speex_echo_state_init takes about 20M instructions, which is a little 
> frightening, 

That's the fft initialization that calls a lot of float cos() functions.
If you have a fixed version of cos() you can use it there, otherwise a
fixed table (for a certain size) would work.
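
A minimal sketch of the fixed-table idea, assuming Q15 values and a deliberately tiny size of 8 so the entries can be checked by hand. The names TW_N, cos_q15_tab, tw_cos_q15 and tw_sin_q15 are hypothetical; a real table would be generated offline for the FFT size actually in use.

```c
#include <assert.h>
#include <stdint.h>

#define TW_N 8  /* illustrative table size; must match the FFT size in practice */

/* cos(2*pi*k/TW_N) in Q15, with -1.0 saturated to -32767 for symmetry. */
static const int16_t cos_q15_tab[TW_N] = {
    32767, 23170, 0, -23170, -32767, -23170, 0, 23170
};

/* Table lookup replacing a run-time cos() call. */
static int16_t tw_cos_q15(int k)
{
    assert(k >= 0);
    return cos_q15_tab[k % TW_N];
}

/* sin(t) = cos(t - pi/2), i.e. the same table read a quarter period earlier. */
static int16_t tw_sin_q15(int k)
{
    assert(k >= 0);
    return cos_q15_tab[(k + 3 * TW_N / 4) % TW_N];
}
```

Reading sine from the cosine table keeps the constant data to a single array, which matters when data memory is tight.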

> and the calls to speex_echo_cancel take about 630K instructions 
> for 128 samples.  Given your recent experience with the fixed point 
> canceller, does this sound rational?  The MIPs for the canceller are similar 
> to the MIPs for the encoder running 8kbps, complexity 1.

The order of magnitude seems right. It may be possible to reduce that a
bit, though. If you have an optimized FFT, you could replace kiss_fft
with it and get a big improvement right there.

> 2.  The testecho example uses a frame length and tail size that are powers 
> of two (128, 1024).  Are there any implications to using sizes which are not 
> powers of two?  It would be most convenient to use the encoder frame size 
> (160), and some multiple of that for the tail size.  How does the frame size 
> affect performance (I understand that the tail length determines what echo 
> signals are cancelable)?

Non powers of two will be a bit slower because of the FFT, but that's
all. I made sure the echo canceller works with 160, precisely because
it's the frame size used by Speex. Note that I don't recommend using
frames more than 20 ms long (at any sampling rate).
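
For sizing, a small sketch of the arithmetic involved, assuming the frame size and tail length are given in samples (as in testecho). The helpers ms_to_samples and frame_within_20ms are hypothetical names for this illustration only.

```c
#include <assert.h>

/* Convert a duration in milliseconds to a sample count at the given rate. */
static int ms_to_samples(int ms, int rate_hz)
{
    assert(ms >= 0 && rate_hz > 0);
    return (int)(((long)ms * rate_hz) / 1000);
}

/* Returns 1 if a frame of frame_samples is at most 20 ms long at rate_hz. */
static int frame_within_20ms(int frame_samples, int rate_hz)
{
    assert(frame_samples > 0 && rate_hz > 0);
    return (long)frame_samples * 1000 <= 20L * rate_hz;
}
```

At 8 kHz, 20 ms works out to 160 samples, so the Speex frame size sits exactly at the recommended limit, and a 40 ms tail is 320 samples.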

> 3.  Do you have any suggestions for code/data memory reduction for the 
> canceller, other than to make the tail length no longer than necessary (this 
> is a line echo canceller for a local phone, so I should be able to keep it 
> to 40ms).  I was surprised by the size of the FFT code, but I guess that it 
> is doing much more than the radix2 version in the TI library.

The FFT code has more than just the radix two, so you can save there. It
wasn't meant to be an optimized FFT, so if TI supplies you with one,
it's probably a good idea to use it (that's what fftwrap is for). Also,
given that the memory use is almost directly proportional to the tail
length, reducing that one to 40 ms will make a huge difference.

	Jean-Marc