[Speex-dev] Speex echo canceller on TI C55 DSP
Jim Crichton
jim.crichton at comcast.net
Mon May 8 17:05:43 PDT 2006
> I've just been made aware of these problems (look for the thread "speex
> echo cancellation limitations"). It's on my short-term TODO list.
I saw the other thread; my problems happened in different (but similar)
routines.
>> In fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float routines,
>> because they were not used and required smallft.c (which is not so small
>> at all) to be added to the build.
>
> Right, need to cleanup that part...
>
>> With these changes, the link was successful, using testecho.c with some
>> modifications for the C55 environment. The code and data memory
>> requirements were a lot more than I had hoped (>20kbytes of dynamic data
>> memory for block size=128, tail length = 1024), and I will probably not
>> be able to fit it in the production build without some trimming.
>
> Yes, there may be a bit of memory reduction possible here. Of course,
> decreasing the tail length is also a rather easy way.
>
>> When I run the build, it goes into an infinite loop in FLOAT_DIV32 (mdf.c
>> line 660), which occurs because adapt_rate is < 0, which happens when
>> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}. The rounding is causing
>> the result to go negative. I worked around this by changing
>
> I think that was mentioned in the previous thread...
>
>> return (a.m+(1<<(-a.e-1)))>>-a.e;
>> to
>> return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e;
>
> Is that sufficient to remove all the overflows at this place?
The rounding takes the value to exactly 0x8000, and it is followed by a
right shift, so you just need to avoid the sign extension.
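For reference, a minimal host-side sketch of why the cast helps. The struct
and function names are hypothetical stand-ins, not the actual pseudofloat.h
definitions. The 16-bit int of the C55 is emulated by truncating the sum to
int16_t, and the sketch assumes a two's-complement host with an arithmetic
right shift on signed values.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pseudo-float type: 16-bit mantissa and exponent. */
typedef struct {
    int16_t m;
    int16_t e;
} pseudo_float;

/* Original expression as it behaves with a 16-bit int: the rounded sum
 * 0x7ff0 + 0x0010 = 0x8000 wraps to -32768, and the right shift then
 * sign-extends. Truncating to int16_t emulates the 16-bit wraparound. */
static int16_t extract16_signed(pseudo_float a)
{
    int16_t sum = (int16_t)(uint16_t)(a.m + (1 << (-a.e - 1)));
    return (int16_t)(sum >> -a.e);
}

/* Workaround: do the addition on an unsigned 16-bit value so that 0x8000
 * stays positive and the shift brings in zeros. */
static int16_t extract16_unsigned(pseudo_float a)
{
    uint16_t sum = (uint16_t)((uint16_t)a.m + (1u << (-a.e - 1)));
    return (int16_t)(sum >> -a.e);
}
```

With the input {0x7ff0, 0xfffb} (exponent -5), the first version returns
-1024 and the second returns 1024 on such a host.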
>> I have not had time to trace this, but it looks like a similar problem,
>> where the result of MULT16_32_Q15(M_1,r) is negative, and
>> FLOAT_DIV32_FLOAT bombs. Maybe the best thing to do next is to instrument
>> the routines in pseudofloat.h which have loops, but I will not get to that
>> for a day or two.
>
> Yeah, r is never supposed to be negative and the float routines assume
> that.
No, it was a divide by zero, as explained in my second post. I will try a
build on the C6x DSP to see if this is a 16 vs. 32-bit problem. I sent the
test files off-list.
>> 1. speex_echo_state_init takes about 20M instructions, which is a little
>> frightening,
>
> That's the fft initialization that calls a lot of float cos() functions.
> If you have a fixed version of cos() you can use it there, otherwise a
> fixed table (for a certain size) would work.
>
>> and the calls to speex_echo_cancel take about 630K instructions
>> for 128 samples. Given your recent experience with the fixed point
>> canceller, does this sound reasonable? The MIPS for the canceller are
>> similar to the MIPS for the encoder running 8kbps, complexity 1.
>
> The order of magnitude seems right. It may be possible to reduce that a
> bit, though. If you have an optimized FFT, you could replace kiss_fft
> with it and get a big improvement right there.
Yeah, but then I have to try to actually understand the algorithm. I am not
sure that those brain cells are still alive.
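On the init cost in item 1, the "fixed table" suggestion could look roughly
like the sketch below. This is only an illustration: the table size of 8, the
names cos_table_q15 and cos_q15, and the Q15 scaling by 32767 are assumptions
made here, not anything taken from kiss_fft or fftwrap.c. A real table would
be generated offline for the FFT size actually in use.

```c
#include <stdint.h>

#define COS_TABLE_SIZE 8  /* illustrative; a real table would match the FFT size */

/* cos(2*pi*k/8) scaled by 32767 and rounded, computed offline. */
static const int16_t cos_table_q15[COS_TABLE_SIZE] = {
    32767, 23170, 0, -23170, -32767, -23170, 0, 23170
};

/* Returns cos(2*pi*k/n) in Q15 for non-negative k and any n that divides
 * COS_TABLE_SIZE, by stepping through the table with a stride. */
static int16_t cos_q15(int k, int n)
{
    int stride = COS_TABLE_SIZE / n;
    return cos_table_q15[(k % n) * stride];
}
```

The lookup replaces a run-time cos() call with one indexed read, at the cost
of fixing the supported sizes at build time.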
>> 2. The testecho example uses a frame length and tail size that are
>> powers of two (128, 1024). Are there any implications to using sizes
>> which are not powers of two? It would be most convenient to use the
>> encoder frame size (160), and some multiple of that for the tail size.
>> How does the frame size affect performance (I understand that the tail
>> length determines what echo signals are cancelable)?
>
> Non powers of two will be a bit slower because of the FFT, but that's
> all. I made sure the echo canceller works with 160, precisely because
> it's the frame size used by Speex. Note that I don't recommend using
> frames more than 20 ms long (at any sampling rate).
>
>> 3. Do you have any suggestions for code/data memory reduction for the
>> canceller, other than to make the tail length no longer than necessary?
>> (This is a line echo canceller for a local phone, so I should be able to
>> keep it to 40ms.) I was surprised by the size of the FFT code, but I
>> guess that it is doing much more than the radix2 version in the TI
>> library.
>
> The FFT code has more than just the radix two, so you can save there. It
> wasn't meant to be an optimized FFT, so if TI supplies you with one,
> it's probably a good idea to use it (that's what fftwrap.c is for). Also,
> given that the memory use is almost directly proportional to the tail
> length, reducing that one to 40 ms will make a huge difference.
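To put numbers on the frame and tail sizes discussed above, here is a small
sketch. It assumes an 8 kHz sampling rate (at which the 160-sample frame is
20 ms); the helper names are made up for illustration. The memory figure is a
plain linear extrapolation from the "almost directly proportional" remark,
ignoring fixed overhead and the change in frame size, so it is an estimate
only.

```c
/* Number of samples in a span of `ms` milliseconds at `rate_hz`. */
static int ms_to_samples(int ms, int rate_hz)
{
    return (int)(((long)ms * rate_hz) / 1000);
}

/* Rough linear extrapolation of memory use from a measured configuration.
 * Ignores fixed overhead, so treat the result as an estimate only. */
static long scale_bytes(long measured_bytes, int measured_tail, int new_tail)
{
    return measured_bytes * new_tail / measured_tail;
}
```

Under these assumptions, ms_to_samples(20, 8000) gives the 160-sample frame,
ms_to_samples(40, 8000) gives a 320-sample tail (two frames), and
scale_bytes(20480, 1024, 320) gives 6400, taking 20 kbytes as the lower bound
reported for the 1024-sample tail.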
Thanks for the advice.
- Jim