[Speex-dev] Speex echo canceller on TI C55 DSP

Tue May 9 04:23:36 PDT 2006

>Just tried your files and I'm not running into any infinite loops and
>the cancellation works fine. Unless the C6x has the same problem, I
>suspect a 16-bit problem. I'll check and see if I find something. About
>the r=0 problem, I can't find where it ends up in a denominator, so I
>suspect it is not (directly) the problem.

I built and ran the same test on the TI C64 simulator, and the echo was
canceled nicely (about a 10:1 reduction in peak amplitude during the
second of two brief speech bursts).  So my problem must again be related
to the 16-bit processing on the C5X DSPs.

Also, the line where it is hanging is:
         st->power_1[i] = FLOAT_SHL(
               FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),
                                 FLOAT_MUL32U(e,st->power[i]+10)),
               WEIGHT_SHIFT+16);

and it is e that is in the denominator, not r (sorry for the confusion).  I 
can now run the simulations side-by-side and look for differences.
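A hang like this is consistent with a normalization loop whose exit test is never met when its input is zero (e here) or negative (the adapt_rate < 0 case in the quoted report). The sketch below is a hypothetical normalizer in that style, not the actual pseudofloat.h code: it shifts a positive 32-bit value left until bit 30 is set and rejects zero or negative input instead of looping.

```c
#include <stdint.h>

/* Hypothetical normalizer in the style of a pseudo-float divide: shift a
   positive 32-bit value left until bit 30 is set, counting the shifts.
   Without the guard, v == 0 never reaches the threshold, and a negative v
   is always below it, so the loop would not terminate cleanly. */
static int normalize_q30(int32_t v, int32_t *out)
{
    int shift = 0;
    if (v <= 0)
        return -1;            /* caller must handle zero / negative input */
    while (v < 0x40000000) {
        v <<= 1;              /* safe: v < 2^30, so v << 1 < 2^31 */
        shift++;
    }
    *out = v;
    return shift;
}
```

Instrumenting the real loops with a similar guard (or an iteration counter) is one way to catch the bad input at its source rather than in the simulator as a hang.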

- Jim

On Monday, May 8, 2006 at 20:05 -0400, Jim Crichton wrote:
> > I've just been made aware of these problems (look for the thread "speex
> > echo cancellation limitations"). It's on my short-term TODO list.
>
> I saw the other thread; my problems happened in different (but similar)
> routines.
>
> >> In fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float
> >> routines, because they were not used and required smallft.c (which
> >> is not so small at all) to be added to the build.
> >
> > Right, need to cleanup that part...
> >
> >> With these changes, the link was successful, using testecho.c with
> >> some modifications for the C55 environment.  The code and data memory
> >> requirements were a lot more than I had hoped (>20 kbytes of dynamic
> >> data memory for block size = 128, tail length = 1024), and I will
> >> probably not be able to fit it in the production build without some
> >> trimming.
> >
> > Yes, there may be a bit of memory reduction possible here. Of course,
> > decreasing the tail length is also a rather easy way.
> >
> >> When I run the build, it goes into an infinite loop in FLOAT_DIV32
> >> (mdf.c line 660), which occurs because adapt_rate is < 0, which
> >> happens when FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The
> >> rounding is causing the result to go negative.  I worked around this
> >> by changing
> >
> > I think that was mentioned in the previous thread...
> >
> >>       return (a.m+(1<<(-a.e-1)))>>-a.e;
> >> to
> >>       return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e;
> >
> > Is that sufficient to remove all the overflows at this place?
>
> The rounding takes the value to exactly 0x8000, and it is followed by a
> right shift, so you just need to avoid the sign extension.
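To make the sign-extension issue concrete, the sketch below emulates 16-bit `int` arithmetic on a 32-bit host for the negative-exponent rounding step only. The struct `pfloat16` and both function names are hypothetical stand-ins for the mantissa/exponent pair in pseudofloat.h. With {0x7ff0, 0xfffb} (mantissa 32752, exponent -5), the signed path wraps 32752 + 16 = 32768 to -32768 and the arithmetic shift returns -1024, while the unsigned path returns 1024.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit pseudo-float: mantissa m, exponent e. */
typedef struct { int16_t m; int16_t e; } pfloat16;

/* Emulate wrap-around of a 16-bit signed int. */
static int32_t wrap16(int32_t v)
{
    v &= 0xFFFF;
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Arithmetic (sign-extending) right shift, written portably. */
static int32_t asr(int32_t v, int s)
{
    return v >= 0 ? v >> s : -((-v - 1) >> s) - 1;
}

/* Original rounding for e < 0: the signed sum can wrap to 0x8000 (negative). */
static int32_t extract16_signed(pfloat16 a)
{
    int32_t sum = wrap16((int32_t)a.m + (1 << (-a.e - 1)));
    return asr(sum, -a.e);
}

/* Patched rounding for e < 0: the mantissa is treated as unsigned 16-bit. */
static int32_t extract16_unsigned(pfloat16 a)
{
    uint32_t sum = ((uint32_t)(uint16_t)a.m + (1u << (-a.e - 1))) & 0xFFFFu;
    return (int32_t)(sum >> -a.e);
}
```

Both paths agree for small non-negative mantissas such as {0x4000, -5} (both give 512). In this emulation the unsigned cast is only correct for non-negative mantissas; a negative mantissa would be reinterpreted as a large positive value, which is acceptable only because the values involved here are not supposed to be negative.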
>
> >> I have not had time to trace this, but it looks like a similar
> >> problem, where the result of MULT16_32_Q15(M_1,r) is negative, and
> >> FLOAT_DIV32_FLOAT bombs.  Maybe the best thing to do next is to
> >> instrument the routines in pseudofloat.h which have loops, but I will
> >> not get to that for a day or two.
> >
> > Yeah, r is never supposed to be negative and the float routines assume
> > that.
>
> No, it was a divide by zero, as explained in my second post.  I will try
> a build on the C6x DSP to see if this is a 16- vs. 32-bit problem.  I
> sent the test files off-list.
>
> >> 1.  speex_echo_state_init takes about 20M instructions, which is a
> >> little frightening,
> >
> > That's the fft initialization that calls a lot of float cos() functions.
> > If you have a fixed version of cos() you can use it there, otherwise a
> > fixed table (for a certain size) would work.
> >
> >> and the calls to speex_echo_cancel take about 630K instructions
> >> for 128 samples.  Given your recent experience with the fixed-point
> >> canceller, does this sound reasonable?  The MIPS for the canceller are
> >> similar to the MIPS for the encoder running at 8 kbps, complexity 1.
> >
> > The order of magnitude seems right. It may be possible to reduce that a
> > bit, though. If you have an optimized FFT, you could replace kiss_fft
> > with it and get a big improvement right there.
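A rough way to turn the per-call figure into a MIPS budget, assuming 8 kHz narrowband input (the sampling rate is not stated in the thread): 128 samples at 8 kHz is 16 ms, so the canceller runs 62.5 times per second, and 630K instructions per call works out to roughly 39 MIPS. The helper name is hypothetical.

```c
/* Estimate MIPS from instructions per call, frame size and sampling rate. */
static double canceller_mips(double instr_per_call, int frame_size, int rate_hz)
{
    double calls_per_sec = (double)rate_hz / (double)frame_size;
    return instr_per_call * calls_per_sec / 1e6;
}
```

For example, `canceller_mips(630000.0, 128, 8000)` gives about 39.4, which is the number to compare against the DSP's available cycle budget.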
>
> Yeah, but then I have to try to actually understand the algorithm.  I
> am not sure that those brain cells are still alive.
>
> >> 2.  The testecho example uses a frame length and tail size that are
> >> powers of two (128, 1024).  Are there any implications to using sizes
> >> which are not powers of two?  It would be most convenient to use the
> >> encoder frame size (160), and some multiple of that for the tail size.
> >> How does the frame size affect performance (I understand that the tail
> >> length determines what echo signals are cancelable)?
> >
> > Non powers of two will be a bit slower because of the FFT, but that's
> > all. I made sure the echo canceller works with 160, precisely because
> > it's the frame size used by Speex. Note that I don't recommend using
> > frames more than 20 ms long (at any sampling rate).
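The two sizing rules in this answer can be checked mechanically: frames of at most 20 ms, and power-of-two sizes as the fast FFT case. The helpers below are a hypothetical sketch; at 8 kHz the 20 ms limit is 160 samples, so the Speex frame size sits exactly at the recommended maximum, while 128 and 1024 are powers of two and 160 is not.

```c
/* Largest frame (in samples) that stays within 20 ms at rate_hz. */
static int max_frame_samples(int rate_hz)
{
    return rate_hz / 50;          /* 20 ms = 1/50 s */
}

/* Nonzero when n is a positive power of two (the fast FFT case). */
static int is_power_of_two(int n)
{
    return n > 0 && (n & (n - 1)) == 0;
}
```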
> >
> >> 3.  Do you have any suggestions for code/data memory reduction for
> >> the canceller, other than making the tail length no longer than
> >> necessary (this is a line echo canceller for a local phone, so I
> >> should be able to keep it to 40 ms)?  I was surprised by the size of
> >> the FFT code, but I guess that it is doing much more than the radix-2
> >> version in the TI library.
> >
> > The FFT code has more than just the radix two, so you can save there. It
> > wasn't meant to be an optimized FFT, so if TI supplies you with one,
> > it's probably a good idea to use it (that's what fft_wrap is for). Also,
> > given that the memory use is almost directly proportional to the tail
> > length, reducing that one to 40 ms will make a huge difference.
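Putting rough numbers on this, assuming 8 kHz sampling and taking "almost directly proportional" at face value: a 1024-sample tail is 128 ms, a 40 ms tail is 320 samples (conveniently two 160-sample frames), and scaling the >20 kbytes measured for the 1024-sample tail gives an estimate of a little over 6 kbytes, plus any fixed overhead that does not scale with tail length. The helper names and the 8 kHz rate are assumptions.

```c
#define SAMPLE_RATE_HZ 8000   /* assumed narrowband rate */

/* Tail length in samples for a given duration in milliseconds. */
static int tail_samples(int tail_ms)
{
    return tail_ms * SAMPLE_RATE_HZ / 1000;
}

/* Round a tail length up to a whole number of frames. */
static int round_up_to_frames(int samples, int frame_size)
{
    return ((samples + frame_size - 1) / frame_size) * frame_size;
}

/* Linear memory estimate scaled from one measured (tail, bytes) point. */
static long scaled_memory_bytes(long ref_bytes, int ref_tail, int new_tail)
{
    return ref_bytes * new_tail / ref_tail;
}
```

For example, `scaled_memory_bytes(20480L, 1024, tail_samples(40))` gives 6400 bytes; the real figure should be measured, since not every buffer scales with the tail.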
>
> Thanks for the advice.
>
> - Jim