[Speex-dev] Speex echo canceller on TI C55 DSP

Mon May 8 09:11:59 PDT 2006

Jean-Marc,

I recently started looking at running the echo canceller on a TI C55 DSP
along with the 8 kbps narrowband Speex encoder/decoder.  The C55 compiler is
one of those "braindead compilers" that you refer to from time to time, and
it cannot handle the pseudo-float struct assignments in the return
statements in pseudofloat.h.

Most of these were eliminated in build 11311 (patch by Brian Retford), but 
there were four left that I had to break apart.  I started with build 11343.
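A minimal sketch of the kind of rewrite involved, assuming the offending returns are C99 compound literals of the form `return (spx_float_t) {a.m, a.e+b};`. The struct below is a stand-in modeled on the mantissa/exponent pair in pseudofloat.h, and `float_shl_c89` is a hypothetical name, not the Speex function.

```c
#include <stdint.h>

/* Stand-in for the pseudo-float type: value = m * 2^e (assumed layout). */
typedef struct {
    int16_t m;  /* mantissa */
    int16_t e;  /* exponent */
} spx_float_t;

/* C99 form that a C89-only compiler rejects (shown for comparison only):
 *     return (spx_float_t) {a.m, a.e + b};
 */

/* C89 form: fill in a local struct field by field, then return it. */
static spx_float_t float_shl_c89(spx_float_t a, int b)
{
    spx_float_t r;
    r.m = a.m;
    r.e = (int16_t)(a.e + b);
    return r;
}
```

The generated code should be the same; only the construct the compiler has to parse changes.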

I got several compiler warnings for "shift out of range" in mdf.c, which I
fixed by adding EXTEND32 to all of the SHL32s with 16-bit operands
(st->frame_size in 6 places, st->wtmp2 in 1 place).  I have not sent patches
for these two changes, because I still have other problems.
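On the C55 an `int` is 16 bits, so SHL32 applied directly to a 16-bit operand presumably shifts in 16-bit arithmetic, which is what the warning flags for large shift counts. A sketch of the fix, using simplified stand-ins for EXTEND32 and SHL32 (assumed definitions; the real macros may differ, e.g. debug builds add range checks) and a hypothetical helper `scaled_frame_size`:

```c
#include <stdint.h>

typedef int16_t spx_int16_t;
typedef int32_t spx_word32_t;

/* Simplified stand-ins for the fixed-point macros (assumed definitions). */
#define EXTEND32(x)      ((spx_word32_t)(x))
#define SHL32(a, shift)  ((a) << (shift))

/* Widen first, then shift, so the shift happens in 32-bit arithmetic even
   on a target where int is only 16 bits. Writing SHL32(frame_size, shift)
   there would shift a 16-bit int and overflow. */
static spx_word32_t scaled_frame_size(spx_int16_t frame_size, int shift)
{
    return SHL32(EXTEND32(frame_size), shift);
}
```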

In fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float routines,
because they were not used and required smallft.c (which is not so small at
all) to be added to the build.

With these changes, the link was successful, using testecho.c with some
modifications for the C55 environment.  The code and data memory
requirements were much higher than I had hoped (>20 kbytes of dynamic data
memory for block size = 128, tail length = 1024), and I will probably not be
able to fit the canceller into the production build without some trimming.

When I run the build, it goes into an infinite loop in FLOAT_DIV32 (mdf.c
line 660).  This occurs because adapt_rate is < 0, which happens when
FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}: the rounding pushes the
sum 0x7ff0 + 0x10 to 0x8000, which overflows a 16-bit int and makes the
result negative.  I worked around this by changing

      return (a.m+(1<<(-a.e-1)))>>-a.e;
to
      return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e;

in FLOAT_EXTRACT16.  This changes the returned value from 0xfc00 to 0x400.
Now it runs for a while, then hits another infinite loop at mdf.c line
641:
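The two FLOAT_EXTRACT16 variants above can be compared on a host PC with a small sketch. It assumes the negative-exponent branch shown (a.e < 0) and a two's-complement 16-bit `int` on the C55; the 16-bit wraparound is modeled explicitly by a hypothetical `wrap16` helper, and all function names are illustrative, not Speex code. A third variant does the rounding in 32 bits instead.

```c
#include <stdint.h>

typedef struct {
    int16_t m;  /* mantissa */
    int16_t e;  /* exponent: value = m * 2^e */
} spx_float_t;

/* Reinterpret the low 16 bits of x as signed, i.e. what a 16-bit int
   addition yields after wraparound on a two's-complement DSP. */
static int32_t wrap16(int32_t x)
{
    int32_t low = x & 0xFFFF;
    return low >= 0x8000 ? low - 0x10000 : low;
}

/* Flooring right shift, written out so the result does not depend on how
   the host compiler shifts negative values. */
static int32_t asr(int32_t x, int s)
{
    if (x >= 0)
        return x >> s;
    return -(int32_t)((-(int64_t)x + ((1 << s) - 1)) >> s);
}

/* Original rounding, evaluated as 16-bit int arithmetic would do it. */
static int16_t extract16_int16_sim(spx_float_t a)
{
    int32_t sum = wrap16((int32_t)a.m + (1 << (-a.e - 1)));
    return (int16_t)asr(sum, -a.e);
}

/* Workaround from the message: treat the mantissa as unsigned 16-bit. */
static int16_t extract16_u16(spx_float_t a)
{
    uint16_t sum = (uint16_t)((uint16_t)a.m + (1 << (-a.e - 1)));
    return (int16_t)(sum >> -a.e);
}

/* Alternative: widen to 32 bits before rounding, keeping the sign. */
static int16_t extract16_32bit(spx_float_t a)
{
    return (int16_t)asr((int32_t)a.m + (1 << (-a.e - 1)), -a.e);
}
```

For {0x7ff0, -5} the simulated original returns -1024 (0xfc00) and both fixes return 0x400.  If negative mantissas can ever reach FLOAT_EXTRACT16, note that the unsigned cast misrounds them: {-32, -5} gives 2047 instead of -1.  The 32-bit version handles both cases.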

         st->power_1[i] = FLOAT_SHL(FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),
                  FLOAT_MUL32U(e,st->power[i]+10)),WEIGHT_SHIFT+16);

I have not had time to trace this, but it looks like a similar problem, 
where the result of MULT16_32_Q15(M_1,r) is negative, and FLOAT_DIV32_FLOAT 
bombs.  Maybe the best thing to do next is to instrument the routines in 
pseudofloat.h which have loops, but I will not get to that for a day or two.
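Such instrumentation could look like the sketch below, assuming the loops in question normalize a 32-bit operand by shifting it left until it reaches 2^29 (the real thresholds in pseudofloat.h may differ).  A loop like that assumes a positive operand; with zero it never terminates.  The helper `normalize29_checked` and its logging are hypothetical: it checks the precondition on entry and reports the call site instead of spinning.

```c
#include <stdint.h>
#include <stdio.h>

#define NORM_TARGET ((int32_t)1 << 29)

/* Shift *a left until it is at least 2^29, decrementing *e once per shift.
   Returns 0 on success, or -1 after logging `where` if *a is not positive,
   rather than entering a loop that assumes a positive operand. */
static int normalize29_checked(int32_t *a, int *e, const char *where)
{
    if (*a <= 0) {
        fprintf(stderr, "%s: non-positive operand %ld\n", where, (long)*a);
        return -1;
    }
    while (*a < NORM_TARGET) {
        *a *= 2;  /* cannot overflow: *a < 2^29 before doubling */
        (*e)--;
    }
    return 0;
}
```

Passing a string such as "FLOAT_DIV32 numerator" as `where` identifies which operand went bad when the log line appears.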

I need to make a decision soon whether to seriously pursue making this work. 
With that in mind, here are some questions:

1.  speex_echo_state_init takes about 20M instructions, which is a little
frightening, and the calls to speex_echo_cancel take about 630K instructions
for 128 samples.  Given your recent experience with the fixed-point
canceller, does this sound reasonable?  The MIPS for the canceller are
similar to the MIPS for the encoder running at 8 kbps, complexity 1.

2.  The testecho example uses a frame length and tail size that are powers 
of two (128, 1024).  Are there any implications to using sizes which are not 
powers of two?  It would be most convenient to use the encoder frame size 
(160), and some multiple of that for the tail size.  How does the frame size 
affect performance (I understand that the tail length determines what echo 
signals are cancelable)?

3.  Do you have any suggestions for reducing the code/data memory of the
canceller, other than making the tail length no longer than necessary (this
is a line echo canceller for a local phone, so I should be able to keep it
to 40 ms)?  I was surprised by the size of the FFT code, but I guess that it
is doing much more than the radix-2 version in the TI library.
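The figures in questions 1 to 3 can be put on a common scale with a little arithmetic, assuming the 8 kHz narrowband sample rate.  630K instructions per 128-sample frame is about 39.4 MIPS of sustained load.  A 160-sample frame is 20 ms, and a 40 ms tail is 320 samples (two 160-sample frames), versus 1024 samples (128 ms) in testecho.  The helper names below are illustrative only.

```c
#define SAMPLE_RATE_HZ 8000.0  /* narrowband Speex */

/* Sustained load in MIPS, given the instructions spent per frame. */
static double mips_for(double instr_per_frame, int frame_size)
{
    double frames_per_sec = SAMPLE_RATE_HZ / frame_size;
    return instr_per_frame * frames_per_sec / 1e6;
}

/* Convert a duration in milliseconds to samples at the narrowband rate. */
static int ms_to_samples(int ms)
{
    return (int)(ms * SAMPLE_RATE_HZ / 1000.0);
}

/* Convert a length in samples to milliseconds at the narrowband rate. */
static double samples_to_ms(int samples)
{
    return samples * 1000.0 / SAMPLE_RATE_HZ;
}
```

For example, mips_for(630000.0, 128) evaluates to 39.375.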

Regards,

Jim Crichton