[Speex-dev] Major internal changes, TI DSP build change

Sat Apr 22 06:34:23 PDT 2006

Jean Marc,

>> The C5x and C6x outputs diverge in build 10143, which has log message
>> "lpc floor converted to fixed-point."  Also, the measured SNR changed
>> from 11.05 in builds 9854-10141 to 9.22 and 9.24 in 10143.
>
>Actually, build 10143 introduced another bug, that was the reason for
>the 1.1.11.1 release.
>
>> There are just four lines in modes.c which declare the constant, and one
>> line changed in each of nb_celp.c and sb_celp.c, which use the constant.
>> Looking at nb_mode, QCONST16(.0002,15) evaluates to 0x3FF9 on the C55 and
>> 0x4006 on the C6x.  When I patch the value 0x4006 into the C55 build, the
>> output matches the C6x.  The problem is that 1<<15 (2^15) evaluates to
>> -32768 on the C55, where int is 16 bits, and to 32768 on the C6x.
>
>Right on!
>
>> Applying our friend EXTEND32 causes the constant to evaluate correctly.
>> In fixed_generic.h,
>> #define QCONST16(x,bits) \
>>   ((spx_word16_t)((x)*((EXTEND32(1))<<(bits))+((EXTEND32(1))<<((bits)-1))))
>
>Actually, this is a case for a simple cast to (spx_word32_t) because
>QCONST can be used in a static initialization and EXTEND32 *can* be
>defined as a function (e.g. for fixed-point debug).
>
>> Later I will check if this change makes these two builds match in the
>> latest SVN code.
>
>I fixed it in svn. Could you check that?

Now all platforms match again.  Note that the measured SNR for this test 
sample is lower than with the broken code (10.87 vs 11.10), but of course 
this is no way to judge the real quality.
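
For anyone who wants to reproduce the two constants on a host compiler, here 
is a minimal sketch.  It is not Speex source: the typedefs, the helper 
shift_one_in_16bit_int(), and the names qconst16_int16, QCONST16_CAST and 
lpc_floor_example are made up for illustration, and I am assuming the 
pre-fix macro used a plain int 1 in both shifts.  The first function mimics 
a 16-bit int, and the macro is the same expression with the 1 widened by a 
plain cast, which also works in a static initializer.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Speex fixed-point types. */
typedef int16_t spx_word16_t;
typedef int32_t spx_word32_t;

/* Value of (1 << bits) as a 16-bit int would hold it: the result wraps
   into the signed 16-bit range, so bits == 15 gives -32768. */
static long shift_one_in_16bit_int(int bits)
{
    unsigned long v = (1UL << bits) & 0xFFFFUL;
    return (v >= 0x8000UL) ? (long)v - 0x10000L : (long)v;
}

/* The expression from the message, evaluated as if int were 16 bits
   (assumption: the macro shifted a plain int 1 before the fix). */
spx_word16_t qconst16_int16(double x, int bits)
{
    return (spx_word16_t)(x * shift_one_in_16bit_int(bits)
                          + shift_one_in_16bit_int(bits - 1));
}

/* Same expression with the constant 1 widened by a cast.  A cast is a
   constant expression, so this is usable in a static initializer. */
#define QCONST16_CAST(x, bits) \
    ((spx_word16_t)((x) * (((spx_word32_t)1) << (bits)) \
                    + (((spx_word32_t)1) << ((bits) - 1))))

static const spx_word16_t lpc_floor_example = QCONST16_CAST(.0002, 15);

spx_word16_t qconst16_cast_example(void)
{
    return lpc_floor_example;
}
```

With x = .0002 and bits = 15, the 16-bit version comes out as 0x3FF9 and the 
cast version as 0x4006, which are the two values seen on the C55 and C6x.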

>> The MIPs are not a problem for me, and the C55 does very well on 32x16
>> multiplies, so I have not played with PRECISION16 since last year.
>
>Does the C55 have a 32x16 multiplier or do you mean it handles my
>emulation of it well?

It has two ALUs with 17x17 bit MACs, and it has an instruction that does 
this:
ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem))))

I never quite understood this, so I went off and looked at the manuals.  It 
can multiply the low half in one cycle, then shift and add it to the high 
half in a second cycle.  And, in a tight loop the parallel ALUs would allow 
one 32x16 multiply per cycle.
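
The low-half/high-half split can be sketched in portable C as below.  This 
is only an illustration of the arithmetic, not the C55 instruction sequence 
and not one of the Speex macros: the function names and the choice of a Q16 
result (product shifted right by 16) are my own assumptions.

```c
#include <stdint.h>

/* Floor division by 65536 that does not depend on how the compiler
   shifts negative values. */
static int64_t floor_shift16(int64_t v)
{
    return (v >= 0) ? (v >> 16) : -((-v + 65535) >> 16);
}

/* Reference: full 48-bit product, then drop the low 16 bits. */
int32_t mult16_32_q16_reference(int16_t a, int32_t b)
{
    return (int32_t)floor_shift16((int64_t)a * b);
}

/* Same result from two 16x16 products: multiply the low half, shift it
   down by 16, then add the product with the high half. */
int32_t mult16_32_q16_split(int16_t a, int32_t b)
{
    int32_t hi = (int32_t)floor_shift16(b);  /* signed high 16 bits      */
    int32_t lo = b - hi * 65536;             /* unsigned low, 0..65535   */
    int32_t low_prod = (int32_t)a * lo;      /* signed x unsigned        */
    int32_t high_prod = (int32_t)a * hi;     /* signed x signed          */
    return high_prod + (int32_t)floor_shift16(low_prod);
}
```

Because a*hi*65536 is an exact multiple of 65536, only the low product 
contributes to the bits that are shifted out, so the split version matches 
the reference for all inputs.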

The C54x cannot do this, and uses library calls for 32x16 multiplies.  The 
changes that you have made since 1.1.8 are most dramatic for the C54x, which 
dropped from 184 MIPS (unusable in real time, since the fastest parts run at 
160 MHz) to 79 MIPS.  The C55x dropped from 41.5 to 29.4 MIPS (mixed 16/32 
bit capability), and the C6x dropped slightly from 36 to 34.5 MIPS (32-bit 
machine).

- Jim