[Speex-dev] Major internal changes, TI DSP build change
Jim Crichton
jim.crichton at comcast.net
Sat Apr 22 06:34:23 PDT 2006
Jean Marc,
>> The C5x and C6x output diverges in build 10143, which has log message
>> "lpc
>> floor converted to fixed-point." Also, the measured SNR changed from
>> 11.05
>> in builds 9854-10141 to 9.22 and 9.24 in 10143.
>
>Actually, build 10143 introduced another bug, that was the reason for
>the 1.1.11.1 release.
>
>> There is just four lines in modes.c which declare the constant, and one
>> line
>> changed in nb_celp.c and sb_celp.c which use the constant. Looking at
>> nb_mode, QCONT16(.0002,15) evaluates to 0x3FF9 on the C55 and 0x4006 on
>> the
>> C6x. When I patch the value 0x4006 into the C55 build, the output
>> matches
>> the C6x. The problem is that 2^15 evaluates to -32768 on the C55 and
>> 32768
>> on the C6x.
>
>Right on!
>
>> Applying our friend EXTEND32 causes the constant to evaluate correctly.
>> In
>> fixed_generic.h,
>> #define QCONST16(x,bits)
>> ((spx_word16_t)((x)*((EXTEND32(1))<<(bits))+((EXTEND32(1))<<((bits)-1))))
>
>Actually, this is a case for a simple cast to (spx_word32_t) because
>QCONST can be used in a static initialization and EXTEND32 *can* be
>defined as a function (e.g. for fixed-point debug).
>
>> Later I will check if this change makes these two builds match in the
>> latest
> SVN code.
>
>I fixed it in svn. Could you check that?
Now all platforms match again. Note that the measured SNR for this test
sample is lower than with the broken code (10.87 vs 11.10), but of course
this is no way to judge the real quality.
>> The MIPs are not a problem for me, and the C55 does very well on 32x16
>> multiplies, so I have not played with PRECISION16 since last year.
>
>Does the C55 have a 32x16 multiplier or do you mean it handles my
>emulation of it well?
I has two ALUs with 17x17 bit MACs, and it has an instruction that does
this:
ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem))))
I never quite understood this, so I went of and looked at the manuals. It
can multiply the low half in one cycle, then shift and add it to the high
half in a second cycle. And, in a type loop the parallel ALUs would allow
one 32x16 multiply per cycle.
The C54x cannot do this, and uses library calls for 32x16 multiplies. The
changes that you have made since 1.1.8 are most dramatic for the 54x, which
dropped from 184 (unusable in real time, the fastest parts are 160 MHz) to
79 MIPs. The C55x dropped from 41.5 to 29.4 MIPs (mixed 16/32 bit
capability), and the C6x dropped slightly from 36 to 34.5 MIPs (32bit
machine).
- Jim
More information about the Speex-dev
mailing list