[Speex-dev] Major internal changes, TI DSP build change
Jean-Marc Valin
Jean-Marc.Valin at USherbrooke.ca
Sat Apr 22 06:55:40 PDT 2006
> >I fixed it in svn. Could you check that?
>
> Now all platforms match again. Note that the measured SNR for this test
> sample is lower than with the broken code (10.87 vs 11.10), but of course
> this is no way to judge the real quality.
SNR, especially on a single sample, can be very misleading. Yet, could
you just check that the DSP results match what you get on a PC?
> >Does the C55 have a 32x16 multiplier or do you mean it handles my
> >emulation of it well?
>
> It has two ALUs with 17x17 bit MACs, and it has an instruction that does
> this:
> ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem))))
>
> I never quite understood this, so I went off and looked at the manuals. It
> can multiply the low half in one cycle, then shift and add it to the high
> half in a second cycle. And, in a tight loop the parallel ALUs would allow
> one 32x16 multiply per cycle.
Just one thing I'd like to understand. Did you do some tricks and/or
assembly to implement the MULT16_32_Q* routines with these instructions
or does the compiler figure them out by itself?
> The C54x cannot do this, and uses library calls for 32x16 multiplies.
Why is that? By default all the 32x16 multiplies are computed using only
16x16 multiplies (see fixed_generic.h).
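To make the point concrete, here is a rough sketch of how a 16x32 multiply
with a Q15 shift can be built from two 16x16 multiplies. The helper names
below are hypothetical, not the actual macros in fixed_generic.h, and the
sketch assumes that b >> 15 fits in 16 bits and that >> on a negative value
is an arithmetic shift (true for the usual compilers, but
implementation-defined in C).

```c
#include <stdint.h>

/* Hypothetical helper names for illustration; not the Speex macros. */

/* 16x16 -> 32 signed multiply, the only multiply primitive assumed here. */
static int32_t mul16_16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Computes (a * b) >> 15 using two 16x16 multiplies.
 * Assumes b >> 15 fits in 16 bits (roughly |b| < 2^30) and that
 * right-shifting a negative value is an arithmetic shift. */
static int32_t mult16_32_q15_sketch(int16_t a, int32_t b)
{
    int16_t hi = (int16_t)(b >> 15);     /* upper part of b, signed */
    int16_t lo = (int16_t)(b & 0x7fff);  /* lower 15 bits, never negative */

    /* a*b = a*hi*2^15 + a*lo, so shifting by 15 leaves a*hi intact
     * and only the low partial product needs the shift. */
    return mul16_16(a, hi) + (mul16_16(a, lo) >> 15);
}
```

Written this way, no 32x16 library call is needed: the compiler only ever
sees 16x16 products, a shift, a mask and an add.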
> The
> changes that you have made since 1.1.8 are most dramatic for the 54x, which
> dropped from 184 (unusable in real time, the fastest parts are 160 MHz) to
> 79 MIPS. The C55x dropped from 41.5 to 29.4 MIPS (mixed 16/32 bit
> capability), and the C6x dropped slightly from 36 to 34.5 MIPS (32bit
> machine).
Glad it makes such a difference. I'm just surprised that the C6x
complexity is that high.
Jean-Marc