[Speex-dev] Major internal changes, TI DSP build change
Jim Crichton
jim.crichton at comcast.net
Sat Apr 22 19:50:49 PDT 2006
Jean-Marc,
>> >I fixed it in svn. Could you check that?
>>
>> Now all platforms match again. Note that the measured SNR for this test
>> sample is lower than with the broken code (10.87 vs 11.10), but of course
>> this is no way to judge the real quality.
>
> SNR, especially on a single sample, can be very misleading. Still, could
> you check that the DSP results match what you get on a PC?
I do not have a build environment for a PC. I have been using the 6-second
test file male.wav from the Speex site for my simulations, in case someone else
wants to run the audio through the encoder and decoder at 8 kbps, complexity
1. I might be able to get a coworker to do this, but not any time soon.
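Because SNR on one sample says little, the more direct check is whether the DSP and PC decoders produce bit-identical output. Here is a minimal sketch in C. It assumes that both decoded versions of male.wav (8 kbps, complexity 1) have already been read into arrays of 16-bit PCM samples. `compare_pcm` and `pcm_diff` are hypothetical names and are not part of Speex.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary of how two decoded 16-bit PCM streams differ. */
typedef struct {
    size_t mismatches;   /* number of samples that differ */
    long first_index;    /* index of the first differing sample, -1 if none */
    int max_abs_diff;    /* largest |a[i] - b[i]| seen */
} pcm_diff;

/* Compare decoder output from one build (a) with another build (b).
   Bit-exact builds give mismatches == 0. */
pcm_diff compare_pcm(const int16_t *a, const int16_t *b, size_t n)
{
    pcm_diff r = {0, -1, 0};
    for (size_t i = 0; i < n; i++) {
        int d = abs((int)a[i] - (int)b[i]); /* at most 65535, fits in int */
        if (d != 0) {
            if (r.first_index < 0)
                r.first_index = (long)i;
            r.mismatches++;
            if (d > r.max_abs_diff)
                r.max_abs_diff = d;
        }
    }
    return r;
}
```

If the builds are identical, `mismatches` should be zero. If they are not, `first_index` shows where to start comparing intermediate values. File I/O is omitted.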
>> >Does the C55 have a 32x16 multiplier or do you mean it handles my
>> >emulation of it well?
>>
>> It has two ALUs with 17x17-bit MACs, and it has an instruction that does
>> this:
>> ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem))))
>>
>> I never quite understood this, so I went off and looked at the manuals. It
>> can multiply the low half in one cycle, then shift and add it to the high
>> half in a second cycle. And, in a tight loop, the parallel ALUs would allow
>> one 32x16 multiply per cycle.
>
> Just one thing I'd like to understand. Did you do some tricks and/or
> assembly to implement the MULT16_32_Q* routines with these instructions
> or does the compiler figure them out by itself?
No, I have done no assembly work on any of these DSPs. It has been a few
years since I did assembly work on any DSP, and it does not look like I will
need to for my applications. I just found the above instruction in the
instruction set reference manual, and it seems perfect for 16x32 multiplies.
When I look at the assembler output for filter.c, I do not see this
instruction used, probably because there is always some shift in the result
(e.g., MULT16_32_Q15, which takes 6 instructions to implement: two
multiplies, two adds, a shift, and a store). So, never mind.
>> The C54x cannot do this, and uses library calls for 32x16 multiplies.
>
> Why is that? By default all the 32x16 multiplies are computed using only
> 16x16 multiplies (see fixed_generic.h).
Once again, I spoke too soon. I saw the library calls when I first tested
the C54x last year, but I do not see them now. I am using a later version
of the TI compiler, and the compile options may also be different.
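For contrast, the generic approach uses only signed 16x16 products. The sketch below is modeled on that idea rather than copied from fixed_generic.h, and the function names are hypothetical. Splitting b at bit 15 instead of bit 16 keeps the low part in 0..32767. As a result, both halves fit a signed 16-bit operand, and no unsigned or 32x16 multiply is needed.

```c
#include <stdint.h>

/* 16x16 -> 32-bit multiply with both operands truncated to 16 bits. */
static int32_t mult16_16(int32_t a, int32_t b)
{
    return (int32_t)(int16_t)a * (int32_t)(int16_t)b;
}

/* Hypothetical helper: (a * b) >> 15 for 16-bit a and 32-bit b, using only
   16x16 multiplies. b is split as b = (b >> 15) * 2^15 + (b & 0x7fff).
   Valid only while b >> 15 fits in 16 bits, i.e. -2^30 <= b < 2^30.
   Assumes >> on negative values is arithmetic. */
int32_t mult16_32_q15(int16_t a, int32_t b)
{
    return mult16_16(a, b >> 15) + (mult16_16(a, b & 0x7fff) >> 15);
}
```

Within that range of b, the result equals (a * b) >> 15 computed in 64 bits.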
>> The changes that you have made since 1.1.8 are most dramatic for the
>> C54x, which dropped from 184 MIPS (unusable in real time; the fastest parts
>> run at 160 MHz) to 79 MIPS. The C55x dropped from 41.5 to 29.4 MIPS (mixed
>> 16/32-bit capability), and the C6x dropped slightly from 36 to 34.5 MIPS
>> (32-bit machine).
>
> Glad it makes such a difference. I'm just surprised that the C6x
> complexity is that high.
There was a post from Jerry Trantow on 4 Feb saying that he had cut the C6x
MIPS roughly in half with some assembly optimization (do you know if he
planned to submit these changes?). Because this is a highly parallel machine,
its assembly language is not for the faint of heart.
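This small arithmetic sketch puts the MIPS figures quoted earlier in relative terms. It makes a rough assumption: that a MIPS figure can be compared directly with a part's clock rate in MHz, which is how the 184 MIPS vs 160 MHz remark reads. The function names are hypothetical.

```c
/* Percentage reduction from an old to a new MIPS figure. */
double reduction_pct(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Share of a part's cycle budget used, in percent. Rough assumption:
   one MIPS of load consumes one MHz of clock. Above 100 means the
   codec cannot run in real time on that part. */
double load_pct(double mips, double clock_mhz)
{
    return 100.0 * mips / clock_mhz;
}
```

Applied to the quoted numbers:
- 184 to 79 MIPS is about a 57% reduction, taking the codec from about 115% to about 49% of a 160 MHz C54x.
- 41.5 to 29.4 MIPS is about a 29% reduction.
- 36 to 34.5 MIPS is about a 4% reduction.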
- Jim