Re: [Tremor] TI55xx implementation: stuck

Roland Wintersteller rwinters at
Sat Oct 23 11:59:01 PDT 2004

Let me briefly sum up (and add to) the facts of this discussion:

- You had problems porting Tremor to the C5x DSP, but now it works.

- You were the first person who managed to get Ogg Vorbis running on the C5x
DSP, but in the meantime several others (including Johannes Sandvall
and me) have got Ogg Vorbis working on the C5x too.

- An ARM uC achieves better (faster) results than the C5x. We should not
forget that the ARM is a 32bit microprocessor. The most frequently
used multiply operation I have seen in the sources is 32bit x 32bit =
64bit>>32 = 32bit, whereas the C5x DSP only supports 16bit x 16bit =
32bit>>16 = 16bit. That means the ARM would be expected to be 4 times
(400%) faster (compared to C5x assembler code). You only saw a
performance gain of 12%.

- The TI C5x compiler is not able to implement a 32bit x 32bit multiply in 4
cycles, which is possible in assembler. On the other hand, the ARM compiler is
probably not able to do a 64bit by 64bit multiply without a call to
stdlib, which is the equivalent case considering the bit depth.

- The TI C5x is currently the only 16bit CPU which is able to decode Ogg
Vorbis.
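
To make the multiply point above concrete, here is a rough sketch in C of the
32bit x 32bit = 64bit>>32 operation, once written directly and once built from
four 16bit x 16bit partial products, which is one way to read the factor of
four. The function names are made up for illustration and are not taken from
the Tremor sources. The sketch assumes two's complement and an arithmetic
right shift for negative values, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Reference: full 64-bit product, keep the high 32 bits.
 * Assumes >> on a negative value is an arithmetic shift. */
static int32_t mult32_ref(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> 32);
}

/* Same result using only 16x16 -> 32 multiplies (hypothetical helper).
 * x = xh*2^16 + xl with xh signed and xl unsigned, likewise for y. */
static int32_t mult32_from_16(int32_t x, int32_t y)
{
    int32_t  xh = x >> 16;          /* signed high halves */
    int32_t  yh = y >> 16;
    uint32_t xl = (uint16_t)x;      /* unsigned low halves */
    uint32_t yl = (uint16_t)y;

    int32_t  hh = xh * yh;          /* weight 2^32 */
    int32_t  hl = xh * (int32_t)yl; /* weight 2^16 */
    int32_t  lh = (int32_t)xl * yh; /* weight 2^16 */
    uint32_t ll = xl * yl;          /* weight 2^0  */

    /* Carry out of the low 32 bits, summed in pieces to avoid overflow. */
    uint32_t low = (uint32_t)(hl & 0xFFFF) + (uint32_t)(lh & 0xFFFF) + (ll >> 16);

    return hh + (hl >> 16) + (lh >> 16) + (int32_t)(low >> 16);
}
```

Both functions should agree for every input pair. The second one needs four
multiplies plus the shifts and adds, so the raw multiply count alone does not
capture the whole cost.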

I am pretty sure you will tell me if I have forgotten anything. :-)

-----Original Message-----
From: 'Monty' [mailto:xiphmont at] 
Sent: Wednesday, 20 October 2004 04:25
To: Roland Wintersteller
Subject: Re: [Tremor] TI55xx implementation: stuck

On Tue, Oct 19, 2004 at 08:51:49PM +0200, Roland Wintersteller wrote:
> Hi Monty, 
> Obviously you have a problem with TI DSPs and the corresponding tool
> chain. I just want to state that this is at least not the impression
> of everyone on this mailing list. 

> I have worked with TI DSPs for more than a year and even I find it hard to
> understand your arguments.

Actually, I have no problems with the chips themselves.  I have
serious issues with the toolchain.  And since there are no competing
toolchains to use with the 54/55xx TI DSPs, these inadequacies make it
more difficult than it has to be to work with otherwise fine hardware.

I have worked with two companies now that originally chose TI over ARM
based on cost-per-mip.  Both eventually regretted getting locked into
the TI processors after dealing with continuous toolchain headaches.
The poor compiler would be tolerable if it and the assembler and
debugger were at least reliable... which they're not.  These companies
regularly spent days of downtime with the entire engineering team just
trying to figure out why the most recent builds were causing the
debugger/assembler/compiler to crash anytime they tried to launch.

I'm not talking about code bugs crashing the remote device.  I'm
talking about not even being able to launch the IDE after a code merge
and TI not being able to tell us why.

That's unacceptable for a toolchain costing over $5k a seat.

> Furthermore the compiler itself does nothing
> else than translating the c-code into assembler.

...which it does astoundingly poorly.  This process has possibly
gotten more practical computer science study than any other. It's not
that easy :-)

On the 5416, my hand-coded (and not very well coded I might add)
assembly iMDCT outperformed the compiled C version by a factor of 25x.

On the ARM, GCC beat my hand coded version by 12%.

> The code is only
> optimized, if the optimizer is enabled.

It was enabled.

> I had a look at the efficiency
> of the assembler output in a number of cases. If the optimizer is on
> or g3) and the Debug Info is disabled, in most cases I couldn't get
> a reasonably better result by writing this part of the code in assembler.

...until you have to use type widths/operations that the compiler
claims to support, but actually implements as function calls into a
pseudo-intrinsics library.  Or actually supports, but has a poor state
convention so that there's setup and teardown around every op.  And
there's no assembly inlining.  

The whole point of having a compiler is to save the programmer work.
16x16->32 bit multiply is not an esoteric operation, even on the 5416
but the compiler still did a worse job than a newbie programmer who'd
been working with the chip for only two weeks at the time (me).
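
For readers who have not hit this, a small sketch of the operation in
question as it is usually written in C. The function name is made up for
illustration. On a target where int is 16 bits wide (which, as far as I
know, is the case on the 54xx), the casts are needed so the product is
computed at 32 bits. A compiler that recognizes the pattern can map it to a
single hardware multiply; one that does not may fall back to a generic 32x32
routine.

```c
#include <stdint.h>

/* 16x16 -> 32 signed multiply (hypothetical helper name).
 * Widening both operands first keeps the full 32-bit product even
 * where plain int is only 16 bits. */
static int32_t mul16x16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}
```

For example, mul16x16(300, 300) gives 90000, which would not fit in a 16-bit
result.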

This is where the compiler exposes itself as a joke; it's little
more than a C parser that spits out line-by-line assembly
translation and an optimizer that then tries to make a small handful
of optimizations to that front-end's output.  TI would have done
better to pay Cygnus the going rate for a GCC backend.  Compilers are
not easy and companies that claim otherwise are deluding themselves.

> I do not want to criticise you, especially not because I think you've
> done a really great job by implementing Vorbis and Tremor. But I also
> think that it is not fair to compare a 16bit-DSP with a
> 32bit-Microcontroller (which, moreover, has an assembler instruction set
> especially developed for efficient c-code translation). 

Agreed; the hardware is very different and I didn't mean to cast
aspersions on the 5416 itself (although the ALU is *very* weird :-).
My complaints with TI are entirely the developer toolchain.  It is in
fact the case that TI, for the most part, does not use this toolchain
in-house.  TI codes in assembly internally (TI did not write CCS; it
was outsourced), and I will freely admit that the 54xx and 55xx can
probably really sing in the hands of a good assembly programmer
experienced on the architecture.

At Neuros, there was one such engineer on staff and he really did do
wonders with the chip and generally got along with the
toolchain. OTOH, it was the only architecture he'd ever used and he
generally worked around the compiler too.

(FWIW, his name was Michael Gao and if you're interested in a TI guru,
I have nothing but good things to say about him.  He was fantastic.)

> Let us compare
> the C5416 DSP with 16-bit CPUs of other chip producers. I have not heard
> of Ogg Vorbis running on another 16bit CPU. Is there any port
> available?

It runs on the 54xx only because I was paid to port it :0) Had that
not been the case, it would not exist for this architecture either.

(It exists for ARM also because I originally wrote that port.  The ARM
port was running in under two weeks.  The 54xx port took more than six
months and the ARM port still outperforms it.)

> I have also had similar problems with CCS not responding after running a
> program.

Losing sync with the emu box is not what I'm talking about here;
having the device go non-responsive will happen in any remote stub
environment, at least, on every one I've worked on in the past 15
years.

The bug I'm talking about never got as far as actually running.  My
bug was, double-click CCS, load the project and you get a 'Fatal
Exception' dialog before it even begins downloading to the device.  It
did that on multiple machines, and it did it on fresh installs.  This
bit even Michael regularly, but Mike was the kind of guy who would
just quietly find some way to make it work :-)

> Furthermore I think that CCS
> includes a number of very nice and helpful features. When you use
> DSP/BIOS you can display the current CPU-Load and tools are provided to
> measure execution times of functions and modules. Additionally your
> application is not limited to one task anymore - multitasking
> applications are possible. 

All these features are standard issue in any worthwhile toolchain.

