[Tremor] Notes on Implementing Tremor on an ARM7TDMI CPU

Segher Boessenkool segher at kernel.crashing.org
Sat Dec 6 10:03:58 PST 2008


> Why is this amazing? Well, because the flash on this CPU runs at a
> maximum of 30MHz. That means at 55MHz core speed, it takes two cycles
> to read 32 bits from flash. In thumb mode, instructions are 16 bit, so
> if there are no branches, it can execute one thumb instruction per
> cycle. In non-thumb mode, instructions are 32 bit, so it can only
> execute one instruction every two cycles. So I would have thought
> thumb mode would improve performance, due to the greater instruction
> throughput. I guess not, though.

ARM instructions do more work per instruction than Thumb instructions,
and they can access more registers more freely.  Thumb is also harder
for GCC to generate good code for.  Thumb-2 is better, but you don't
have that.
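
As a rough check on the fetch-rate argument in the quoted text, here is
a small C sketch.  It assumes the figures quoted above (55MHz core, one
32-bit flash read every two core cycles) and straight-line code with no
branches; fetch_mips is a made-up helper name.  It only models the peak
fetch rate, not how much work each instruction does.

```c
#include <assert.h>

/* Assumed figures from the quoted message: 55 MHz core clock, and one
   32-bit flash read every two core cycles. */
#define CORE_MHZ              55.0
#define CYCLES_PER_FLASH_WORD 2.0

/* Peak fetch rate, in millions of instructions per second, for
   straight-line code with the given instruction width in bits. */
static double fetch_mips(int insn_bits)
{
    double words_per_us, rate;

    assert(insn_bits == 16 || insn_bits == 32);

    words_per_us = CORE_MHZ / CYCLES_PER_FLASH_WORD;
    rate = words_per_us * (32.0 / insn_bits);

    /* The core cannot issue more than one instruction per cycle. */
    return rate > CORE_MHZ ? CORE_MHZ : rate;
}
```

fetch_mips(32) gives 27.5 and fetch_mips(16) gives 55.0.  So on fetch
bandwidth alone, Thumb only comes out ahead if it needs fewer than twice
as many instructions as ARM code for the same work.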

> http://lists.xiph.org/pipermail/tremor/2003-January/000303.html
>
> I am at a loss to understand why Segher thinks a 40MHz ARM should be
> fast enough to play back an Ogg Vorbis file.

That was an estimate, as should be obvious.  Your 55MHz device with
slow memory can almost do it, so it was a pretty good estimate if I
say so myself :-)

> Is an ARM4 faster than an
> ARM7 clock-per-clock? (I wouldn't have thought so).

ARMv4, as others have commented already.  I think I had ARM7 in mind,
not ARM9, since I didn't use ARM9 devices often back then.  But it's
an old post, I don't remember the details :-)

> I have several options to achieve the required level of performance. I
> would love some feedback on the best options.
>
> 1) Compile more files without thumb. I will try this to see what
> happens.

This probably only helps for the computationally heavy routines.

> 2) Use _LOW_PRECISION_. I don't want to lose audio quality but I need
> to get this to run in real time!

Try it out and see how bad it really is.
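
To get a feel for the kind of trade-off involved, here is a sketch in C
comparing a full-precision fixed-point multiply with a reduced-precision
one that never needs a 64-bit product.  The function names and the shift
amounts are made up for illustration and are not taken from Tremor; the
actual _LOW_PRECISION_ code may shift differently.

```c
#include <stdint.h>

/* Full precision: 32x32 -> 64-bit product, keep the top 32 bits. */
static int32_t mult32_full(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> 32);
}

/* Reduced precision (illustrative shifts only): drop the low 16 bits of
   each operand first, so the product always fits in 32 bits.
   Right-shifting a negative value is implementation-defined in C; this
   assumes the usual arithmetic shift. */
static int32_t mult32_low(int32_t x, int32_t y)
{
    return (x >> 16) * (y >> 16);
}
```

Running both over real decoder data and comparing the results (and,
more importantly, listening to the output) shows how much the dropped
bits actually matter.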

> 3) Overclock the CPU and/or the flash.

Bad plan.  Is this an external flash though?  You should be able to
get faster flash than that 30MHz.

> 4) Load some tables into RAM. RAM is very tight, but it may be
> possible. Can someone point to which table(s) would have the most
> benefit? The sine table? I guess I'll try them and see.

The FFT/MDCT twiddles and window are a good place to start.
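
One way to try this without touching the linker script is to copy a
table into a RAM buffer at startup and point the hot code at the copy.
A minimal sketch in C follows; all names, the table size and the table
contents are placeholders, and it assumes the toolchain puts const data
in flash and non-const statics in RAM.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_LEN 8  /* placeholder size; real tables are much larger */

/* Stand-in for a read-only lookup table that ends up in flash. */
static const int32_t table_flash[TABLE_LEN] = {
    1, 2, 3, 4, 5, 6, 7, 8   /* placeholder values */
};

/* RAM copy; non-const, so it is not placed with the read-only data. */
static int32_t table_ram[TABLE_LEN];

/* Pointer used by the hot loop; starts out pointing at flash. */
static const int32_t *table = table_flash;

/* Call once at startup, before decoding starts. */
static void table_init_ram(void)
{
    memcpy(table_ram, table_flash, sizeof table_ram);
    table = table_ram;
}
```

Going through a pointer makes it easy to switch tables between flash
and RAM one at a time and measure which ones are worth the RAM.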

> 5) Implement the FFT replacement for the MDCT mentioned earlier. This
> could be fruitful, but will not be trivial.
> 6) More than one of the above in combination.
> 7) Anything else?

Measure.  You cannot solve a performance problem (or any other problem)
if you don't know what the problem _is_.
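
If no profiler is available on the target, a few counters around the
main decode stages already tell you a lot.  A minimal sketch in C, with
made-up names throughout: read_ticks stands in for reading a
free-running hardware timer (clock() is used here only so the sketch
also runs on a host), and the stage names are just examples.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical tick source.  On the target this would read a
   free-running hardware timer; clock() is a portable stand-in. */
static uint32_t read_ticks(void)
{
    return (uint32_t)clock();
}

/* Example stage names; pick whatever matches your decode loop. */
enum { PROF_MDCT, PROF_FLOOR, PROF_RESIDUE, PROF_COUNT };

static uint32_t prof_total[PROF_COUNT];  /* accumulated ticks per stage */
static uint32_t prof_calls[PROF_COUNT];  /* number of timed calls */

static uint32_t prof_begin(void)
{
    return read_ticks();
}

static void prof_end(int stage, uint32_t start)
{
    /* Unsigned subtraction stays correct across one counter wrap. */
    prof_total[stage] += read_ticks() - start;
    prof_calls[stage]++;
}
```

Wrap each stage in prof_begin()/prof_end() and dump prof_total after
decoding a test file; the stage with the largest total is the one to
look at first.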

You really should get a development board with external RAM, so you can
run bigger code during development than you would for deployment, and
so your turnaround time is a few seconds instead of 20 minutes.  You do
need to experiment a lot to get the best performance (or, very good
performance, anyway).

Good luck,


Segher


