[Tremor] Notes on Implementing Tremor on an ARM7TDMI CPU
segher at kernel.crashing.org
Sat Dec 6 10:03:58 PST 2008
> Why is this amazing? Well, because the flash on this CPU runs at a
> maximum of 30MHz. That means at 55MHz core speed, it takes two cycles
> to read 32 bits from flash. In Thumb mode, instructions are 16 bits, so
> if there are no branches, it can execute one Thumb instruction per
> cycle. In non-Thumb (ARM) mode, instructions are 32 bits, so it can only
> execute one instruction every two cycles. So I would have thought Thumb
> mode would improve performance, due to the greater instruction
> throughput. It does not, though.
ARM instructions do more work per instruction than Thumb insns, and
they can access more registers more freely. Thumb is also harder for
GCC to generate good code for. Thumb-2 is better, but you don't have
that on an ARM7TDMI (it implements ARMv4T).
> I am at a loss to understand why Segher thinks a 40MHz ARM should be
> fast enough to play back an Ogg Vorbis file.
That was an estimate, as should be obvious. Your 55MHz device with
slow memory can almost do it, so it was a pretty good estimate, if I
say so myself :-)
> Is an ARM4 faster than an
> ARM7 clock-per-clock? (I wouldn't have thought so).
ARMv4 is an architecture version, not a core, as others have already
commented. I think I had ARM7 in mind, not ARM9, since I didn't use ARM9
devices often back then. But it's an old post; I don't remember the
details :-)
> I have several options to achieve the required level of performance. I
> would love some feedback on the best options.
> 1) Compile more files without Thumb. I will try this to see what
> difference it makes.
This probably only helps for the computationally heavy routines.
> 2) Use _LOW_PRECISION_. I don't want to lose audio quality, but I need
> to get this to run in real time!
Try it out and see how bad it really is.
> 3) Overclock the CPU and/or the flash.
Bad plan. Is this external flash, though? You should be able to
get faster flash than that 30MHz part.
> 4) Load some tables into RAM. RAM is very tight, but it may be
> Can someone point to which table(s) would have the most benefit? The
> sine table? I guess I'll try them and see.
The FFT/MDCT twiddle factors and the window tables are a good place to start.
> 5) Implement the FFT replacement for the MDCT mentioned earlier. This
> could be fruitful, but will not be trivial.
> 6) More than one of the above in combination.
> 7) Anything else?
Measure. You cannot solve a performance problem (or any other problem)
if you don't know what the problem _is_.
You really should get a development board with external RAM, so you can
run bigger code during development than you would for deployment, and
so your turnaround time is a few seconds instead of 20 minutes. You do
need to experiment a lot to get the best performance (or, very good