[Tremor] Tremor ARM performance issues
hb at x256.org
Fri Dec 5 03:06:00 PST 2008
Now that I nearly have my ARM-based test board ready to load Tremor onto
it, I've realized there is a performance issue.
The CPU (AT91SAM7S) supports single-cycle flash access up to 30MHz,
while the core itself can run up to 55MHz. I have read you can push
single-cycle flash access to 48MHz if you can keep the temperature low.
I probably can, or at least get somewhere close to that, if necessary.
However, assuming for the moment I'm going to stay within the
specifications, this means I have the following choices:
1) Run at 30MHz, no Thumb code, text read (flash) is single-cycle access
and so is RAM.
2) Run at 55MHz, no Thumb code, text read (flash) takes two cycles per
access, RAM is single cycle.
3) Run at 55MHz, use Thumb code, text read (flash) is one cycle per
instruction (since each is only 16 bits), RAM is single cycle.
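
To put rough numbers on the three options above, here is a small
back-of-the-envelope sketch. It assumes the worst case, where every
instruction fetch pays the full flash access time and nothing overlaps
with it. The helper names are just made up for illustration, and the
model ignores that Thumb code usually needs more instructions than ARM
code for the same work, so treat the output as an upper bound only.

```c
#include <stdio.h>

/*
 * Worst-case estimate (an assumption, not a measurement): every
 * instruction fetch costs cycles_per_fetch clock cycles and nothing
 * else overlaps with it. Returns millions of fetches per second.
 */
static double fetch_rate_mhz(double clock_mhz, int cycles_per_fetch)
{
    return clock_mhz / cycles_per_fetch;
}

/* Print one line per option so the three cases can be compared. */
static void print_option(const char *label, double clock_mhz,
                         int cycles_per_fetch)
{
    printf("%-28s %5.1f M fetches/s\n", label,
           fetch_rate_mhz(clock_mhz, cycles_per_fetch));
}

/* The three options from the list, in the same order. */
static void print_all_options(void)
{
    print_option("1) 30MHz ARM, 0 wait states", 30.0, 1);
    print_option("2) 55MHz ARM, 1 wait state", 55.0, 2);
    print_option("3) 55MHz Thumb", 55.0, 1);
}
```

Calling print_all_options() from main() prints the three estimates.
Under this model, option 2 fetches fewer instructions per second than
option 1 whenever the code is fetch-bound, which is why the extra wait
state matters so much for the ARM-mode files.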
However, the trick is that the ARM assembly code which is part of Tremor
isn't going to work in Thumb mode. So, I'd have to compile the files
mdct.c, floor0.c and floor1.c without Thumb, and the rest with it.
So my basic question is this: is the performance benefit of the assembly
code worth the penalty of an extra cycle per instruction fetch in every
function which uses it? I have a feeling it isn't. If I avoid using the
assembly, and thus can compile everything in Thumb mode, this also
avoids some annoying library issues (the C library doesn't seem to
support being called from both Thumb and regular ARM mode).
I'll probably end up testing performance both ways: with the CPU running
as fast as I can get it to work reliably with no flash wait states
(hopefully at least 40MHz), and with it running at 55MHz (probably
higher) with the extra wait state.
I read somewhere that the ARM code can successfully decode Vorbis files
in real time on an ARM at 30MHz. I wish I could remember who said that,
but I can't find the message in the archive any more. So I haven't ruled
out that possibility, but there are two reasons I would like to avoid
it. Firstly, I'd like some time left over to do other things too, such
as updating the LCD. Secondly, I would ideally like to support seamless
playback, which means that the decoder has to be able to open and start
decoding the next file before the audio buffer runs out. The faster I
can get Tremor to run, the more realistic that becomes.
If anybody has experience with this kind of issue of running Tremor on
an ARM system I'd be very happy to receive some advice.