[Tremor] Notes on Implementing Tremor on an ARM7TDMI CPU

Nicholas Vinen hb at x256.org
Sat Dec 6 16:27:27 PST 2008


Timmy Brolin wrote:
>> According to Wikipedia, the Nintendo DS has the same ARM7TDMI core as
>> what I am using. I just overclocked mine to very close to 66MHz and so
>> far it seems to be working fine. In fact, from what I can tell, aside
>> from your CPU likely reading code out of RAM rather than flash, they
>> seem identical. No cache, same core, etc. So, it may be that executing
>> out of RAM makes all the difference.
> Note that the Nintendo DS has two processors: one ARM9 and one ARM7TDMI.

Ah, OK, interesting.

>> So far this CPU seems stable @ 66MHz (w/ flash @ 33MHz). I may be able
>> to push it more. I'd rather not rely on that if possible, though.
>>
>> I really think that if it's 75% real time, I can get it to be faster
>> than real time, but without a better idea of which routines are the most
>> critical I agree it's going to be tough. I'll try to get profiling
>> working again.
>>
> Since your flash memory is single-cycle for Thumb and double-cycle for
> ARM, I would suggest you put your ARM code in RAM and keep the Thumb
> code in flash. That is the typical arrangement on the Nintendo Game Boy
> Advance, which has 16-bit flash and 32-bit single-cycle RAM.
> The ARM assembly-optimized routines should get a nice performance boost
> if you move them from flash to single-cycle RAM.
>
> Timmy Brolin

You are *spot on* with this comment. The final tweak I made to the code,
which gave a massive performance boost, was to put the following
functions in RAM by moving them into the .data section:

decode_packed_entry_number
decode_map
vorbis_book_decodevv_add
_checksum
mdct_backwards
mdct_shift_right
mdct_unroll_*

This cost me about 4K of RAM, which is an acceptable amount considering
the 30-40% reduction in cycles this gives.
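
A minimal sketch of that kind of placement, assuming a GCC toolchain whose
startup code copies .data from flash to RAM before main() runs (a common
arrangement, but not confirmed for this project). RAMFUNC and
shift_right_demo are hypothetical names for illustration, not part of
Tremor or Tremolo:

```c
#include <stdint.h>

/* RAMFUNC is a hypothetical macro name. With GCC targeting ARM it asks
   for the function to be emitted into .data, so that it ends up in RAM
   once the startup code has copied .data out of flash. On any other
   compiler or target it expands to nothing, so the file still builds
   and runs on a desktop machine. */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((section(".data")))
#else
#define RAMFUNC
#endif

/* Stand-in for a hot inner loop; this is NOT the real mdct_shift_right. */
RAMFUNC void shift_right_demo(int32_t *buf, int n, int shift)
{
    for (int i = 0; i < n; i++)
        buf[i] >>= shift;
}
```

The assembler may warn about code being placed in .data. A dedicated
section name that the linker script maps to RAM is a common alternative,
and calls between flash and RAM may need long calls or linker-generated
veneers, depending on how far apart the two regions are.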

How did I know to move these? Well, I'm using Tremolo now (i.e. a version
of Tremor with more ARM assembly), and the author, Robin Watts, very
nicely provided a profile of the code. It shows that these functions
combined account for something like 75% of CPU time, at least in this
version of the code running on an ARM processor.

It's now using 87.5% of the CPU to decode a 44.1kHz 16-bit stereo file with
the processor running at stock speed :)

I'd like to improve on this a bit, to increase the chances of being able
to do "seamless playback", which requires that I can close a file, open
the next one and decode the first packet before the audio buffer runs
out. I don't feel bad about overclocking the CPU a bit anyway, since I
built the power supply and know that it will provide better than the
minimum voltage requirements, but the more cycles I can shave off the
code by fiddling with it, the less I have to push it.
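
The seamless-playback condition can be written down as a simple budget
check. This is only a sketch: the buffer size and the close/open/decode
timings are hypothetical inputs that would have to be measured on the
target, and the function names are made up for illustration:

```c
#include <stdint.h>

/* Milliseconds of playback held in a PCM buffer (integer math, rounds
   down). For 44.1kHz 16-bit stereo: rate_hz = 44100, channels = 2,
   bytes_per_sample = 2, i.e. 176400 bytes per second. */
uint32_t buffer_ms(uint32_t bytes, uint32_t rate_hz,
                   uint32_t channels, uint32_t bytes_per_sample)
{
    uint64_t bytes_per_sec = (uint64_t)rate_hz * channels * bytes_per_sample;
    return (uint32_t)(((uint64_t)bytes * 1000u) / bytes_per_sec);
}

/* Returns 1 if closing the old file, opening the next one and decoding
   its first packet all fit before the buffered audio runs out. */
int gap_fits(uint32_t buffered_ms, uint32_t close_ms,
             uint32_t open_ms, uint32_t first_packet_ms)
{
    return close_ms + open_ms + first_packet_ms < buffered_ms;
}
```

For example, a 17640-byte buffer of 44.1kHz 16-bit stereo audio holds
100 ms of playback, so the whole file switch would need to finish in
under 100 ms.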


Thanks to all for your help. I still have a lot of tweaking to do but
when it's all working nicely I'll upload a tar.gz somewhere and post a
message here so that anybody who wants to run Tremor on this processor
can get a head start. It's quite a nice little chip, just enough power
and memory to do Ogg Vorbis decoding at common bit rates/sample rates
combined with low power usage and very useful peripherals - e.g. SPI
w/DMA for interfacing with MMC/SD cards and I2S w/DMA for interfacing
with a DAC.


Nicholas
