[tremor] Ogg/Vorbis report, FFT optimizations

Johannes Sandvall js at sandvall.nu
Thu Mar 11 22:21:58 PST 2004



On Fri 12 Mar 2004 03:46:49 CET, timmy brolin wrote:
> You talk about the 32bit lookup tables, and the performance hit the 32*32 bit multiplications cause (each requires 3 multiplications and some additions on your 16bit DSP).
> Tremor is written specifically to take advantage of the 32*32=64bit multiplication instruction found in ARM processors (and similar 32bit RISC architectures). Naturally you get a big performance hit when compiling tremor for processors without 32*32bit multiplication instructions.
> 
Of course. But maybe not as big in optimized assembly.
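
As a rough illustration of the cost being discussed, here is a minimal
sketch in C of how the high word of a 32*32 bit product can be built from
three 16*16 bit products. The function names are hypothetical and this is
not code from Tremor or from the DSP port. The low*low partial product is
dropped, so the result can be one LSB below the exact value. The 64-bit
sum only stands in for a wider DSP accumulator, and the code assumes that
right-shifting a negative value is an arithmetic shift.

```c
#include <stdint.h>

/* Reference: high 32 bits of the exact 64-bit product. */
static int32_t mult32_ref(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> 32);
}

/* Hypothetical helper: approximate the same value with three
 * 16x16 products. Assumes >> on negative values is arithmetic. */
static int32_t mult32_3mul(int32_t x, int32_t y)
{
    int32_t xh = x >> 16;      /* signed high halves */
    int32_t yh = y >> 16;
    int32_t xl = x & 0xFFFF;   /* unsigned low halves */
    int32_t yl = y & 0xFFFF;

    /* Two cross products, summed in a wide accumulator. */
    int64_t cross = (int64_t)xh * yl + (int64_t)xl * yh;

    /* High product plus the carry from the cross terms.
     * xl * yl is ignored, which costs at most one LSB. */
    return (int32_t)(xh * yh + (cross >> 16));
}
```

The dropped term is what brings the count down from four multiplications
to three; whether one LSB of error is acceptable depends on the block
being computed.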

> Secondly, as far as I understand, when you compare the performance of the old MDCT with your new FFT based implementation, you are comparing:
> A pure C implementation of an MDCT written specifically for processors with a 32*32bit multiplication instruction, but compiled for a processor with only a 16*16bit multiplication instruction.
> With:
> An FFT based MDCT implementation where the FFT is highly asm optimised by Texas Instruments for this particular DSP.
> Not a very fair comparison.

It is not fair, but that was not the point. An FFT implementation has
been shown to be faster on most platforms, even though not 8 times
faster; that factor mainly comes from the assembly optimizations.

The point was to get the block to execute as fast as possible and 8
MIPS for an MDCT of length 2048 is ok. 

> The obvious fact that the original Tremor codebase is neither suited nor intended for 16bit processors shows clearly in table 7.4. According to that table, Tremor requires 107MIPS to decode at 112kbps on your DSP. Tremor runs fine on a ~60MIPS ARM7.

Another problem is that the DSP is missing 8-bit data access, which is
the main reason some parts of the code run absurdly slowly. For the
highly optimized parts (FFT) I think the ARM is outperformed by the DSP
simply because of its dual MACs and the ability to run up to 13
RISC instructions in a single cycle.
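
To make the 8-bit access problem concrete, here is a minimal sketch in C
of what reading one byte costs when the smallest addressable unit is a
16-bit word. The helper name and the byte order (high byte first) are
assumptions made for illustration; this is not taken from Tremor or from
the DSP port.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch byte i from a stream that is packed
 * two bytes per 16-bit word, high byte first (assumed ordering). */
static unsigned get_byte(const uint16_t *words, size_t i)
{
    uint16_t w = words[i >> 1];               /* word holding byte i */
    return (i & 1) ? (w & 0xFFu) : (w >> 8);  /* pick low or high half */
}
```

Every byte read turns into an index shift, a test, and a shift or mask,
where a byte-addressable CPU would use a single load.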

Our version needs 44 MIPS to decode at 112 kbps, and that's mainly
because of unoptimized assembly. I think it should not be a problem to
optimize that block to a total under 20 MIPS.

> In your paper, you suggest a system consisting of an ARM processor for parsing and pre-decoding, and a DSP (presumably the 16bit TI DSP you use) for decoding. This sounds rather silly to me, since Tremor is specifically designed to run on 32bit ARM CPUs.

Actually not. But I am not going into detail here.

A DSP implementation is highly interesting for embedded
products, especially for low-power systems. A 5510 offers a much higher
ratio between MIPS and power consumption.

Regards 

/ Johannes Sandvall
