[tremor] FFT N/4 imdct replacement
Christopher Montgomery
monty at xiph.org
Tue Nov 25 16:37:19 PST 2003
On Thu, Nov 20, 2003 at 03:33:29PM +0100, Johannes Sandvall wrote:
> We have developed a FFT N/4 IMDCT replacment targetted for the lowmem branch.
>
> The algorithm replaces the N/2 imdct with:
>
> Prerotation
> Complex N/4 FFT
> Postrotation
>
> Replaced unroll and mdct_shiftright
>
> There are two versions. One written i C not optimized, mostly used as
> reference code. This part can easily be optimized using a better
> FFT. The FFT using a simple 3 nested loops, last two stages unrolled,
> decimation in frequency. There must be better free FFTs somewhere.
The Vorbis encoder already uses the FFT from Netlib. We could try
that to start (although this FFT has an odd packing order).
> The other version is a fully optimized with keyparts written in TI 55x
> assembler, prerotation, postrotation, fft, unroll.
>
> All blocks use inplace algorithms. FFT-data and right using
> 32-bit. We've used 16-bit window constants, but it can easily be
> changed to 32-bit for higher precision. The FFT comes in two
> versions, with 16- or 32-bit twiddle factors.
>
> The error when using 16-bit twiddle factors and 16-bit window
> constants seems to be +-4LSB (3bit) giving a 13-bit correct
> output. Using 32-bit twiddle and 16-bit windowing constants gives a
> much smaller error, I think around +-2LSB.
> This also halves the memory needed for constants.
Most of the actualy DSP impelemntations (including the c5400 code I
wrote) already does this and the output is acceptible.
> The performance on DSP (TI 16-bit 5510) is quite impressing.
>
> Original IMDCT C code 72.9 MIPS.
>
> FFT replacment using assembler optimizations: 8.5 MIPS
That's a bit better than the c5400 version I wrote using the current
IMDCT (my assembly version being somewhat ugly). It will be
especially interesting if it uses even less memory.
> The drawback is that the out code only supports windows up to 4096. The
> reason for this is that we're using a modified TI-DSPLIB cfft32 with a
> 1024-long twiddle file. It should not be so hard modifying the code
> to support a window length of 8192.
Linear interpolation would be perfectly acceptible here.
> A patch for the lowmem-tree can be produced if anyone is interested.
Indeed, very interested.
> We've also been working on several other improvements, mostly
> concerning memory usage, and also a DSP/CPU(ARM) split
> version. Parsing is done in a CPU(ARM) reducing the memory usage on
> the DSP, especially since only a part of the codebooks are stored in
> on-chip RAM on the DSP at any given time (a "codebook cache").
>
> / Johannes Sandvall, Erik Montnemery
I'm naturally very interested in seeing this new MDCT. Feel free to
either forward it along to me or just post it to the list as an
attachment. The version against current low-memory would be the most
interesting to begin with (as well as being easiest to understand).
Monty
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'tremor-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
More information about the Tremor
mailing list