[vorbis-dev] faster mdct's

Michael Smith msmith at xiph.org
Mon Jun 2 02:42:34 PDT 2003



On Sunday 01 June 2003 13:43, Steven G. Johnson wrote:
> Hello Vorbis folks,
>
> I'm one of the FFTW authors (www.fftw.org), and a few days ago I was
> playing with our codelet generator for fun and modified it to spit out
> hard-coded MDCTs of small sizes.  The code (at
> jdj.mit.edu/~stevenj/mdct_128nr.c) for 256 samples (128 outputs) seems to
> be almost twice as fast as the Vorbis MDCT code for that size on my 2.2GHz
> P-IV (gcc 3.2.2 and flags "-O1 -mcpu=pentium4 -fomit-frame-pointer
> -fstrict-aliasing -malign-double"), in single precision.
>
> I guess the other size that is important to you is 2048 samples; for that
> size you definitely don't want a hard-coded routine (the linked 256-sample
> codelet is just inside the crossover-point for hard-coding, in our
> experience).  There, you probably want a recursive/looping algorithm,
> albeit probably only two stages (e.g. one radix-32 step).  FFTW 3 contains
> code for a DCT-IV of arbitrary sizes (it works by pre/post-processing a
> real-input FFT of half the size) that can be fairly trivially adapted to
> the MDCT (which is just a DCT-IV with some input aliasing). I tried that
> this evening (see attached mdct.[ch]), but it's only about 30% faster than
> the Vorbis MDCT for 2048 samples, although the advantage increases for
> larger sizes (e.g. 60% faster for 128k samples).  It could be made
> substantially more efficient by generating special purpose codelets to
> avoid separate pre/post-processing passes...we know our DCT-IV code is not
> optimal.  (It also doesn't use SIMD.)
>
> The above two codes compute an unwindowed MDCT, and give the same results
> as the one in your mdct.c, but I can also easily make one with a window
> function built in (to avoid the separate pass/loads).
>
> I'm not sure how much you care about MDCT performance (what fraction of
> CPU time is it?), but I thought you might find this interesting anyway.
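
The remark in the quoted message that the MDCT "is just a DCT-IV with some input aliasing" can be checked numerically. The following is a minimal sketch, not the FFTW or Vorbis code: it assumes the textbook unnormalized MDCT definition (2N inputs, N outputs) and a naive O(N^2) DCT-IV, and the function names are made up for illustration. The sign and scaling conventions in Vorbis's mdct.c may differ.

```python
import math

def dct4(u):
    """Naive O(N^2) DCT-IV, unnormalized (textbook definition)."""
    n = len(u)
    return [sum(u[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                for j in range(n))
            for k in range(n)]

def mdct_direct(x):
    """Naive unwindowed MDCT straight from the definition: 2N inputs, N outputs."""
    n = len(x) // 2
    return [sum(x[j] * math.cos(math.pi / n * (j + 0.5 + n / 2) * (k + 0.5))
                for j in range(2 * n))
            for k in range(n)]

def mdct_via_dct4(x):
    """MDCT computed as a DCT-IV of the folded (aliased) input.

    The input is split into four quarters (a, b, c, d); the N-point
    sequence (-reverse(c) - d, a - reverse(b)) is then fed to the DCT-IV.
    Assumes len(x) is a multiple of 4.
    """
    n = len(x) // 2
    h = n // 2
    a, b, c, d = x[:h], x[h:n], x[n:n + h], x[n + h:]
    first = [-cr - dd for cr, dd in zip(reversed(c), d)]
    second = [aa - br for aa, br in zip(a, reversed(b))]
    return dct4(first + second)

if __name__ == "__main__":
    # Arbitrary test signal of 16 samples (8 outputs).
    x = [math.sin(0.37 * i) + 0.1 * i for i in range(16)]
    diff = max(abs(p - q) for p, q in zip(mdct_direct(x), mdct_via_dct4(x)))
    print("max difference:", diff)
```

A fast implementation would replace the naive dct4 above with an FFT-based DCT-IV; the folding step is the only MDCT-specific part.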

Steven,

Since nobody else has answered you, I thought I should say something.

MDCT performance is insignificant on encode, but takes a substantial (not sure
what percentage, but it's non-trivial) amount of time on decode. However,
decode is not really a performance problem on 'desktop-class' CPUs - it only
really matters a lot for embedded use (either things like consoles - some
people are quite interested in increasing performance on the PS2, for example
- or portable players, though in the latter case floating point hardware is
an unheard-of luxury, so this isn't directly relevant here).

We can't use any of your code as it stands anyway (at least the one you
attached, I haven't checked FFTW generally) because it's GPLed. Obviously
relicensing FFTW to xiph.org's BSD-like license isn't an option, but I'm not
sure what the licensing status of the generated code is (for example, the
code you linked to: jdj.mit.edu/~stevenj/mdct_128nr.c). Would it
be usable for us?

Mike
