[Vorbis-dev] mdct.c optimization
xiphmont at xiph.org
Tue Feb 1 12:29:42 PST 2005
On Tue, Feb 01, 2005 at 05:46:55PM +0100, petshome at atlas.cz wrote:
> I took function mdct_butterfly_8 and write out transformation
> matrix. Then I rewrote this matrix into sequence of additions and
> substractions (see attachement). As I suspected I got the same as in
> the original code but I swaped some rows to get little higher speed. I
> hope I'll do the same with 16 point butterfly function combined with 8
> point butterflies in a month. Who still believe that this approach
> will be slower?
Writing the original approach in specialized processor code will
produce a similar speedup; this is what libraries like FFTW do. I've
done so in the past; it's not in the reference encoder because it's a
bitch to maintain.
For small-point transforms, a generalized approach (like a
processor-specific matrix multiply) may end up being faster; n^2 is
faster than nlgn if n^2 has a very small n and a small coefficient and
nlgn has a large coefficient. I'm not sure this will generalize to
the other butterflies... but it might.
My skepticism mainly stems from the fact that people base entire
careers on making Fourier-domain transformations faster, and to my
knowledge, no one is using this approach. If you do prove them all
wrong, hey, we all gain. OTOH, inline MMX/SSE/Altivec vectorization
of the original approach may be faster-yet. I'd be interested in
seeing a comparison.
(BTW, I should alert folks right now: The specific MDCT transform that
Vorbis uses is not considered especially stellar. It was documented
and used primarily becuase it was easy to understand and easy to
implement. I'd hate for someone to spend six months writing an
assembly version that's 50% faster, when simply switching to another
algorithm would be 100% faster).
Regardless the outcome, it looks like you're having a good time so I
don't mean to discourage you. :-)
More information about the Vorbis-dev