[Vorbis-dev] mdct.c optimization

petshome at atlas.cz petshome at atlas.cz
Tue Feb 1 08:46:55 PST 2005


I took function mdct_butterfly_8 and write out transformation matrix. Then I rewrote this matrix into sequence of additions and substractions (see attachement). As I suspected I got the same as in the original code but I swaped some rows to get little higher speed. I hope I'll do the same with 16 point butterfly function combined with 8 point butterflies in a month. Who still believe that this approach will be slower?
-------------- next part --------------
Itemize transcription of linear algebra transformation into optimized CPU operations because we know the transformation matrix.

This is the data vector with auxiliary variables A and B which should be registers.

[x[0],x[1],x[2],x[3],x[4],x[5],x[6],x[7],A,B]

Here are the transorm matrixes with substituted operations and outputs.

 0  1  0 -1 -1  0  1  0
-1  0  1  0  0 -1  0  1
-1  0 -1  0  1  0  1  0
 0 -1  0 -1  0  1  0  1
 0 -1  0  1 -1  0  1  0
 1  0 -1  0  0 -1  0  1
 1  0  1  0  1  0  1  0
 0  1  0  1  0  1  0  1

A=x[1]-x[5]

 0  1  0 -1 -1  0  1  0
 0  0  0  0  0 -1  0  1
-1  0 -1  0  1  0  1  0
 0 -1  0 -1  0  1  0  1
 0 -1  0  1 -1  0  1  0
 0  0  0  0  0 -1  0  1
 1  0  1  0  1  0  1  0
 0  1  0  1  0  1  0  1
-1     1

B=x[6]-x[2]

 0  1  0 -1 -1  0  1  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  1  0  1  0
 0 -1  0 -1  0  1  0  1
 0 -1  0  1 -1  0  1  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  1  0  1  0
 0  1  0  1  0  1  0  1
-1     1
 1     1

y[0]=B-A
y[2]=A+B
A=x[0]-x[4]

 0  0  0  0 -1  0  1  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  1  0  1  0
 0 -1  0 -1  0  1  0  1
 0  0  0  0 -1  0  1  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  1  0  1  0
 0  1  0  1  0  1  0  1
    1    -1

B=x[7]-x[3]

 0  0  0  0 -1  0  1  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  1  0  1  0
 0  0  0  0  0  1  0  1
 0  0  0  0 -1  0  1  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  1  0  1  0
 0  0  0  0  0  1  0  1
    1    -1
    1     1

y[1]=A+B
y[3]=B-A
A=x[0]+x[4]

 0  0  0  0  0  0  0  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  1  0  1  0
 0  0  0  0  0  1  0  1
 0  0  0  0  0  0  0  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  1  0  1  0
 0  0  0  0  0  1  0  1
            -1     1

B=x[2]+x[6]

 0  0  0  0  0  0  0  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  0  0  0  0
 0  0  0  0  0  1  0  1
 0  0  0  0  0  0  0  0
 0  0  0  0  0 -1  0  1
 0  0  0  0  0  0  0  0
 0  0  0  0  0  1  0  1
            -1     1
             1     1

y[4]=B-A
y[6]=A+B
A=x[1]+x[5]

 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  1  0  1
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  1  0  1
               -1     1

B=x[3]+x[7]

 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
               -1     1
                1     1

y[5]=B-A
y[7]=A+B


I hope I didn't make a mistake. First I wrote a code for minimum number of add/sub operations. Below you can see second rewrote which minimize memory space including coupling data inputs. I hope this is little improvement on some CPU.
But this is just with 8x8 matrix. When I'll have more spare time I'll itemize 16x16 transform matrix including two 8x8 trans. mat. which will lead to significant increase of speed. I welcome all volunteers which will write transform matrix, do the multiplication and itemize it into CPU intructions with memory optimization. I more welcome those which will (write C code and) benchmark this little improvement and let all know (better wait for 16x16 transform). The most welcome are programmers which will write C code for automatic itemization and optimization of this transform of arbitrary size. This includes matrix multiplication, subtitution of the most occured operations, memory optimization and arbitrary input coupling.

G=x[0]-x[4]
A=x[0]+x[4]
F=x[6]-x[2]
B=x[6]+x[2]
x[6]=B+A
x[4]=B-A
A=x[1]-x[5]
B=x[1]+x[5]
x[2]=F+A
x[0]=F-A
A=x[7]+x[3]
F=x[7]-x[3]
x[7]=A+B
x[5]=A-B
x[3]=F-G
x[1]=F+G


More information about the Vorbis-dev mailing list