[vorbis-dev] Optimisations

David Riley oscar at the-rileys.net
Thu Nov 16 20:47:55 PST 2000



Segher Boessenkool wrote:
> 
> much effort to program, while the compiler will probably outsmart about
> every asm programmer (if enough work is put into the compiler).

I dunno, I've seen some pretty inefficient compiler mistakes on
commercial compilers (namely Metrowerks).  I've cut the processor cycle
count by about 1/3 (by, not to) by hand-coding assembly in Metrowerks,
simply because it doesn't handle sequential fused multiply-adds well.
An example:

//snip...

matrix3[0][0] = matrix1[0][0] * matrix2[0][0];
matrix3[0][0] += matrix1[0][1] * matrix2[1][0];
matrix3[0][0] += matrix1[0][2] * matrix2[2][0];
matrix3[0][0] += matrix1[0][3] * matrix2[3][0];

//snip...

This is the assembly code that the compiler produced at full
optimization for this particular block.

lfd	fp1,0(r3)
lfd	fp0,0(r4)
fmul	fp0,fp1,fp0
stfd	fp0,0(r5)
lfd	fp2,8(r3)
lfd	fp1,32(r4)
lfd	fp0,0(r5)
fmadd	fp0,fp2,fp1,fp0
stfd	fp0,0(r5)
lfd	fp2,16(r3)
lfd	fp1,64(r4)
lfd	fp0,0(r5)
fmadd	fp0,fp2,fp1,fp0
stfd	fp0,0(r5)
lfd	fp2,24(r3)
lfd	fp1,96(r4)
lfd	fp0,0(r5)
fmadd	fp0,fp2,fp1,fp0
stfd	fp0,0(r5)

As you can see, the destination matrix element (fp0) is needlessly
stored and loaded right back again. Here is the code I wrote.

lfd	fp2, 0(r3)
lfd	fp1, 0(r4)
fmul	fp0, fp2, fp1
lfd	fp2, 8(r3)
lfd	fp1, 32(r4)
fmadd	fp0, fp2, fp1, fp0
lfd	fp2, 16(r3)
lfd	fp1, 64(r4)
fmadd	fp0, fp2, fp1, fp0
lfd	fp2, 24(r3)
lfd	fp1, 96(r4)
fmadd	fp0, fp2, fp1, fp0
stfd	fp0, 0(r5)

As you can see, this is a good deal shorter: 13 instructions instead of
19. That doesn't translate directly into the same saving in time, since
FP multiplies take about 5 cycles and load/stores typically take three.
But when profiled, it ran about 1/3 faster, which would be a major
improvement with a large number of calculations.
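
As a back-of-envelope check on that figure, here is a small C sketch that
charges every instruction its full latency using the rough costs quoted
above (5 cycles per FP op, 3 per load/store). The function name and the
cost constants are mine, purely for illustration; a real CPU overlaps
loads with FP work, so treat the result as a crude estimate and not a
prediction.

```c
#include <assert.h>

/* Rough costs from the figures quoted above (assumed, not measured). */
enum { FP_CYCLES = 5, MEM_CYCLES = 3 };

/* Naive estimate: no pipelining, every instruction pays its full latency.
   naive_cycles is a hypothetical helper, not part of any library. */
static int naive_cycles(int loads, int stores, int fp_ops)
{
    assert(loads >= 0 && stores >= 0 && fp_ops >= 0);
    return (loads + stores) * MEM_CYCLES + fp_ops * FP_CYCLES;
}
```

The compiler's listing has 11 loads, 4 stores and 4 FP ops, so
naive_cycles(11, 4, 4) gives 65. The hand-written one has 8 loads, 1 store
and 4 FP ops, so naive_cycles(8, 1, 4) gives 47. That is a saving of
roughly 28%, in the same ballpark as the measured 1/3.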

This doesn't entirely relate to sound (this is actually for
transformation of 3D objects), but the sequential multiply-adds are
certainly a common thread.  It's a rather poor mistake to make, and I'm
sure gcc does much better.  I haven't seen how MPW handles it, but then
I could never figure out MPW makefiles enough to get an assembly dump
out of them.
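
For what it's worth, the same idea can be expressed at the C level by
keeping the running sum in a local variable and storing it once. This is
only a sketch: the mat4 typedef and both function names are made up for
illustration, and whether a given compiler then emits the tight
fmul/fmadd sequence is not something I can promise. It also assumes the
destination does not overlap either source matrix.

```c
#include <assert.h>

typedef double mat4[4][4];   /* hypothetical row-major 4x4 matrix type */

/* Same shape as the original C: every partial sum goes through dst[r][c]. */
static void mul_elem_in_place(mat4 a, mat4 b, mat4 dst, int r, int c)
{
    assert(r >= 0 && r < 4 && c >= 0 && c < 4);
    dst[r][c]  = a[r][0] * b[0][c];
    dst[r][c] += a[r][1] * b[1][c];
    dst[r][c] += a[r][2] * b[2][c];
    dst[r][c] += a[r][3] * b[3][c];
}

/* Keeps the running sum in a local, so only one store to dst is needed.
   Assumes dst does not alias a or b. */
static void mul_elem_local(mat4 a, mat4 b, mat4 dst, int r, int c)
{
    double acc;
    assert(r >= 0 && r < 4 && c >= 0 && c < 4);
    acc  = a[r][0] * b[0][c];
    acc += a[r][1] * b[1][c];
    acc += a[r][2] * b[2][c];
    acc += a[r][3] * b[3][c];
    dst[r][c] = acc;   /* single store at the end */
}
```

Both functions compute the same element; the second just makes it obvious
to the compiler that the intermediate values never need to touch memory.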

Hope this offered some help.
