[vorbis-dev] Optimisations
Segher Boessenkool
segher at wanadoo.nl
Fri Nov 17 06:17:32 PST 2000
> I dunno, I've seen some pretty inefficient compiler mistakes on
> commercial compilers (namely Metrowerks). I've cut the processor cycle
> count by about (not to about) 1/3 by hand coding assembly in Metrowerks
> simply because it doesn't handle sequential fused multiply adds. An example:
>
> //snip...
>
> matrix3[0][0] = matrix1[0][0] * matrix2[0][0];
> matrix3[0][0] += matrix1[0][1] * matrix2[1][0];
> matrix3[0][0] += matrix1[0][2] * matrix2[2][0];
> matrix3[0][0] += matrix1[0][3] * matrix2[3][0];
Write it as one big addition. The way you wrote it requires
the compiler to keep each temporary result in exactly the
precision of double, and normalized as well. The floating
point internal registers are normally bigger than double (don't
know for sure for PPC, though), and a fused multiply-add doesn't
normalize in between the mul and the add, 'cause it does them as
one operation. So you really require the compiler to spill the
data to memory. Alternatively, you can first put the result in
a "register double temp".
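A minimal sketch of both rewrites, assuming 4x4 matrices of double.
The helper names row_col_sum and row_col_temp are made up for
illustration and are not from the original code. Whether the
compiler really emits fused multiply-adds for either form still
depends on the target and the options used.

```c
/* Hypothetical helper: one element of a 4x4 product written as a
   single expression, so no intermediate result has to be stored
   into the destination matrix between the multiply-adds. */
static double row_col_sum(double a[4][4], double b[4][4], int i, int j)
{
    return a[i][0] * b[0][j]
         + a[i][1] * b[1][j]
         + a[i][2] * b[2][j]
         + a[i][3] * b[3][j];
}

/* Hypothetical alternative: accumulate in a local temporary and
   let the caller store the finished value once. */
static double row_col_temp(double a[4][4], double b[4][4], int i, int j)
{
    register double temp;

    temp  = a[i][0] * b[0][j];
    temp += a[i][1] * b[1][j];
    temp += a[i][2] * b[2][j];
    temp += a[i][3] * b[3][j];
    return temp;
}
```

With either helper the original statement sequence becomes a single
store, e.g. matrix3[0][0] = row_col_sum(matrix1, matrix2, 0, 0);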
This is a common problem; most compilers have options to relax
this kind of strictness, so that your code will produce
efficient assembly as well (for example, gcc has -ffast-math
and -mno-ieee-fp; that last one is for sin(), sqrt(), etc.,
and may be x86 only).
If given proper hints, any reasonable compiler will make good asm
from your code. The situation on x86 is much worse, because of the
weird floating point stack and its single-operand instructions. This
gives you only 6 or 7 registers to use, plus a big scheduling
problem. You have to break the code into blocks by hand (by putting
braces around it, and declaring new temporaries inside), to keep
the compiler from spilling everything onto the stack. The MSC compiler
does a little better than gcc here, but its code really is only good
for the Pentium, not for PentiumPro/II/III, K5/K6/K7, etc. Efficient
floating point on x86 requires programmer magic |-(
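A rough sketch of what "breaking into blocks" can look like,
assuming the same 4x4 double matrices. The function name
mat4_mul_row0 is hypothetical. Each output element gets its own
braced block with its own temporary, so nothing has to stay live
across blocks. Whether this actually reduces spilling depends on
the compiler; treat it as an illustration of the idea, not a
guaranteed win.

```c
/* Hypothetical sketch: first row of a 4x4 product, one braced
   block per output element. Each temporary goes out of scope at
   the closing brace of its block. */
static void mat4_mul_row0(double out[4][4], double a[4][4], double b[4][4])
{
    {
        double t = a[0][0] * b[0][0] + a[0][1] * b[1][0]
                 + a[0][2] * b[2][0] + a[0][3] * b[3][0];
        out[0][0] = t;
    }
    {
        double t = a[0][0] * b[0][1] + a[0][1] * b[1][1]
                 + a[0][2] * b[2][1] + a[0][3] * b[3][1];
        out[0][1] = t;
    }
    {
        double t = a[0][0] * b[0][2] + a[0][1] * b[1][2]
                 + a[0][2] * b[2][2] + a[0][3] * b[3][2];
        out[0][2] = t;
    }
    {
        double t = a[0][0] * b[0][3] + a[0][1] * b[1][3]
                 + a[0][2] * b[2][3] + a[0][3] * b[3][3];
        out[0][3] = t;
    }
}
```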
[...snip...]
> This doesn't entirely relate to sound (this is actually for
> transformation of 3D objects), but the sequential multiply-adds are
> certainly a common thread. It's a rather poor mistake to make, and I'm
> sure gcc does much better. I haven't seen how MPW handles it, but then
Only if you tell it to break the C standard :-)
> I could never figure out MPW makefiles enough to get an assembly dump
> out of them.
There are some books about MPW; the downloadable documentation is quite
good as well.
Ciao,
Segher