[vorbis-dev] Optimisations

Timothy J. Wood tjw at omnigroup.com
Thu Nov 16 16:44:37 PST 2000



>They say they will in the near future support 16 operations at once. I don't think
>they will be able to do four separate operations at once, so they most likely will
>widen the registers 

  That'd sure be cool.

>Is just int->float bad, or float->int as well? i ask this, because I was pleasantly
>surprised today, because my G3 was like 10 times faster than my Athlon
>(and that one was _way_ faster than the P-III) in converting an array of float to an
>array of int (in plain stupid C code).

// cc -O3 -S -static float.c

int floatToInt(float f);
float intToFloat(int i);

int main()
{
    floatToInt(1.0);
    intToFloat(1);
}

int floatToInt(float f)
{
    return (int)f;
}

float intToFloat(int i)
{
    return (float)i;
}

  This produces the following assembly for the two functions:

_floatToInt:
        fctiwz f0,f1
        stfd f0,-8(r1)
        lwz r3,-4(r1)
        blr

        .double 0r4.50360177485414400000e15
.text
        .align 2
.globl _intToFloat
_intToFloat:
        lis r0,0x4330
        lis r9,ha16(LC0)
        la r9,lo16(LC0)(r9)
        lfd f0,0(r9)
        xoris r11,r3,0x8000
        stw r11,-4(r1)
        stw r0,-8(r1)
        lfd f1,-8(r1)
        fsub f1,f1,f0
        frsp f1,f1
        blr

  As you can see, float->int isn't too bad.  If you need the result in a register, you are wasting two memory operations due to the fact that RISC machines usually don't move data directly between functional units.  On the other hand, int->float is abominable.  The case shown above makes it look a bit worse than it has to be, since a bunch of the operations can be hoisted outside any potential loop (loading the address of the constant and initializing the first word of the double temporary on the stack).  Sadly, even in a loop, gcc doesn't hoist the first store outside the loop, so you get three memory operations plus two float operations per iteration instead of two memory ops and two float ops.
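  For anyone not used to reading PPC assembly, here is a rough C sketch of what the int->float sequence above appears to be doing.  This is my reading of the listing, not compiler output.  The function names are made up for illustration, and the code assumes IEEE 754 doubles and that 64-bit integers and doubles share the same byte order.  The constant 4503601774854144.0 is 2^52 + 2^31, which matches the .double in the listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: mirrors the lis/xoris/stw/lfd/fsub/frsp sequence. */
float int_to_float_magic(int32_t i)
{
    /* High word 0x43300000 is the exponent for 2^52; the low word holds
       the integer with its sign bit flipped, i.e. i + 2^31 as unsigned. */
    uint64_t bits = ((uint64_t)0x43300000u << 32)
                  | ((uint32_t)i ^ 0x80000000u);
    double d;
    memcpy(&d, &bits, sizeof d);      /* d == 2^52 + 2^31 + i */

    /* Subtract 2^52 + 2^31, then round to single (the frsp step). */
    return (float)(d - 4503601774854144.0);
}

/* Loop version: the high word and the bias are set up once, outside
   the loop, which is the hoisting described above. */
void ints_to_floats_magic(const int32_t *in, float *out, size_t n)
{
    const uint64_t high = (uint64_t)0x43300000u << 32;
    const double bias = 4503601774854144.0;

    for (size_t i = 0; i < n; i++) {
        uint64_t bits = high | ((uint32_t)in[i] ^ 0x80000000u);
        double d;
        memcpy(&d, &bits, sizeof d);
        out[i] = (float)(d - bias);
    }
}
```

  A compiler is free to turn this back into a plain conversion, so treat it as an explanation of the instruction sequence rather than as an optimization.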

  This is one nice thing about Altivec -- it has a very fast path for both int->float and float->int.

>More generally, maybe all of the audience can help: what are the weakest points of all the
>various processors Vorbis will be deployed on?

  Speaking from my experience trying to optimize Quake3 for Mac OS X, I find:

- Memory bandwidth
- Int->Float conversion

  to be the two worst problems on the PPC.  Memory bandwidth probably isn't as big an issue for Vorbis as for Quake, but it might still have some effect for lookup tables that don't fit in cache.  This effect can be reduced by using the data cache touch instructions when possible.
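  As a minimal sketch of the cache touch idea, assuming a GCC-compatible compiler: __builtin_prefetch is a real GCC builtin and, as far as I know, is how you would get a dcbt out of gcc on PPC.  The function name, the table layout and the prefetch distance of 8 are all made up for illustration; the right distance depends on the machine and has to be measured.

```c
#include <stddef.h>

#if defined(__GNUC__)
#define PREFETCH(p) __builtin_prefetch((p))
#else
#define PREFETCH(p) ((void)0)   /* no-op on other compilers */
#endif

/* Assumed distance; tune by measurement. */
#define PREFETCH_AHEAD 8

/* Hypothetical example: sum table entries selected by idx[], touching
   the entry needed a few iterations from now before using it. */
float sum_lookups(const float *table, const unsigned *idx, size_t n)
{
    float sum = 0.0f;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            PREFETCH(&table[idx[i + PREFETCH_AHEAD]]);
        sum += table[idx[i]];
    }
    return sum;
}
```

  The prefetch is only a hint, so the result is the same with or without it; whether it helps depends on how predictable the access pattern already is.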

  The int->float conversion problems go away if you can use Altivec to do it (i.e., you have an array of ints and you need an array of floats, and they are all in the right positions, etc.).
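  A sketch of the array case, assuming a compiler with Altivec support: vec_ld, vec_st and vec_ctf are the standard Altivec intrinsics from altivec.h.  The function name is hypothetical, and "in the right positions" is taken here to mean that both arrays are 16-byte aligned.  Without Altivec the code falls back to the plain scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical helper: convert n ints to floats, four at a time
   when Altivec is available and both pointers are 16-byte aligned. */
void altivec_ints_to_floats(const int *in, float *out, size_t n)
{
    size_t i = 0;

#ifdef __ALTIVEC__
    if ((((uintptr_t)in | (uintptr_t)out) & 15) == 0) {
        for (; i + 4 <= n; i += 4) {
            vector signed int v = vec_ld(0, in + i);
            /* vec_ctf(v, 0): convert to float with no scaling */
            vec_st(vec_ctf(v, 0), 0, out + i);
        }
    }
#endif

    /* Scalar tail, and the whole job on non-Altivec builds. */
    for (; i < n; i++)
        out[i] = (float)in[i];
}
```

  The reverse direction would use vec_cts in the same way; note that it saturates out-of-range values rather than behaving like the C cast.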

-tim
