[vorbis-dev] Optimisations

Thu Nov 16 15:30:18 PST 2000

> > Why not in assembly?  The GCC extensions won't necessarily work across 
> > platforms (i.e. with the Metrowerks compiler) while it's already 
> > accepted that assembly doesn't... And (to my mind) it's easier to 
> > separate two similar assembly files than C files.  Besides, most PPC 
>  
> Just use some #ifdef's, no big deal. Or two separate src files, you'll need 
> them for asm as well. 

  #ifdefs would only be needed if there are bugs in the compilers.  I say 'bugs' because the C extensions for AltiVec are defined by Motorola and should be the same across all compilers.  I have used MrC (a bit) and gcc, and both behave the same.  I haven't used the MW compiler for AltiVec, though.

>MPW (MrC) does a great job (yeah, I did only one test, sorry). Btw, is there
>a fused multiply-add in AltiVec? That would make it an absolute ROCKER!

  AltiVec has lots of cool instructions:

   vmaddfp    --  result =  a*b + c
   vnmsubfp   --  result =  -(a*b - c) = c - a*b
   vrefp      --  result =~ 1/a
   vrsqrtefp  --  result =~ 1/sqrt(a)
   vperm      --  result =  bytes of a|b permuted by c
   vexptefp   --  result =~ 2^a
   vlogefp    --  result =~ log2(a)
   vcfsx/vcfux (vec_ctf in C)  --  result = (float)i / 2^n    (although, sadly, n >= 0)

  I have used all of these to great effect in other apps.  Some of the estimate instructions are _very_ useful when you don't need IEEE-exact results.  Even when you do need really accurate results, you can often find a refinement algorithm that produces better results from a good starting estimate and is still way faster than a libm call (like the Newton-Raphson refinement for 1/sqrt shown on page 4-18 of the AltiVec PEM).
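  To make the refinement idea concrete, here is a minimal scalar sketch in plain C.  It is not the PEM listing.  The names rsqrt_estimate, rsqrt_refine and rsqrt_nr are made up for illustration, and the bit-trick estimate is only a stand-in for the hardware reciprocal square root estimate (it assumes IEEE single precision and a positive, normal x).  The refinement step is written as one "c - a*b" followed by one "a*b + c", which is the shape that vnmsubfp and vmaddfp give you.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a hardware 1/sqrt estimate (assumption: IEEE-754 single
 * precision, x positive and normal).  Good to a few percent only. */
static float rsqrt_estimate(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float's bits */
    bits = 0x5f3759dfu - (bits >> 1);    /* well-known magic-constant guess */
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step for y ~= 1/sqrt(x):
 *   y' = y + (0.5*y) * (1 - x*y*y) */
static float rsqrt_refine(float x, float y)
{
    float t = 1.0f - (x * y) * y;        /* c - a*b  (vnmsubfp shape) */
    return (0.5f * y) * t + y;           /* a*b + c  (vmaddfp shape)  */
}

/* Estimate, then refine 'steps' times.  Each step roughly doubles the
 * number of correct bits. */
float rsqrt_nr(float x, int steps)
{
    float y = rsqrt_estimate(x);
    int i;

    for (i = 0; i < steps; i++)
        y = rsqrt_refine(x, y);
    return y;
}
```

  Calling rsqrt_nr(4.0f, 2) should land very close to 0.5.  How many steps you need depends on how good the starting estimate is and how much accuracy the caller actually wants.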

>If I understand correctly, the gcc extensions consist mainly of new datatypes
>(like, floats4 or whatever they call it), such that

   'vector float', 'vector unsigned long', 'vector bool', 'vector unsigned char', etc

>floats4 a, b, c;
>c = a + b;

  vector float a, b, c;

  // vec_add is a polymorphic function that selects the right instruction based on the types of its arguments
  c = vec_add(a, b);

> will do a vector addition. This is a quite natural thing to do, and doesn't
> take much effort to program, while the compiler will probably outsmart about
> every asm programmer (if enough work is put into the compiler).

  Probably, but some optimizations will be hard for the compiler.  For example, if your C code has needless conversions back and forth between ints and floats, the compiler really doesn't know whether you meant to lose precision or whether you are just being silly.  If you take it down to the C AltiVec bindings then you get some of the best of both worlds.  You know the general instruction flow and can see where you have a lot of instructions.  But you don't have to worry about exact instruction selection, register assignment (which can be a real bear when you have 32 floats, 32 ints and 32 vectors to worry about), or instruction ordering for pipelining.
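  As a rough sketch of what the C bindings plus a single #ifdef might look like: the function name madd_arrays is hypothetical, and the guard assumes a compiler that defines __ALTIVEC__ when AltiVec is enabled (gcc does with -maltivec; other compilers may need a different test).  The vector path also assumes all four pointers are 16-byte aligned, since vec_ld and vec_st ignore the low address bits.  Everything else falls through to plain C.

```c
#include <stddef.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* y[i] = a[i]*b[i] + c[i] for i in [0, n).
 * Assumption for the AltiVec path: a, b, c and y are 16-byte aligned. */
void madd_arrays(const float *a, const float *b, const float *c,
                 float *y, size_t n)
{
    size_t i = 0;

#ifdef __ALTIVEC__
    /* Four floats per iteration; the compiler picks registers and
     * schedules the loads, the multiply-add and the store. */
    for (; i + 4 <= n; i += 4) {
        vector float va = vec_ld(0, a + i);
        vector float vb = vec_ld(0, b + i);
        vector float vc = vec_ld(0, c + i);
        vec_st(vec_madd(va, vb, vc), 0, y + i);
    }
#endif

    /* Scalar tail, and the whole loop when AltiVec is not available. */
    for (; i < n; i++)
        y[i] = a[i] * b[i] + c[i];
}
```

  The instruction flow is visible in the source (three loads, one multiply-add, one store), but register assignment and ordering are left to the compiler, which is the trade-off described above.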

-tim
