[vorbis-dev] Optimisations
Segher Boessenkool
segher at wanadoo.nl
Thu Nov 16 15:45:07 PST 2000
> #ifdefs would work if there are bugs in the compilers. I say 'bugs' since the C extensions for Altivec are defined by Motorola and should be the same across all compilers. I have used MrC (a bit) and gcc and both are the same. I haven't used the MW compiler for Altivec, though.
This sounds great! Are these extensions well thought out? Where can I
get them? I'll look at mot.com, of course...
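A minimal sketch of the #ifdef approach applied to AltiVec itself. It assumes GCC or Clang, both of which define `__ALTIVEC__` when AltiVec code generation is enabled (e.g. with `-maltivec`). The AltiVec path is compiled only when the compiler advertises it, and every other target falls back to plain C. The function name `add4` is hypothetical, and only the scalar branch can be exercised on non-PowerPC machines.

```c
/* Build the AltiVec path only when the compiler reports AltiVec support
   (GCC and Clang define __ALTIVEC__ when AltiVec code generation is
   enabled, e.g. with -maltivec); every other target gets plain C. */
#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* out[i] = a[i] + b[i] for i = 0..3.
   On the AltiVec path a, b and out must be 16-byte aligned, because
   vec_ld/vec_st ignore the low four bits of the effective address. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__ALTIVEC__)
    vector float va = vec_ld(0, a);
    vector float vb = vec_ld(0, b);
    vec_st(vec_add(va, vb), 0, out);
#else
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

The same source then builds everywhere, and the #ifdef is the only place that knows about the vector unit.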
> >MPW (MrC) does a great job (yeah, I did only one test, sorry). Btw, is there
> >a fused multiply-add in AltiVec? That would make it an absolute ROCKER!
>
> Altivec has lots of cool instructions:
>
> vmaddfp -- result = a*b + c
> vnmsubfp -- result = - (a*b - c) = c - a*b
> vrefp -- result =~ 1/a
> vrsqrte -- result =~ 1/sqrt(a)
Are these table lookups accurate to about 11-12 bits, like the 3DNow! ones?
Is there an instruction to do a Newton iteration on the estimate, to get
full 23-24 bit results?
Mmmh, maybe I should go look up the full insn set.
> vperm -- result = a|b permuted by c
Yeah, I knew about this one. This one alone makes me think AltiVec is world-class.
> vexpte -- result =~ 2^a
> vloge -- result =~ log2(a)
Are these reasonably good approximations even when a is a float? Wow.
> vctf -- result = (float)i / 2^n (although, sadly, n >= 0)
>
> I have used all of these to great effect in other apps. Some of the estimate instructions are _very_ useful when you don't need IEEE-exact results. Even when you do need really accurate results, you can often find a refinement algorithm that will produce better results given a good starting estimate and still be way faster than a libm call (like the Newton-Raphson refinement for 1/sqrt shown on page 4-18 of the Altivec PEM).
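A portable sketch of the refinement idea, assuming no AltiVec hardware is available. Since `vrsqrtefp` cannot be called from plain C, the seed here is the well-known integer bit trick (magic constant `0x5f3759df`), a different and cruder estimate (roughly 3.4% relative error) than the hardware one. The step itself is the standard Newton-Raphson update y' = y + 0.5*y*(1 - x*y*y), written so each line has the shape of a vmaddfp or vnmsubfp. The names `rsqrt_seed`, `rsqrt_newton_step` and `rsqrt_refined` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Crude seed for 1/sqrt(x), x > 0: the classic integer bit trick.
   Hypothetical stand-in for a hardware estimate such as vrsqrtefp;
   assumes 32-bit IEEE 754 floats. Max relative error is about 3.4%. */
static float rsqrt_seed(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);
    i = UINT32_C(0x5f3759df) - (i >> 1);
    memcpy(&y, &i, sizeof y);
    return y;
}

/* One Newton-Raphson step for y ~ 1/sqrt(x):
   y' = y + 0.5*y*(1 - x*y*y), algebraically y*(1.5 - 0.5*x*y*y). */
static float rsqrt_newton_step(float x, float y)
{
    float y_sq   = y * y;            /* vmaddfp shape, zero addend */
    float half_y = 0.5f * y;         /* vmaddfp shape, zero addend */
    float r      = 1.0f - x * y_sq;  /* vnmsubfp shape: c - a*b    */
    return y + r * half_y;           /* vmaddfp shape: a*b + c     */
}

/* Seed, then apply `steps` refinement steps. */
float rsqrt_refined(float x, int steps)
{
    float y = rsqrt_seed(x);
    for (int k = 0; k < steps; ++k)
        y = rsqrt_newton_step(x, y);
    return y;
}
```

Each step roughly doubles the number of correct bits, so starting from a roughly 12-bit hardware estimate a single step gets close to full single precision, while this cruder seed needs about three.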
>
> >If I understand correctly, the gcc extensions consist mainly of new datatypes
> >(like, floats4 or whatever they call it), such that
>
> 'vector float', 'vector unsigned long', 'vector bool', 'vector unsigned char', etc
So will these presumably still work when more than 4 floats fit in a
register? How do they do this? Or is it fixed at 4? In that case, 'vector'
is a misnomer; it should be 'vector4' or something like that.
> >floats4 a, b, c;
> >c = a + b;
>
> vector float a, b, c;
>
> // vec_add is a polymorphic function that will select the right instruction based on the arguments and result type
> c = vec_add(a, b);
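To make "polymorphic" concrete without AltiVec hardware, here is a portable C11 sketch. The types `vfloat4` and `vint4`, the helper functions and the macro `my_vec_add` are all hypothetical stand-ins; `_Generic` plays the role that overload resolution plays for the real `vec_add`, which the compiler maps to `vaddfp` for `vector float` and `vadduwm` for `vector signed int`.

```c
#include <stdint.h>

/* Hypothetical stand-ins for AltiVec's 'vector float' and
   'vector signed int': four 32-bit lanes in a 128-bit value. */
typedef struct { float v[4]; } vfloat4;
typedef struct { int32_t v[4]; } vint4;

/* Lane-wise float add (the job vaddfp does in one instruction). */
static vfloat4 add_vfloat4(vfloat4 a, vfloat4 b)
{
    vfloat4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

/* Lane-wise modular integer add (the job vadduwm does). Adding as
   uint32_t avoids signed-overflow undefined behaviour; the conversion
   back wraps on the usual two's-complement targets. */
static vint4 add_vint4(vint4 a, vint4 b)
{
    vint4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = (int32_t)((uint32_t)a.v[i] + (uint32_t)b.v[i]);
    return r;
}

/* C11 _Generic selects the implementation from the type of the first
   argument, much as the compiler resolves the overloaded vec_add. */
#define my_vec_add(a, b) \
    _Generic((a), vfloat4: add_vfloat4, vint4: add_vint4)((a), (b))
```

The caller writes one name and the argument type picks the lane-wise operation, which is the convenience the AltiVec C interface provides.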
> Probably, but it will probably be hard for the compiler to do some optimizations. For example, if your C code has needless conversions back and forth between ints and floats, the compiler really doesn't know whether you meant to lose precision or whether you are just being silly. If you take it down to the C Altivec bindings then you get some of the best of both worlds.
According to ANSI C, you want to lose precision.
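A small illustration of why the compiler must keep such conversions exactly as written, assuming IEEE 754 single-precision `float` and the default round-to-nearest mode: an int above 2^24 may not survive a trip through float, and float-to-int conversion truncates toward zero. The function names are hypothetical.

```c
#include <stdint.h>

/* Round-trip an int through float. IEEE 754 single precision has a
   24-bit significand, so integers above 2^24 in magnitude may be
   rounded. Assumed precondition: |i| <= 2^30, so the rounded float
   still fits in int32_t and the conversion back is well defined. */
int32_t roundtrip_via_float(int32_t i)
{
    float f = (float)i;   /* rounds to a representable float */
    return (int32_t)f;    /* exact here: f already holds an integer value */
}

/* float -> int conversion discards the fraction (truncates toward zero). */
int32_t float_to_int(float f)
{
    return (int32_t)f;
}
```

Because both conversions can change the value, a compiler cannot drop a `(float)` or `(int)` cast just because it looks redundant.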
> You know the general instruction flow and can see where you have a lot of instructions. But, you don't have to worry about exact instruction selection, register assignment (which can be a real bear when you have 32 floats, 32 ints and 32 vectors to worry about), or instruction ordering for pipelining.
Yep. And your code will still be (reasonably) good for future
processors/processor revisions.
Bye,
Segher
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/