[Vorbis-dev] Low level optimization

dean gaudet dean-list-vorbis-dev at arctic.org
Fri Feb 11 14:39:17 PST 2005


On Fri, 11 Feb 2005, Tuomo Latto wrote:

> I think the keyword here is x86 (or actually "pentium4" in that url path).
> And furthermore, even if the compiler supports it, the support won't
> magically appear onto one's CPU...

actually gcc has altivec intrinsics as well -- and most of sse1/2 is a 
subset of what altivec can do (sse2's double-precision float ops are the 
main exception; i don't remember if altivec has the mixed addsub 
operations present in sse3, but they're emulated easily enough)... 
unfortunately the two sets of intrinsics differ, so there'd need to be a 
wrapper layer.  but i think it would be educational to hand vectorize some 
of the code and write the wrapper just to see what happens.
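
a minimal sketch of what such a wrapper layer might look like, in C.  all 
of the names here (vec4f, vec4f_load, vec4f_store, vec4f_add, add_arrays) 
are made up for illustration and are not part of libvorbis.  the intrinsics 
themselves (_mm_loadu_ps, _mm_storeu_ps, _mm_add_ps, vec_ld, vec_st, 
vec_add) are the real ones from xmmintrin.h and altivec.h.  the altivec 
branch assumes 16-byte aligned pointers, and there's a plain C fallback so 
the same source builds everywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: one 4-float vector type, three operations. */
#if defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 vec4f;
static inline vec4f vec4f_load(const float *p)     { return _mm_loadu_ps(p); }
static inline void  vec4f_store(float *p, vec4f v) { _mm_storeu_ps(p, v); }
static inline vec4f vec4f_add(vec4f a, vec4f b)    { return _mm_add_ps(a, b); }
#elif defined(__ALTIVEC__)
#include <altivec.h>
typedef vector float vec4f;
/* vec_ld/vec_st ignore the low 4 address bits: p must be 16-byte aligned. */
static inline vec4f vec4f_load(const float *p)     { return vec_ld(0, p); }
static inline void  vec4f_store(float *p, vec4f v) { vec_st(v, 0, p); }
static inline vec4f vec4f_add(vec4f a, vec4f b)    { return vec_add(a, b); }
#else
/* Portable fallback: the compiler is free to vectorize this or not. */
typedef struct { float f[4]; } vec4f;
static inline vec4f vec4f_load(const float *p)
{
    vec4f v;
    for (int i = 0; i < 4; i++) v.f[i] = p[i];
    return v;
}
static inline void vec4f_store(float *p, vec4f v)
{
    for (int i = 0; i < 4; i++) p[i] = v.f[i];
}
static inline vec4f vec4f_add(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < 4; i++) r.f[i] = a.f[i] + b.f[i];
    return r;
}
#endif

/* dst[i] = a[i] + b[i], written once against the wrapper.
 * n must be a multiple of 4; keep the arrays 16-byte aligned so the
 * AltiVec branch is valid too. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        vec4f_store(dst + i, vec4f_add(vec4f_load(a + i), vec4f_load(b + i)));
}
```

the point is only that the hand-vectorized loop is written once; the data 
shuffling operations are where a real wrapper would get hard, and they're 
left out here.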

unfortunately vector stuff is still pretty new in the commonly used 
compilers ... in theory it'd be possible for the compiler to hide more of 
the details, maybe some day that'll happen.  there are so many operators 
on vectors (especially the data shuffling/repacking types of operations) 
that it makes optimisation challenging for the compiler.  which is why it 
tends to only happen in the "SPEC" compilers -- such as intel's, whose 
main purpose in life is to make intel processors show incredible 
speccpu2000 scores.


> Nasm is quite nice, yes.
> It also happens to be yet another build requirement.
> What I like about (Xiph's) current Ogg/Vorbis stuff is that it
> specifically _does_not_ require a lot of other stuff. You can build
> it OOTB and not worry about getting a bunch of requirements and
> dependencies first.

oh i agree, but there's no way the default C code would need to go away, 
and there's nothing stopping an autoconf test to see if specific tools are 
available.  or in the worst case there's nothing stopping someone from 
maintaining their own patches / fork of vorbis.

i understand the motivation to have a clean portable tree for the 
reference implementation... but if it's modular enough then it should be 
possible to plug in improvements without having to litter #ifdefs all over 
the place.
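
one way this could look, as a rough sketch: the hot routine is called 
through a function pointer that defaults to the plain C version, and a 
tuned build swaps in something else at init time.  every name below 
(window_fn, window_apply, window_set_impl, window_apply_unrolled4) is 
hypothetical and not taken from the vorbis source; the unrolled routine 
just stands in for a real SIMD or assembly version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature for one pluggable hot routine. */
typedef void (*window_fn)(float *data, const float *window, size_t n);

/* Reference implementation: always compiled, always available. */
void window_apply_c(float *data, const float *window, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= window[i];
}

/* Stand-in for an optimized variant living in its own source file. */
void window_apply_unrolled4(float *data, const float *window, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        data[i]     *= window[i];
        data[i + 1] *= window[i + 1];
        data[i + 2] *= window[i + 2];
        data[i + 3] *= window[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        data[i] *= window[i];
}

/* The one place that knows which implementation is active. */
static window_fn window_impl = window_apply_c;

void window_set_impl(window_fn fn)
{
    assert(fn != NULL);
    window_impl = fn;
}

/* Callers use this and never see an #ifdef. */
void window_apply(float *data, const float *window, size_t n)
{
    window_impl(data, window, n);
}
```

the cost is an indirect call per invocation, which only makes sense for 
routines that do a reasonable amount of work per call.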

there have been some C-specific optimizations posted here which i don't 
think made it into the distribution.  i posted one which assumed IEEE 
float format to optimise the inner sort function using integers instead of 
floats (because comparing fabs(x), fabs(y) can be done by loading the 
values as 32-bit ints, doubling them to remove the sign bit, and 
comparing... i forget the function name).  there was another patch which 
added some padding to some arrays which were all hot in the L1 cache -- 
but which had cache line aliasing problems on machines with low 
associativity or small L1s.
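
for reference, the integer comparison trick goes roughly like this.  this 
is a reconstruction of the idea, not the patch itself, and the names 
(abs_key, abs_less, cmp_abs_ascending) are invented.  it assumes float is 
32-bit IEEE-754 with the same byte order as 32-bit integers, and it treats 
NaNs as larger than infinity, which is not what a float compare would do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reinterpret the float's bits and double them as an unsigned 32-bit
 * value: the sign bit falls off the top, and what remains orders the
 * same way as fabs(x) for non-NaN inputs. */
static inline uint32_t abs_key(float x)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));  /* assumption: 32-bit float */
    memcpy(&bits, &x, sizeof bits);             /* avoids aliasing problems */
    return bits << 1;
}

/* Equivalent to fabs(x) < fabs(y) for non-NaN inputs. */
int abs_less(float x, float y)
{
    return abs_key(x) < abs_key(y);
}

/* qsort comparator: ascending magnitude, integer compares only. */
int cmp_abs_ascending(const void *pa, const void *pb)
{
    uint32_t ka = abs_key(*(const float *)pa);
    uint32_t kb = abs_key(*(const float *)pb);
    return (ka > kb) - (ka < kb);
}
```

usage would be something like qsort(v, n, sizeof v[0], cmp_abs_ascending).  
note that +0.0 and -0.0 get the same key, so they compare equal just as 
they would with fabs.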

anyhow both of those were portable enough, but i can see why such stuff 
isn't desirable in a reference codec.

i admit i have the "let's see how fast we can make it go" disease :)  i'd 
certainly like to see a common set of patches with the known optimizations 
in them, i bet some distributions would include those patches when 
building their binaries.

-dean

