[Vorbis-dev] Low level optimization
dean gaudet
dean-list-vorbis-dev at arctic.org
Fri Feb 11 14:39:17 PST 2005
On Fri, 11 Feb 2005, Tuomo Latto wrote:
> I think the keyword here is x86 (or actually "pentium4" in that url path).
> And furthermore, even if the compiler supports it, the support won't
> magically appear onto one's CPU...
actually gcc has altivec intrinsics as well -- and sse1/2 are mostly a
subset of what altivec can do, double-precision floats aside (i don't
remember if altivec has the mixed addsub operations present in sse3, but
they're emulated easily enough)...
unfortunately the two sets of intrinsics differ, so there'd need to be a
wrapper layer. but i think it would be educational to hand vectorize some
of the code and write the wrapper just to see what happens.
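
as a rough sketch of what such a wrapper layer might look like: one small
vector type plus a few inline helpers, mapped to sse or altivec intrinsics,
with a plain C fallback. every name here (vec4f, vec4f_load, vec4f_mul,
apply_window) is made up for illustration and is not part of vorbis; the
altivec branch uses memcpy for loads/stores purely to avoid alignment
assumptions, which is probably not what a tuned version would do.

```c
#include <stddef.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 vec4f;
static inline vec4f vec4f_load(const float *p)     { return _mm_loadu_ps(p); }
static inline void  vec4f_store(float *p, vec4f v) { _mm_storeu_ps(p, v); }
static inline vec4f vec4f_mul(vec4f a, vec4f b)    { return _mm_mul_ps(a, b); }
#elif defined(__ALTIVEC__)
#include <altivec.h>
typedef __vector float vec4f;
/* memcpy sidesteps the 16-byte alignment requirement of vec_ld/vec_st */
static inline vec4f vec4f_load(const float *p)     { vec4f v; memcpy(&v, p, sizeof v); return v; }
static inline void  vec4f_store(float *p, vec4f v) { memcpy(p, &v, sizeof v); }
static inline vec4f vec4f_mul(vec4f a, vec4f b)
{
    const vec4f zero = {0.0f, 0.0f, 0.0f, 0.0f};
    /* multiply-add with a zero addend (a zero result may lose its sign) */
    return vec_madd(a, b, zero);
}
#else
/* portable fallback: same interface, plain C */
typedef struct { float f[4]; } vec4f;
static inline vec4f vec4f_load(const float *p)     { vec4f v; memcpy(v.f, p, sizeof v.f); return v; }
static inline void  vec4f_store(float *p, vec4f v) { memcpy(p, v.f, sizeof v.f); }
static inline vec4f vec4f_mul(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.f[i] = a.f[i] * b.f[i];
    return r;
}
#endif

/* hypothetical hand-vectorized loop written only against the wrapper */
void apply_window(float *out, const float *in, const float *win, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        vec4f_store(out + i, vec4f_mul(vec4f_load(in + i), vec4f_load(win + i)));
    for (; i < n; i++)            /* scalar tail for leftover elements */
        out[i] = in[i] * win[i];
}
```

the loop body never mentions a specific instruction set, so only the wrapper
section would need to grow when another target is added.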
unfortunately vector stuff is still pretty new in the commonly used
compilers ... in theory it'd be possible for the compiler to hide more of
the details, maybe some day that'll happen. there are so many operators
on vectors (especially the data shuffling/repacking types of operations)
that automatic vectorisation is challenging for the compiler, which is why
it tends to only happen in the "SPEC" compilers -- such as intel's, whose
main purpose in life is to make intel processors show incredible
speccpu2000 scores.
> Nasm is quite nice, yes.
> It also happens to be yet another build requirement.
> What I like about (Xiph's) current Ogg/Vorbis stuff is that it
> specifically _does_not_ require a lot of other stuff. You can build
> it OOTB and not worry about getting a bunch of requirements and
> dependencies first.
oh i agree, but there's no way the default C code would need to go away,
and there's nothing stopping an autoconf test to see if specific tools are
available. or in the worst case there's nothing stopping someone maintaining
their own patches / fork of vorbis.
i understand the motivation to have a clean portable tree for the
reference implementation... but if it's modular enough then it should be
possible to plug in improvements without having to litter #ifdefs all over
the place.
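
one way the "modular enough" idea could look, as a minimal sketch: the hot
routines sit behind a small table of function pointers, the reference C
versions fill it by default, and an optimized build hands in its own table
at a single point. the names (dsp_ops, mul_window, dsp_select) are
hypothetical and do not come from the vorbis source.

```c
#include <stddef.h>

/* table of replaceable hot routines (hypothetical interface) */
typedef struct {
    const char *name;
    void (*mul_window)(float *data, const float *window, size_t n);
} dsp_ops;

/* reference implementation: plain portable C */
static void mul_window_c(float *data, const float *window, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= window[i];
}

static const dsp_ops dsp_ops_c = { "c", mul_window_c };

/* the only place where an alternative implementation is chosen;
   passing NULL keeps the reference code */
const dsp_ops *dsp_select(const dsp_ops *candidate)
{
    return candidate ? candidate : &dsp_ops_c;
}
```

callers would only ever go through the table, e.g.
`dsp_select(NULL)->mul_window(buf, win, n)`, so any platform-specific
conditionals stay in whatever file provides the alternative table.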
there have been some C-specific optimizations posted here which i don't
think made it into the distribution. i posted one which assumed IEEE
float format to optimise the inner sort function using integers instead of
floats (comparing fabs(x) with fabs(y) can be done by loading the values
as 32-bit unsigned ints, doubling them to shift out the sign bit, and
comparing the results... i forget the function name). there was another
patch which added some padding to some arrays which were all hot in the
L1 cache, but which had cache line aliasing problems on machines with low
associativity or small L1s.
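
the integer comparison trick can be sketched roughly like this, assuming
float is a 32-bit IEEE 754 value. the helper names (float_bits, abs_less,
sort_by_magnitude) are invented for the example; this is not the patch that
was posted, and NaNs simply sort above infinity here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* reinterpret a float's bits as an unsigned 32-bit integer
   (assumes sizeof(float) == 4 and IEEE 754 layout) */
static inline uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* nonzero if fabs(x) < fabs(y): shifting left by one (doubling) drops
   the sign bit, and the remaining exponent+mantissa bits order the
   same way as the magnitudes do */
static inline int abs_less(float x, float y)
{
    return (uint32_t)(float_bits(x) << 1) < (uint32_t)(float_bits(y) << 1);
}

/* qsort comparator: ascending by magnitude, integer compares only */
static int cmp_abs(const void *a, const void *b)
{
    uint32_t ua = (uint32_t)(float_bits(*(const float *)a) << 1);
    uint32_t ub = (uint32_t)(float_bits(*(const float *)b) << 1);
    return (ua > ub) - (ua < ub);
}

void sort_by_magnitude(float *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_abs);
}
```

for example, sorting { -3.0, 0.5, 2.0, -0.25 } this way gives
{ -0.25, 0.5, 2.0, -3.0 } without a single floating point compare.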
anyhow both of those were portable enough, but i can see why such stuff
isn't desirable in a reference codec.
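
to illustrate the padding idea mentioned above, here is a toy model rather
than the actual patch: three arrays whose sizes are a multiple of the cache
"way" size all start in the same cache set, and a line of padding between
them staggers the starts. the 64-byte line, the 64 sets and the array length
are assumptions picked to make the arithmetic obvious, and cache_set() is
just a model of the index calculation, not a hardware query.

```c
#include <stddef.h>

#define CACHE_LINE 64      /* assumed line size in bytes */
#define NSAMPLES   1024    /* 1024 floats = 4096 bytes per array */

/* back-to-back arrays: each starts a multiple of 4096 bytes apart */
struct unpadded {
    float a[NSAMPLES];
    float b[NSAMPLES];
    float c[NSAMPLES];
};

/* one cache line of padding between arrays staggers their starts */
struct padded {
    float a[NSAMPLES];
    char  pad0[CACHE_LINE];
    float b[NSAMPLES];
    char  pad1[CACHE_LINE];
    float c[NSAMPLES];
};

/* set index a byte offset would map to in a cache with num_sets sets */
static size_t cache_set(size_t byte_offset, size_t num_sets)
{
    return (byte_offset / CACHE_LINE) % num_sets;
}
```

with 64 sets, a[0], b[0] and c[0] of the unpadded struct all land in set 0,
so walking the three arrays in lockstep competes for the same few ways; in
the padded struct they land in sets 0, 1 and 2.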
i admit i have the "let's see how fast we can make it go" disease :) i'd
certainly like to see a common set of patches with the known optimizations
in them, i bet some distributions would include those patches when
building their binaries.
-dean