[theora-dev] Benchmarks Inline-ASM vs. Intrinsics
n.pipenbrinck at cubic.org
Wed Feb 11 05:47:35 PST 2009
Hi folks, FYI:
I've finally run some benchmarks comparing inline-assembler and
intrinsic-based MMX code.
So far I've only applied the changes to the fragment reconstruction
functions; as of this writing, the IDCT and loop filter have not been
ported yet.
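To illustrate what an intrinsic-based fragment reconstruction function
can look like, here is a minimal sketch of an 8x8 inter reconstruction
(predictor plus residue, clamped to 0..255). The names recon_inter_c,
recon_inter_mmx and recon_inter are hypothetical, not the actual
oc_frag_* functions in trunk, and the signature is an assumption. The
MMX path is only compiled when the compiler defines __MMX__; otherwise
the plain C reference is used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Plain C reference: dst = clamp(pred + residue, 0, 255) for one 8x8 block.
   Hypothetical signature; residue holds 64 contiguous values, row by row. */
static void recon_inter_c(unsigned char *dst, const unsigned char *pred,
                          int ystride, const int16_t *residue)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            int v = pred[j] + residue[i * 8 + j];
            dst[j] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
        dst += ystride;
        pred += ystride;
    }
}

#if defined(__MMX__)
/* Same operation with MMX intrinsics, one 8-pixel row per iteration. */
static void recon_inter_mmx(unsigned char *dst, const unsigned char *pred,
                            int ystride, const int16_t *residue)
{
    const __m64 zero = _mm_setzero_si64();
    for (int i = 0; i < 8; i++) {
        __m64 p, r_lo, r_hi, lo, hi;
        memcpy(&p, pred, 8);                   /* 8 predictor bytes */
        memcpy(&r_lo, residue + i * 8, 8);     /* residues 0..3 */
        memcpy(&r_hi, residue + i * 8 + 4, 8); /* residues 4..7 */
        lo = _mm_unpacklo_pi8(p, zero);        /* widen pixels 0..3 to 16 bit */
        hi = _mm_unpackhi_pi8(p, zero);        /* widen pixels 4..7 to 16 bit */
        lo = _mm_adds_pi16(lo, r_lo);          /* signed saturating add */
        hi = _mm_adds_pi16(hi, r_hi);
        p = _mm_packs_pu16(lo, hi);            /* clamp to 0..255 and narrow */
        memcpy(dst, &p, 8);
        dst += ystride;
        pred += ystride;
    }
    _mm_empty(); /* leave MMX state so later x87 float code works */
}
#endif

/* Picks the MMX version when the compiler supports it. */
static void recon_inter(unsigned char *dst, const unsigned char *pred,
                        int ystride, const int16_t *residue)
{
#if defined(__MMX__)
    recon_inter_mmx(dst, pred, ystride, residue);
#else
    recon_inter_c(dst, pred, ystride, residue);
#endif
}
```

Unlike inline assembler, the same source builds with any compiler that
ships <mmintrin.h>, and register allocation is left to the compiler.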
Nevertheless here are some numbers:
As a baseline, I'll use the current trunk version with all
inline-assembler functions enabled. Lower values mean lower performance.
  All functions with inline-asm:          100%
  inter_mmx replaced by a C function:      93%
  No MMX at all:                           60%
  All oc_frag functions intrinsic-based:   98%
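The percentages can be read as throughput relative to the baseline. A
small sketch of that arithmetic, assuming each variant was timed on the
same workload (the helper names are hypothetical and no actual timings
are implied). Note that a score of 93% means the variant needs about
7.5% more time than the baseline, since 100/93 is about 1.075.

```c
#include <assert.h>

/* Hypothetical helper: speed of a variant as a percentage of the baseline,
   computed from wall-clock times measured on the same workload.
   Higher is faster; 100.0 means "as fast as the baseline". */
static double relative_perf(double baseline_seconds, double variant_seconds)
{
    return 100.0 * baseline_seconds / variant_seconds;
}

/* Extra running time, in percent, implied by a relative-performance score.
   For example, a score of 50 means the variant takes twice as long. */
static double extra_time_percent(double rel_perf)
{
    return 100.0 * (100.0 / rel_perf - 1.0);
}
```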
As you can see, the current bugfix for Mozilla (inter_mmx replaced by
the C function) costs only 7% in performance. IMHO that's something we
could live with. The intrinsic-based approach is nearly as fast as the
handwritten code (98% vs. 100%), and it compiles with gcc as well as
VS.net (I haven't tried it under Linux yet, but will do so...). The
gcc-generated code is even a tad better than the
By the way, there is a difference between VS.net whole-program
optimization and simple per-translation-unit optimization, but it is so
small that it's nearly lost in the measurement noise. Moving the MMX
intrinsic functions into the mmxstate.c file and declaring them as
static inline made a bigger difference (though still negligible).
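The static inline point can be sketched like this: when an intrinsic
helper is declared static inline in the same .c file as its caller (as
with mmxstate.c), the compiler can inline it under ordinary
per-translation-unit optimization, without relying on whole-program
optimization. The example averages two 8x8 predictors with truncation;
avg8_trunc and avg_block8x8 are hypothetical names, and this is not the
trunk code.

```c
#include <assert.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>

/* Helper kept static inline in the same translation unit as its caller, so
   the compiler can inline it without whole-program optimization.
   Computes (a + b) >> 1 for 8 unsigned bytes. */
static inline __m64 avg8_trunc(__m64 a, __m64 b)
{
    const __m64 zero = _mm_setzero_si64();
    __m64 lo = _mm_add_pi16(_mm_unpacklo_pi8(a, zero), _mm_unpacklo_pi8(b, zero));
    __m64 hi = _mm_add_pi16(_mm_unpackhi_pi8(a, zero), _mm_unpackhi_pi8(b, zero));
    lo = _mm_srli_pi16(lo, 1); /* sums are at most 510, so >> 1 fits a byte */
    hi = _mm_srli_pi16(hi, 1);
    return _mm_packs_pu16(lo, hi);
}
#endif

/* Averages two 8x8 predictor blocks into dst (all with the same stride). */
void avg_block8x8(unsigned char *dst, const unsigned char *a,
                  const unsigned char *b, int ystride)
{
    for (int i = 0; i < 8; i++) {
#if defined(__MMX__)
        __m64 va, vb, vr;
        memcpy(&va, a, 8);
        memcpy(&vb, b, 8);
        vr = avg8_trunc(va, vb);
        memcpy(dst, &vr, 8);
#else
        for (int j = 0; j < 8; j++)
            dst[j] = (unsigned char)((a[j] + b[j]) >> 1);
#endif
        dst += ystride;
        a += ystride;
        b += ystride;
    }
#if defined(__MMX__)
    _mm_empty(); /* clear MMX state before any x87 float code runs */
#endif
}
```

If avg8_trunc instead lived in another .c file with external linkage,
inlining it into the loop would typically require link-time or
whole-program optimization.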