[Theora-dev] Questions, MMX and co.
rodolphe.ortalo at free.fr
Thu Aug 26 11:16:59 PDT 2004
On Thursday 26 August 2004 19:25, Christoph Lampert wrote:
> MPlayer has a compile time switch to enable or disable run-time CPU
> But anyway, many SIMD optimized projects check for CPU flags on the fly,
> and it works very nicely using function pointers, set in an init-phase or
> at first call. We also benchmarked the overhead, and even for tiny
> operations like 8x8 SAD it was negligible. I guess jump-prediction is
> working well these days.
Good to know. I was really wondering about this one, and I believe such
benchmarks are not easy to do. (BTW, IMHO it also shows that compilers can
be pretty good sometimes...)
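
For illustration, here is a minimal sketch of the function-pointer scheme
described in the quoted text, with the pointer set at first call. All names
(sad8x8, sad8x8_resolve, cpu_has_simd, sad8x8_simd_standin) are hypothetical
and come from neither XviD nor Theora. The "SIMD" routine is only a stand-in
that forwards to the C code, and the CPU probe assumes the GCC/Clang builtin
__builtin_cpu_supports on x86.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Signature shared by every 8x8 SAD implementation. */
typedef unsigned (*sad8x8_fn)(const uint8_t *a, const uint8_t *b, int stride);

/* Portable reference version. */
static unsigned sad8x8_c(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}

/* Stand-in for an accelerated routine (assumption: a real build would put
 * MMX/SSE code here instead of forwarding to the C version). */
static unsigned sad8x8_simd_standin(const uint8_t *a, const uint8_t *b,
                                    int stride)
{
    return sad8x8_c(a, b, stride);
}

/* Hypothetical CPU probe; returns 0 on compilers or targets it does not
 * know about, so the C version is used there. */
static int cpu_has_simd(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

static unsigned sad8x8_resolve(const uint8_t *a, const uint8_t *b, int stride);

/* Public entry point. It starts out pointing at the resolver, so the very
 * first call selects an implementation. */
sad8x8_fn sad8x8 = sad8x8_resolve;

static unsigned sad8x8_resolve(const uint8_t *a, const uint8_t *b, int stride)
{
    sad8x8 = cpu_has_simd() ? sad8x8_simd_standin : sad8x8_c;
    assert(sad8x8 != sad8x8_resolve);   /* guard against endless recursion */
    return sad8x8(a, b, stride);
}
```

Callers simply write sad8x8(cur, ref, stride). After the first call the only
extra cost is one indirect call. The sketch ignores thread safety; an
explicit init phase would avoid the unsynchronized first-call write.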
> My main point was to vote in favour of compiler _intrinsics_!
> At XviD, we chose NASM as assembler for external ASM files, because it
> has a strong MACRO language and is available for many platforms,
> including Windows and Linux, of course. Also, intrinsics weren't developed
> very far when we started.
> But there are at least two drawbacks, and if we had to decide again today,
> we might decide differently:
That really fits my personal intuition. Even though compiler intrinsics sound
strange sometimes, C code that uses them *looks* better than C code with
inline assembly, or even raw assembly.
I know that may sound like a strange argument, but sometimes I tend to think
that code that *looks* good is code that will run fast too :-). Maybe because
some French engineer once said that only beautiful planes could fly; or maybe
because, if one finds code easy to read, the compiler will too.
Anyway, even though compiler intrinsics certainly have drawbacks, and though
a good assembly programmer can do better, I would certainly try intrinsics
first (in fact, I've already converted nearly all of lib/i386/dsp_mmxext.c,
starting from Wim's patch).
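
To give an idea of what intrinsics code looks like, here is a small sketch of
an 8x8 SAD. It is not taken from dsp_mmxext.c or from Wim's patch; the
function names are made up, and it uses SSE2 intrinsics from <emmintrin.h>
rather than MMXEXT because those build with default flags on x86-64. Without
SSE2 it falls back to the plain C loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain C reference: sum of absolute differences over an 8x8 block. */
unsigned sad8x8_ref(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sum;
}

/* Same operation written with intrinsics (hypothetical name). */
unsigned sad8x8_intrin(const uint8_t *a, const uint8_t *b, int stride)
{
    assert(stride >= 8);    /* rows must not overlap */
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 8; y++) {
        /* Load one 8-byte row from each block; the upper 8 bytes are zero. */
        __m128i ra = _mm_loadl_epi64((const __m128i *)(a + y * stride));
        __m128i rb = _mm_loadl_epi64((const __m128i *)(b + y * stride));
        /* The SAD of the low 8 bytes lands in the low 64-bit lane. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(ra, rb));
    }
    return (unsigned)_mm_cvtsi128_si32(acc);
#else
    return sad8x8_ref(a, b, stride);
#endif
}
```

The loop has the same shape as the C version, and register allocation and
scheduling are left to the compiler, which is most of the readability
argument.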
[...bit exact comments skipped...]
Ralph Giles apparently thinks that bit-exact output should be produced by the
(C or MMX) encoders. I wouldn't be so drastic, but it is true that I'm
afraid small errors could slip into the hardware-accelerated implementations.
(Plus his answer to point 6 :-). That's also why it would be nice not to
spread work in too many directions, and to do careful code reviews... (I know
that's cumbersome, but, well...)
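
One cheap way to catch such small errors is to run the C code and the
accelerated code on the same random input and require identical results. The
sketch below does this for an 8x8 SAD only, as a stand-in for real encoder
routines. count_mismatches and sad8x8_buggy are hypothetical names, and the
buggy variant is deliberately wrong so that the check has something to catch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned (*sad8x8_fn)(const uint8_t *a, const uint8_t *b, int stride);

/* Reference C implementation. */
unsigned sad8x8_ref(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sum;
}

/* Deliberately broken variant: it forgets the last row, the kind of
 * off-by-one that can slip into hand-optimized code. */
unsigned sad8x8_buggy(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 7; y++)
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sum;
}

/* Returns the number of random trials on which ref and opt disagree. */
int count_mismatches(sad8x8_fn ref, sad8x8_fn opt, int trials, unsigned seed)
{
    uint8_t a[64], b[64];
    int bad = 0;

    assert(ref != NULL && opt != NULL);
    srand(seed);
    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < 64; i++) {
            a[i] = (uint8_t)(rand() & 0xff);
            b[i] = (uint8_t)(rand() & 0xff);
        }
        if (ref(a, b, 8) != opt(a, b, 8))
            bad++;
    }
    return bad;
}
```

A test run would then require count_mismatches(sad8x8_ref, candidate, ...) to
be zero for every accelerated candidate. Random input complements code
review; it does not replace it.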