[Theora-dev] Questions, MMX and co.

Rodolphe Ortalo rodolphe.ortalo at free.fr
Thu Aug 26 11:16:59 PDT 2004


On Thursday 26 August 2004 19:25, Christoph Lampert wrote:
[...]
> MPlayer has a compile time switch to enable or disable run-time CPU
> detection.
> But anyway, many SIMD optimized projects check for CPU flags on the fly,
> and it works very nicely using function pointers, set in an init-phase or
> at first call. We also benchmarked the overhead, and even for tiny
> operations like 8x8 SAD it was negligible. I guess jump-prediction is
> working well these days.

Good to know. I was really wondering about this one, and I believe such
benchmarks are not easy to do well. (BTW, IMHO it also shows that compilers
can be pretty good sometimes...)
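
For the record, here is a minimal sketch of the "function pointer set at
first call" scheme described above. Everything in it is an assumption made
for illustration: the names sad8x8, sad8x8_fast and cpu_has_simd are
hypothetical, the CPU probe is a stub (a real one would query CPUID), and
the "fast" routine simply forwards to the C one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned (*sad8x8_fn)(const uint8_t *a, const uint8_t *b, int stride);

/* Plain C reference: sum of absolute differences over an 8x8 block. */
static unsigned sad8x8_c(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}

/* Stand-in for an optimized routine (hypothetical; forwards to C here). */
static unsigned sad8x8_fast(const uint8_t *a, const uint8_t *b, int stride)
{
    return sad8x8_c(a, b, stride);
}

/* Hypothetical CPU probe. A real one would inspect the CPUID flags. */
static int cpu_has_simd(void)
{
    return 0;
}

static unsigned sad8x8_init(const uint8_t *a, const uint8_t *b, int stride);

/* Callers always go through this pointer; it starts at the trampoline. */
static sad8x8_fn sad8x8 = sad8x8_init;

/* First call: pick an implementation once, then forward the call. */
static unsigned sad8x8_init(const uint8_t *a, const uint8_t *b, int stride)
{
    sad8x8 = cpu_has_simd() ? sad8x8_fast : sad8x8_c;
    assert(sad8x8 != sad8x8_init); /* guard against endless recursion */
    return sad8x8(a, b, stride);
}
```

After the first call, every later sad8x8(a, b, stride) costs one indirect
call, which is the overhead the benchmark above was measuring.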

> My main point was to vote in favour of compiler _intrinsics_!
> At XviD, we chose NASM as assembler for external ASM files, because it
> has a strong MACRO language and is available for many platforms,
> including Windows and Linux, of course. Also, intrinsics weren't developed
> very far when we started.
> But there are at least two drawbacks, and if we had to decide again today,
> we might decide differently:
[...points skipped...]

That really fits my personal intuition. Even though compiler intrinsics
sometimes look strange, C code that uses them *looks* better than C code
with inline assembly, or even raw assembly.
I know that may sound like a strange argument, but sometimes I tend to think
that code that *looks* good is code that will run fast too :-). Maybe because
some French engineer once said that only beautiful planes could fly; or maybe
because, if one finds code easy to read, the compiler will too.
Anyway, even though compiler intrinsics certainly have drawbacks, and even
though a good assembly programmer can do better, I would certainly try
intrinsics first (in fact, I've already converted nearly all of
lib/i386/dsp_mmxext.c, starting from Wim's patch).
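
To illustrate what I mean by "looks better", here is a small sketch of an
8x8 SAD written with intrinsics next to its plain C version. This is NOT the
code from lib/i386/dsp_mmxext.c: the function names are made up, and I use
the SSE2 intrinsic _mm_sad_epu8 from <emmintrin.h> rather than the MMX
extension form, only because it is the easiest to compile today. On targets
without SSE2 the sketch falls back to the C loop.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain C reference: sum of absolute differences over an 8x8 block. */
static unsigned sad8x8_c(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}

/* Same operation with intrinsics (hypothetical name, SSE2 assumed). */
static unsigned sad8x8_simd(const uint8_t *a, const uint8_t *b, int stride)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 8; y++) {
        /* Load one 8-byte row from each block; upper 64 bits are zeroed. */
        __m128i ra = _mm_loadl_epi64((const __m128i *)a);
        __m128i rb = _mm_loadl_epi64((const __m128i *)b);
        /* psadbw: absolute differences of the bytes, summed per 64-bit lane. */
        acc = _mm_add_epi32(acc, _mm_sad_epu8(ra, rb));
        a += stride;
        b += stride;
    }
    /* The total sits in the low 32 bits (at most 64 * 255). */
    return (unsigned)_mm_cvtsi128_si32(acc);
#else
    return sad8x8_c(a, b, stride);
#endif
}
```

The loop structure stays ordinary C, so the compiler still handles register
allocation and scheduling; only the inner row operation is spelled out.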

[...bit exact comments skipped...]

Ralph Giles apparently thinks that bit-exact output should be produced by the
(C or MMX) encoders. I wouldn't be so drastic, but it's true that I'm
afraid small errors could slip into the hardware-accelerated implementations.
(Plus his answer to point 6 :-). That's also why it would be nice not to
spread work in too many directions, and to do careful code reviews... (I know
that's cumbersome, but, well...)
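
One cheap way to catch such small errors is to run each accelerated routine
against the C reference on many inputs and count the disagreements. A rough
sketch, under stated assumptions: the names are hypothetical, sad8x8_alt is
only a differently written C routine standing in for an accelerated one, and
the input generator is a simple LCG chosen so that runs are repeatable.

```c
#include <stdint.h>
#include <stdlib.h>

typedef unsigned (*sad8x8_fn)(const uint8_t *a, const uint8_t *b, int stride);

/* Reference implementation. */
static unsigned sad8x8_ref(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++, a += stride, b += stride)
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[x] - b[x]);
    return sum;
}

/* Candidate under test (stand-in for an MMX or intrinsics version). */
static unsigned sad8x8_alt(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++, a += stride, b += stride)
        for (int x = 0; x < 8; x++)
            sum += (unsigned)(a[x] > b[x] ? a[x] - b[x] : b[x] - a[x]);
    return sum;
}

/* Deterministic pseudo-random byte (LCG), so failures can be reproduced. */
static uint8_t next_byte(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (uint8_t)(*state >> 24);
}

/* Returns how many of `trials` random blocks give different results. */
static int count_mismatches(sad8x8_fn ref, sad8x8_fn opt, int trials)
{
    uint32_t state = 12345u;
    int mismatches = 0;
    for (int t = 0; t < trials; t++) {
        uint8_t a[64], b[64];
        for (int i = 0; i < 64; i++) {
            a[i] = next_byte(&state);
            b[i] = next_byte(&state);
        }
        if (ref(a, b, 8) != opt(a, b, 8))
            mismatches++;
    }
    return mismatches;
}
```

A call such as count_mismatches(sad8x8_ref, sad8x8_alt, 1000) should return
zero for a bit-exact pair. Random inputs do not replace a code review, but
they are a cheap first filter.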

Rodolphe

