[Theora-dev] Questions, MMX and co.

Thu Aug 26 11:16:59 PDT 2004

On Thursday 26 August 2004 19:25, Christoph Lampert wrote:
[...]
> MPlayer has a compile time switch to enable or disable run-time CPU
> detection.
> But anyway, many SIMD optimized projects check for CPU flags on the fly,
> and it works very nicely using function pointers, set in an init-phase or
> at first call. We also benchmarked the overhead, and even for tiny
> operations like 8x8 SAD it was negligible. I guess jump-prediction is
> working well these days.

Good to know. I was really wondering about this one, and I believe such
benchmarks are not easy to carry out. (BTW, IMHO it also shows that compilers
can be pretty good sometimes...)
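
For readers who have not seen the pattern, here is a minimal sketch of the
function-pointer dispatch Christoph describes, with the pointer set once in an
init phase. All names (sad8x8_fn, dsp_init, cpu_has_mmx, sad8x8_simd_standin)
are hypothetical and not taken from Theora, XviD or MPlayer. The "SIMD" routine
is only a plain C stand-in so that the sketch builds anywhere, and the CPU
check assumes the GCC/Clang builtin __builtin_cpu_supports.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Signature shared by every 8x8 SAD implementation (hypothetical name). */
typedef unsigned (*sad8x8_fn)(const uint8_t *a, const uint8_t *b, int stride);

/* Portable reference version: sum of absolute differences over 8x8 bytes. */
static unsigned sad8x8_c(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    assert(stride >= 8);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}

/* Stand-in for an MMX/SSE routine. It is plain C here (an assumption made
 * for portability); a real build would put the optimized code in its place. */
static unsigned sad8x8_simd_standin(const uint8_t *a, const uint8_t *b,
                                    int stride)
{
    return sad8x8_c(a, b, stride);
}

/* Returns nonzero if the CPU reports MMX. Uses a GCC/Clang builtin on x86
 * and reports "no MMX" on every other compiler or architecture. */
static int cpu_has_mmx(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("mmx");
#else
    return 0;
#endif
}

/* The pointer every caller goes through; it starts at the safe C version. */
static sad8x8_fn sad8x8 = sad8x8_c;

/* Call once during start-up, before any encoding work. */
static void dsp_init(void)
{
    sad8x8 = cpu_has_mmx() ? sad8x8_simd_standin : sad8x8_c;
}
```

After dsp_init() has run, callers simply write sad8x8(a, b, stride). The only
per-call cost is one indirect call, which is the overhead the benchmark above
found negligible.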

> My main point was to vote in favour of compiler _intrinsics_!
> At XviD, we chose NASM as assembler for external ASM files, because it
> has a strong MACRO language and is available for many platforms,
> including Windows and Linux, of course. Also, intrinsics weren't developed
> very far when we started.
> But there are at least two drawbacks, and if we had to decide again today,
> we might decide differently:
[...points skipped...]

That really fits my personal intuition. Even though compiler intrinsics sound
strange sometimes, C code that uses them *looks* better than C code with
inline assembly or even raw assembly.
I know that may sound like a strange argument, but sometimes I tend to think
that code that *looks* good is code that will run fast too :-). Maybe because
some French engineer once said that only beautiful planes could fly; or maybe
because, if one finds code easy to read, the compiler will too.
Anyway, even though compiler intrinsics certainly have drawbacks, and though
a good assembly programmer can do better, I would certainly try intrinsics
first (in fact, I've already converted nearly all of lib/i386/dsp_mmxext.c,
starting from Wim's patch).
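
To illustrate what intrinsics code looks like, here is a small sketch of an
8x8 SAD. It is not the code from lib/i386/dsp_mmxext.c: it uses the SSE2
intrinsic _mm_sad_epu8 from <emmintrin.h> rather than the MMX extensions
discussed in this thread, and the function names are made up for the example.
When the compiler does not define __SSE2__, the sketch falls back to the plain
C loop, so it still builds on other targets.

```c
#include <stdint.h>
#include <stdlib.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain C reference: sum of absolute differences over an 8x8 block. */
static unsigned sad8x8_ref(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}

/* Intrinsics version (hypothetical name). Each row is one load per block
 * plus one PSADBW, with no inline assembly anywhere. */
static unsigned sad8x8_fast(const uint8_t *a, const uint8_t *b, int stride)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 8; y++) {
        /* Load one 8-byte row into the low half; the high half is zeroed. */
        __m128i ra = _mm_loadl_epi64((const __m128i *)(const void *)a);
        __m128i rb = _mm_loadl_epi64((const __m128i *)(const void *)b);
        /* Sum of absolute byte differences, one total per 64-bit lane. */
        acc = _mm_add_epi32(acc, _mm_sad_epu8(ra, rb));
        a += stride;
        b += stride;
    }
    /* The high lane only ever saw zeros, so the low 32 bits hold the SAD. */
    return (unsigned)_mm_cvtsi128_si32(acc);
#else
    /* No SSE2 at compile time: fall back to the reference loop. */
    return sad8x8_ref(a, b, stride);
#endif
}
```

The loop body reads almost like the C reference, which is the readability
argument made above. Register allocation and instruction scheduling are left
to the compiler.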

[...bit exact comments skipped...]

Ralph Giles apparently thinks that bit-exact output should be produced by the
encoders, whether C or MMX. I wouldn't be so drastic, but it is true that I'm
afraid small errors could slip into the hardware-accelerated implementations.
(Plus his answer to point 6 :-). That's also why it would be nice not to
spread the work in too many directions and to do careful code reviews... (I
know that's cumbersome, but, well...)
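
One cheap complement to code review is to compare each optimized routine
against the C reference on many pseudo-random inputs. The sketch below shows
the idea at the level of a single kernel; it is not an existing Theora test,
every name in it is hypothetical, and sad8x8_buggy is a deliberately wrong
variant included only to show that the check catches a mismatch.

```c
#include <stdint.h>
#include <stdlib.h>

typedef unsigned (*sad8x8_fn)(const uint8_t *a, const uint8_t *b, int stride);

/* Plain C reference implementation. */
static unsigned sad8x8_ref(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}

/* Deliberately wrong: it forgets the last row. It stands in for the kind
 * of small error that can slip into an optimized routine. */
static unsigned sad8x8_buggy(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 7; y++) {
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}

/* Small linear congruential generator, so that runs are reproducible. */
static uint8_t next_byte(uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return (uint8_t)(*state >> 16);
}

/* Runs both implementations on `trials` pseudo-random block pairs and
 * returns how many times their results differ (0 means bit-exact here). */
static unsigned count_mismatches(sad8x8_fn ref, sad8x8_fn opt, unsigned trials)
{
    uint8_t a[64], b[64];
    uint32_t state = 1;
    unsigned bad = 0;
    for (unsigned t = 0; t < trials; t++) {
        for (int i = 0; i < 64; i++) {
            a[i] = next_byte(&state);
            b[i] = next_byte(&state);
        }
        if (ref(a, b, 8) != opt(a, b, 8))
            bad++;
    }
    return bad;
}
```

A call such as count_mismatches(sad8x8_ref, my_mmx_sad, 100000) could then run
as part of a test suite. Random inputs do not prove equivalence, so this
supplements careful review rather than replacing it.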

Rodolphe