[Theora-dev] Questions, MMX and co.

David Kuehling dvdkhlng at gmx.de
Wed Aug 25 13:50:07 PDT 2004

>>>>> "Rodolphe" == Rodolphe Ortalo <rodolphe.ortalo at free.fr> writes:

> 1) Should the C, MMX, MMXEXT, SSE (and possibly later on SSE3 or SSE4)
> variants of functions be: 1-A) selected at compile time (via #ifdef or
> compiler flags), like what HoSwiYO did for the decoder, or me last
> year: one binary version for each; 1-B) all available simultaneously
> in the library and be selected at run time (thus, probably using the
> (*funcPointer)(a,b) approach like Wim did in his encoder patch); 1-C) a
> more complex solution (ideas? dlopen()?)

My two cents: if possible, always select the variant at run time.  With
all the CPU variants currently on the market (MMX, MMXEXT, SSE, SSE2,
3DNow! and whatever else...), it would be a terrible headache for binary
Linux distributions to provide properly optimized packages.  MPlayer,
for example, is AFAIK based entirely on run-time CPU detection.  ATLAS
on Debian, on the other hand, comes as 4 different packages to choose
from, and the installation process must make sure that the linker is
configured so that all applications use one of those 4 libraries.
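A minimal sketch of run-time selection (option 1-B), assuming a
hypothetical pixel-averaging kernel: every variant shares one signature,
a global function pointer defaults to the portable C code, and an init
routine switches it once after probing the CPU.  The names `avg_row_c`,
`avg_row_sse2` and `select_cpu_variants` are made up for illustration,
the "SSE2" variant is only a placeholder that calls the C code, and the
probe uses GCC/Clang's x86-only `__builtin_cpu_supports` in place of
hand-written CPUID detection.

```c
#include <stddef.h>
#include <stdint.h>

/* One signature shared by every CPU variant (hypothetical kernel:
   average two rows of 8-bit pixels, rounding half up). */
typedef void (*avg_row_fn)(uint8_t *dst, const uint8_t *a,
                           const uint8_t *b, size_t n);

/* Portable C reference, always available. */
static void avg_row_c(uint8_t *dst, const uint8_t *a,
                      const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Placeholder for an SSE2 version; a real one would live in a separate
   ASM file.  Here it simply forwards to the C code. */
static void avg_row_sse2(uint8_t *dst, const uint8_t *a,
                         const uint8_t *b, size_t n)
{
    avg_row_c(dst, a, b, n);
}

/* Dispatch pointer used by the rest of the library; starts out
   pointing at the C version so it is valid even before init. */
static avg_row_fn avg_row = avg_row_c;

/* Called once at library initialization: probe the CPU and switch
   the pointer to the best variant it supports. */
static void select_cpu_variants(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        avg_row = avg_row_sse2;
#endif
}
```

The library would call `select_cpu_variants()` once and afterwards
always go through `avg_row`.  Each call is indirect, so dispatching per
row or per block rather than per pixel keeps the overhead negligible.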

> 2) Which compiler should be supported?

> 3) What is preferred: 3-A) inline assembly, 3-B) (x)mmintrin.h-based
> MMX functions (Intel compiler, GCC, maybe others)?

Inline assembly is compiler-specific, and as far as GCC is concerned it
has always had problems with hard-to-trace compiler bugs that show up
only in specific GCC versions (faulty register optimization, internal
compiler errors, etc.).

The approach I like most is to put all low-level CPU code into separate
ASM files.  If you use an assembly syntax that is widely supported
(MASM vs. AT&T?), you will greatly reduce porting problems (and
compile-time ...

This creates a few more problems with the low-level C calling
convention and with accessing C data types from assembly.  But such
issues can usually be handled by configure scripts.  If you need access
to C structs from ASM, just use a short C program to dump the offsets
of the relevant members into an ASM header file.  The Allegro
cross-platform gaming library was quite successful with such an ...
> 5) How to *validate* the implementation? It is probably easy to
> introduce bias in the encoder in the C to MMX transformation
> process. On the other hand, sometimes output is different and this is
> normal (pavgb does (A+B+1)/2 which is more precise than
> (A+B)/2). Maybe we can simply trust experimental work, or rely on a
> good 4), but then...?

I thought that at least the (I)DCT should be bit-exact with the
reference encoder.  Otherwise you will get terrible artifacts when
people encode movies with large keyframe distances, because any
mismatch between encoder and decoder accumulates from one predicted
frame to the next until the next keyframe (I have already encoded
Theora movies with keyframes spaced 512 frames apart).
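For small operations such as the byte average from point 5, validation
can be exhaustive: run the C reference and the candidate over every
possible input and count mismatches.  The sketch below models the two
rounding rules as plain C functions (the names are hypothetical, and
`avg_pavgb` only reproduces the instruction's (A+B+1)>>1 rounding; it
is not MMX code).  Comparing them gives 32768 mismatching pairs out of
65536, each off by exactly 1: every pair whose sum is odd.  For the
(I)DCT, whose input space is far too large to enumerate, the same kind
of harness would be fed many random blocks instead.

```c
#include <stdint.h>

typedef uint8_t (*avg_fn)(uint8_t a, uint8_t b);

/* Truncating average, as straightforward C code might compute it. */
static uint8_t avg_trunc(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* Same rounding as the pavgb instruction: (a + b + 1) >> 1. */
static uint8_t avg_pavgb(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Compares cand against ref over all 256 * 256 input pairs.  Returns
   the number of mismatching pairs and stores the largest absolute
   difference in *max_diff. */
static unsigned long count_mismatches(avg_fn ref, avg_fn cand,
                                      int *max_diff)
{
    unsigned long mismatches = 0;
    *max_diff = 0;
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            int d = (int)cand((uint8_t)a, (uint8_t)b)
                  - (int)ref((uint8_t)a, (uint8_t)b);
            if (d < 0)
                d = -d;
            if (d != 0) {
                mismatches++;
                if (d > *max_diff)
                    *max_diff = d;
            }
        }
    }
    return mismatches;
}
```

Whether a difference like this is acceptable is exactly the question
raised in point 5; the harness only makes it visible and measurable.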

GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40
