[Theora-dev] Questions, MMX and co.

Jim Darby jim at jimbocorp.uklinux.net
Thu Aug 26 15:27:08 PDT 2004

On Wed, 2004-08-25 at 19:40, Rodolphe Ortalo wrote:
> Thanks to all those who do work on MMX (I do not count me among them for the 
> little compile&run I've done). However, looking at the various implementation 
> paths, it seems to me it would be nice to answer the following 
> design/development questions before competition among different variants of 
> the same code starts to pollute the debate. (Speed issues *always* generate 
> flamewars in the end... :-)

I'm always game for a punch up! :-)

> 1) Should the C, MMX, MMXEXT, SSE (and possibly later on SSE3 or SSE4) 
> variants of functions be:
> 1-A) selected at compile time (via #ifdef or compiler flags), like what 
> HoSwiYO did for the decoder, or me last year: one binary version for each;

Sometimes this can be done. For example, the MMX optimisations will
NEVER be of any use on a SPARCstation or Mac: they don't even have an
x86 processor. Therefore in this case the MMX code can't even be
compiled/assembled.

> 1-B) all available simultaneously in the library and be selected at run time 
> (thus, probably using the (*funcPointer)(a,b) approach like Wim did in his 
> encoder patch);

This is my personal preference. Obviously, whether there's a choice at
all may be architecture specific. For example, MMX only applies to x86
and AltiVec only applies to PPC (see the bit about #ifdef above), but
once you're on a specific architecture, run-time probing should be used
to discover the Right Thing to do.

We've discussed this at some length on the transcode mailing list.

> 1-C) more complex solution (ideas? dlopen()?)

Dynamic libraries always cause problems in this context. I'd avoid them.

> 2) Which compiler should be supported?
> 3) What is preferred:
> 3-A) inline assembly,

This is often the fastest option, where the compiler supports it. For example:

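#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	/* x86 inline assembly (MMX, SSE, ...) here */
#elif defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
	/* PowerPC inline assembly (AltiVec, ...) here */
#else
	/* Portable C fallback code here */
#endif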

> 3-B) (x)mmintrin.h-based MMX functions (Intel compiler, GCC, maybe others)?

Portable (?) between (some?) Intel and (some) GCC compilers. Once
again, it's #ifdef time!

> 4) How to benchmark the implementation? (I'm still using the small wav+yuv 
> video with this cute little girl singing but I guess something more serious 
> should be done...) If possible, it should be easily accessible to everyone 
> (no expensive digital equipment, no multi-gigabyte downloads) so that 
> everyone could reproduce the test and compare results.

The Linux kernel benchmarks several implementations of its RAID parity
(XOR) code at boot time and, once done, plugs in the fastest one. Nice
idea.

You can also initialise the function pointer to a resolver function
that chooses the right implementation, updates the function pointer and
then calls it.

> 5) How to *validate* the implementation? It is probably easy to introduce 
> bias in the encoder in the C to MMX transformation process. On the other 
> hand, sometimes output is different and this is normal (pavgb does (A+B+1)/2 
> which is more precise than (A+B)/2). Maybe we can simply trust experimental 
> work, or rely on a good 4), but then...?

It's very difficult if you're comparing (A+B)/2 and (A+B+1)/2 because
they're different and (in some cases) equally right. Maybe defining the
``right'' solution to be (for example) (A+B+1)/2 all the time might be

> 6) Who has the definitive answer on the above questions: or btw, who rules 
> Xiph/Theora? :-)

Me (of course)! :-)

Hope this helps,

Jim Darby <jim at jimbocorp.uklinux.net>
The Jimbo Corporation
