[Theora-dev] Questions, MMX and co.

Wed Aug 25 16:43:06 PDT 2004

On Wed, Aug 25, 2004 at 08:40:34PM +0200, Rodolphe Ortalo wrote:

> 1) Should the C, MMX, MMXEXT, SSE (and possibly later on SSE3 or SSE4) 
> variants of functions be:
> 1-A) selected at compile time (via #ifdef or compiler flags), like what 
> HoSwiYO did for the decoder, or me last year: one binary version for each;
> 1-B) all available simultaneously in the library and be selected at run time 
> (thus, probably using the (*funcPointer)(a,b) approach like Wim did in his 
> encoder patch);
> 1-C) more complex solution (ideas? dlopen()?)

dlopen() is overkill here. There need to be both compile-time switches 
(for source portability) and run-time switches (for binary portability).
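
A minimal sketch of how the two kinds of switch could fit together, not 
the actual Theora code. The names here (sad_fn, sad_c, sad_mmx, dsp_init, 
USE_X86_SIMD) are all hypothetical, and the "MMX" variant is only a 
stand-in that calls the C version. The #if keeps the source building on 
any compiler and CPU; the function pointer lets one binary pick a variant 
when it starts up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: sum of absolute differences over n bytes. */
typedef uint32_t (*sad_fn)(const uint8_t *a, const uint8_t *b, size_t n);

/* Portable C version; always compiled, always a valid fallback. */
static uint32_t sad_c(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
}

/* Compile-time switch (source portability). USE_X86_SIMD is an assumed
 * macro that a build system would define; it is not a real Theora flag. */
#if defined(USE_X86_SIMD) && defined(__GNUC__) && \
    (defined(__i386__) || defined(__x86_64__))
#define HAVE_SAD_MMX 1
static uint32_t sad_mmx(const uint8_t *a, const uint8_t *b, size_t n)
{
    /* Stand-in body: a real variant would use assembly or intrinsics. */
    return sad_c(a, b, n);
}
#endif

/* Run-time switch (binary portability): one pointer per kernel. */
static sad_fn sad_ptr = sad_c;

/* Call once at start-up, before any kernel is used. */
void dsp_init(void)
{
    sad_ptr = sad_c;
#ifdef HAVE_SAD_MMX
    /* GCC/Clang builtin; other compilers would need their own CPUID check. */
    if (__builtin_cpu_supports("mmx"))
        sad_ptr = sad_mmx;
#endif
    assert(sad_ptr != NULL);
}

/* Callers go through the pointer and never name a variant directly. */
uint32_t sad(const uint8_t *a, const uint8_t *b, size_t n)
{
    return sad_ptr(a, b, n);
}
```

Built without USE_X86_SIMD this is plain C everywhere; built with it, the 
same binary still runs on a CPU that lacks the extension.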

> 2) Which compiler should be supported?

All of them? :) GCC and MSVC at a minimum.

> 3) What is preferred:
> 3-A) inline assembly,
> 3-B) (x)mmintrin.h-based MMX functions (Intel compiler, GCC, maybe others)?

You forgot raw .asm files. Consensus on IRC was both. The intrinsics are 
easier to understand and can help take advantage of compiler 
improvements for scheduling; raw assembly is about the only thing you 
can make portable.
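
To give an idea of what the intrinsics style looks like, here is a small 
sketch with a plain C fallback. It swaps in the SSE2 intrinsic 
_mm_avg_epu8 from <emmintrin.h> rather than the 64-bit MMX forms from 
(x)mmintrin.h discussed above, only because SSE2 is available on every 
x86-64 compiler. The function name avg_round_up is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time check for SSE2: GCC/Clang define __SSE2__, MSVC defines
 * _M_X64 or sets _M_IX86_FP to 2. */
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define HAVE_SSE2_INTRINSICS 1
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] = (a[i] + b[i] + 1) / 2 for n bytes. */
void avg_round_up(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2_INTRINSICS
    /* 16 bytes per step; unaligned loads and stores keep the sketch simple. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is not available. */
    for (; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
```

The compiler is left to allocate registers and schedule the loads, which 
is the advantage mentioned above; the same source builds with GCC and 
MSVC as long as the header is present.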

> 4) How to benchmark the implementation? (I'm still using the small wav+yuv 
> video with this cute little girl singing but I guess something more serious 
> should be done...) If possible, it should be easily accessible to everyone 
> (no expensive digital equipment, no multi-gigabyte downloads) so that 
> everyone could reproduce the test and compare results.

Source material is always going to mean multi-gigabyte downloads. We have a 
collection, but some of it needs to be put back online. Obviously with 
decoder benchmarks it's a little easier.

> 5) How to *validate* the implementation? It is probably easy to introduce 
> bias in the encoder in the C to MMX transformation process. On the other 
> hand, sometimes output is different and this is normal (pavgb does (A+B+1)/2 
> which is more precise than (A+B)/2). Maybe we can simply trust experimental 
> work, or rely on a good 4), but then...?

Well, encoder output should be bit-for-bit identical, at least for the 
reference implementation. On the decoder one can make speed/quality 
tradeoffs.
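
One way to check "bit-for-bit identical" for a small kernel is to run the 
reference and the candidate over every input pair and count differing 
output bytes. The sketch below does that for the two averages from the 
question; the names (avg_fn, avg_trunc, avg_round, count_mismatches) are 
hypothetical and both variants are written in plain C for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*avg_fn)(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t n);

/* Truncating average: (A+B)/2. */
void avg_trunc(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i]) >> 1);
}

/* Rounding average: (A+B+1)/2, the result pavgb produces. */
void avg_round(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Run both implementations over all 256 x 256 byte pairs and return the
 * number of output bytes that differ. Zero means bit-for-bit identical. */
size_t count_mismatches(avg_fn ref, avg_fn cand)
{
    uint8_t a[256], b[256], out_ref[256], out_cand[256];
    size_t mismatches = 0;

    for (int i = 0; i < 256; i++)
        b[i] = (uint8_t)i;

    for (int x = 0; x < 256; x++) {
        memset(a, x, sizeof a);
        ref(out_ref, a, b, sizeof a);
        cand(out_cand, a, b, sizeof a);
        for (int i = 0; i < 256; i++)
            if (out_ref[i] != out_cand[i])
                mismatches++;
    }
    return mismatches;
}
```

count_mismatches(avg_trunc, avg_round) is nonzero because the two differ 
whenever A+B is odd, so a pavgb-based routine cannot be a drop-in 
replacement for a truncating C average where identical output is required.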

> 6) Who has the definitive answer on the above questions: or btw, who rules 
> Xiph/Theora? :-)

Me.

 -r