[Theora-dev] Questions, MMX and co.
rodolphe.ortalo at free.fr
Wed Aug 25 11:40:34 PDT 2004
Thanks to all those who are working on MMX (I do not count myself among them, given the
little compile-and-run testing I've done). However, looking at the various implementation
paths, it seems to me it would be good to answer the following
design/development questions before competition among different variants of
the same code starts to pollute the debate. Speed issues *always* generate
flamewars in the end... :-)
1) Should the C, MMX, MMXEXT, SSE (and possibly later on SSE3 or SSE4)
variants of functions be:
1-A) selected at compile time (via #ifdef or compiler flags), as
HoSwiYO did for the decoder, or as I did last year: one binary version for each;
1-B) all available simultaneously in the library and be selected at run time
(thus, probably using the (*funcPointer)(a,b) approach like Wim did in his
1-C) a more complex solution (ideas? dlopen()?)
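A minimal sketch of how 1-A and 1-B can combine, assuming a hypothetical 8-byte averaging kernel: `#if` decides which variants are compiled in at all (the 1-A mechanism), and a function pointer set once at start-up picks among the compiled-in ones (1-B). The names `avg8_c`, `avg8_mmxext`, `avg8` and `dsp_init` are made up for illustration, and `avg8_mmxext` is only a plain-C placeholder where real MMXEXT code would go. CPU detection uses GCC/Clang's `__builtin_cpu_supports`, which is much newer than GCC 3.4; a 2004-era build would have to read CPUID itself.

```c
#include <stddef.h>

/* Hypothetical kernel signature: average two 8-pixel rows into dst. */
typedef void (*avg8_fn)(unsigned char *dst, const unsigned char *a,
                        const unsigned char *b);

/* Portable C version, always compiled in. Rounds halves up like pavgb. */
static void avg8_c(unsigned char *dst, const unsigned char *a,
                   const unsigned char *b)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (unsigned char)((a[i] + b[i] + 1) >> 1);
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86 1
/* Placeholder for a real MMXEXT routine (hypothetical). Whether it is
   compiled at all is a compile-time (1-A) decision; here it just calls
   the C version so the sketch runs anywhere. */
static void avg8_mmxext(unsigned char *dst, const unsigned char *a,
                        const unsigned char *b)
{
    avg8_c(dst, a, b);
}
#else
#define HAVE_X86 0
#endif

/* Global dispatch pointer (1-B); defaults to the C path. */
static avg8_fn avg8 = avg8_c;

/* Pick the best compiled-in variant for the running CPU.
   force_c != 0 keeps the C path, e.g. for debugging or comparisons. */
static void dsp_init(int force_c)
{
    avg8 = avg8_c;
    if (force_c)
        return;
#if HAVE_X86
    __builtin_cpu_init();
    /* Conservative test: SSE implies pavgb on MMX registers, although
       early Athlons had pavgb via AMD's MMX extensions without SSE. */
    if (__builtin_cpu_supports("sse"))
        avg8 = avg8_mmxext;
#endif
}
```

Usage would be one `dsp_init(0)` call at encoder start-up, then `avg8(dst, a, b)` everywhere. The price is one indirect call per kernel invocation, which is usually small next to a kernel that processes a whole block.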
2) Which compiler should be supported?
3) What is preferred:
3-A) inline assembly,
3-B) MMX functions written with the intrinsics from mmintrin.h/xmmintrin.h (Intel compiler, GCC, maybe others)?
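To make question 3 concrete, here is one small operation written both ways: a sketch for GCC/Clang on x86 with SSE enabled, using hypothetical function names. The inline-assembly version (3-A) fixes the registers by hand and needs an explicit `emms`; the intrinsics version (3-B) leaves register allocation and scheduling to the compiler and ends with `_mm_empty()`. On other targets only the portable reference is compiled.

```c
#include <string.h>

/* Portable reference with pavgb semantics: (A+B+1)>>1 per byte. */
static void avg8_ref(unsigned char *dst, const unsigned char *a,
                     const unsigned char *b)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (unsigned char)((a[i] + b[i] + 1) >> 1);
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__)) && defined(__SSE__)
#include <xmmintrin.h>
#define HAVE_X86_SIMD 1

/* 3-A: GCC-style extended inline assembly (AT&T operand order). */
static void avg8_inline_asm(unsigned char *dst, const unsigned char *a,
                            const unsigned char *b)
{
    __asm__ __volatile__(
        "movq (%1), %%mm0\n\t"   /* mm0 = 8 bytes of a              */
        "pavgb (%2), %%mm0\n\t"  /* mm0 = (mm0 + b + 1) >> 1 per byte */
        "movq %%mm0, (%0)\n\t"   /* store the 8 result bytes to dst  */
        "emms\n\t"               /* give the registers back to x87   */
        :
        : "r"(dst), "r"(a), "r"(b)
        : "mm0", "memory");
}

/* 3-B: the same operation with intrinsics. */
static void avg8_intrin(unsigned char *dst, const unsigned char *a,
                        const unsigned char *b)
{
    __m64 va, vb, vr;
    memcpy(&va, a, 8);       /* memcpy sidesteps alignment/aliasing */
    memcpy(&vb, b, 8);
    vr = _mm_avg_pu8(va, vb); /* compiles to pavgb */
    memcpy(dst, &vr, 8);
    _mm_empty();             /* emits emms */
}
#else
#define HAVE_X86_SIMD 0
#endif
```

The intrinsics version is also the easier one to keep portable between compilers, since it avoids GCC-specific asm syntax; whether it is as fast as hand-written asm depends on the compiler version.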
4) How should we benchmark the implementation? (I'm still using the small wav+yuv
video of that cute little girl singing, but I guess something more serious
should be done...) If possible, the benchmark should be easily accessible to everyone
(no expensive digital equipment, no multi-gigabyte downloads) so that
everyone can reproduce the test and compare results.
5) How do we *validate* the implementation? It is probably easy to introduce
bias into the encoder during the C-to-MMX transformation. On the other
hand, sometimes the output differs and this is normal (pavgb computes (A+B+1)/2,
which rounds halves up and so is more precise than the truncating (A+B)/2). Maybe we can simply trust experimental
work, or rely on a good 4), but then...?
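A small sketch of where the two averaging conventions disagree, using hypothetical helper names. They differ by exactly one whenever A+B is odd, and the parity of A+B equals (A^B)&1, so a pavgb result can be corrected back to the truncating C result with byte-wise XOR, AND and subtract (operations MMX provides) if bit-exact output is ever required.

```c
/* pavgb-style average: (A+B+1)>>1, halves round up. */
static unsigned char avg_round_up(unsigned char a, unsigned char b)
{
    return (unsigned char)((a + b + 1) >> 1);
}

/* Plain C average: (A+B)/2, halves are truncated. */
static unsigned char avg_trunc(unsigned char a, unsigned char b)
{
    return (unsigned char)((a + b) / 2);
}

/* Truncating average rebuilt from the rounding one: subtract 1
   exactly when A+B is odd, i.e. when the low bits of A and B differ. */
static unsigned char avg_trunc_from_round(unsigned char a, unsigned char b)
{
    return (unsigned char)(avg_round_up(a, b) - ((a ^ b) & 1));
}

/* Count the 8-bit input pairs on which the two conventions disagree. */
static long count_rounding_diffs(void)
{
    long n = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (avg_round_up((unsigned char)a, (unsigned char)b) !=
                avg_trunc((unsigned char)a, (unsigned char)b))
                n++;
    return n;
}
```

Exactly half of all byte pairs (32768 of 65536) have an odd sum, so a validation scheme has to decide up front whether such off-by-one differences count as errors.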
6) Who has the definitive answer to the above questions, or, by the way, who rules?
My own 0.02:
Note that 1 and 3 affect performance: IMHO, 1-A + 3-B gives the maximum performance
gain. (But then, I'm GCC-centric, and that's the usual GCC way: source code
is available and maintenance is done at night... :-)
Also, I'd say that 2) should only include "GCC >= 3.4", but that's definitely
extremely selfish... It's just that the Intel compiler already generates faster
code than GCC, so let's handicap it a little! :-)
For four :-), I'd say that maybe we should select a few tracks from common commercial
video DVDs that possibly everyone in the computer development
business has already bought (like one of The Lord of the Rings films or the Matrix
trilogy; I'd bet that among these six, everyone already owns one) and publish a
few scripts for transcoding the selected tracks and isolating the Theora video
encoding CPU time.
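For the "isolating encoding CPU time" part, a harness along these lines could wrap whatever call does the actual encoding. This is only a sketch: `demo_kernel` and `demo_ctx` are hypothetical stand-ins for "encode one frame", and timing relies on standard C `clock()`, which on POSIX systems reports the processor time used by the process rather than wall-clock time.

```c
#include <stddef.h>
#include <time.h>

typedef void (*bench_fn)(void *ctx);

/* CPU seconds consumed by `iters` calls of fn(ctx). */
static double cpu_seconds(bench_fn fn, void *ctx, long iters)
{
    clock_t start = clock();
    for (long i = 0; i < iters; i++)
        fn(ctx);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Repeat the measurement `runs` times and keep the fastest run, which
   is the one least disturbed by other activity on the machine.
   Returns -1.0 if runs <= 0. */
static double best_cpu_seconds(bench_fn fn, void *ctx, long iters, int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        double t = cpu_seconds(fn, ctx, iters);
        if (best < 0.0 || t < best)
            best = t;
    }
    return best;
}

/* Hypothetical workload standing in for "encode one frame". */
typedef struct {
    unsigned char buf[4096];
    unsigned long sum;
    long calls;
} demo_ctx;

static void demo_kernel(void *p)
{
    demo_ctx *c = p;
    unsigned long s = 0;
    for (size_t i = 0; i < sizeof c->buf; i++)
        s += c->buf[i];
    c->sum = s;
    c->calls++;
}
```

Publishing the harness together with the transcoding scripts would let everyone report comparable numbers for the same tracks.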
Concerning 5, well, last year I tried a framework for back-to-back testing of the C and
assembly implementations (i.e., at run time, execute *both* the C
function and the asm one with the same parameters and compare the accuracy of the
results). But that's another level of #ifdef definitions. It did catch
some bugs for me, but I wonder whether it was worth the effort... No opinion. I'd
rather trust 4.
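The back-to-back idea can be sketched roughly as follows, with hypothetical names throughout: `avg8_c` is a truncating C reference and `avg8_asm_standin` is plain C mimicking pavgb rounding, standing in for a real asm routine. Both run on identical inputs and their outputs are compared byte by byte, with a tolerance parameter so that known off-by-one rounding differences can be accepted or flagged deliberately.

```c
#include <stddef.h>
#include <stdio.h>

typedef void (*avg8_fn)(unsigned char *dst, const unsigned char *a,
                        const unsigned char *b);

/* Reference C version: truncating average (A+B)/2. */
static void avg8_c(unsigned char *dst, const unsigned char *a,
                   const unsigned char *b)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (unsigned char)((a[i] + b[i]) / 2);
}

/* Hypothetical stand-in for an asm version using pavgb rounding. */
static void avg8_asm_standin(unsigned char *dst, const unsigned char *a,
                             const unsigned char *b)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (unsigned char)((a[i] + b[i] + 1) >> 1);
}

/* Compare n output bytes, accepting differences up to `tolerance`.
   Reports the first offending byte on stderr; returns the bad count. */
static int compare_outputs(const char *name, const unsigned char *ref,
                           const unsigned char *opt, size_t n, int tolerance)
{
    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        int d = (int)ref[i] - (int)opt[i];
        if (d < -tolerance || d > tolerance) {
            if (bad == 0)
                fprintf(stderr, "%s: byte %zu differs (C=%d, asm=%d)\n",
                        name, i, ref[i], opt[i]);
            bad++;
        }
    }
    return bad;
}

/* Back-to-back check: run both variants on the same inputs. */
static int check_avg8(avg8_fn ref, avg8_fn opt, const unsigned char *a,
                      const unsigned char *b, int tolerance)
{
    unsigned char out_ref[8], out_opt[8];
    ref(out_ref, a, b);
    opt(out_opt, a, b);
    return compare_outputs("avg8", out_ref, out_opt, 8, tolerance);
}
```

In a debug build, each dispatched call could be routed through such a checker under an #ifdef, which is exactly the extra layer of conditional code mentioned above.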
Concerning 6, would you trust voting machines that use punch cards, or that have an
MMX-optimized crypto engine? :-)))
On Wednesday 25 August 2004 00:37, VP3HoSwiYO wrote:
> good morning everybody.
> I have finished converting mmx decode code to gcc.
> Now this can be compiled with both vc and gcc.