[theora-dev] Patch: fragment reconstruction MMX for GCC

Nils Pipenbrinck n.pipenbrinck at cubic.org
Mon Dec 31 00:37:32 PST 2007

Hi gentlemen,
> There were two primary problems with the code as it stood. The first was
> specific to x86-64: you have to cast the strides to long's so that they
> are placed in 64-bit registers instead of 32-bit registers, or you can't
> use them in indexing instructions with 64-bit pointers.

Ah - I see. *Light bulb goes on* That was the reason for the long 
casts. Good to know.
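
A minimal sketch of the x86-64 point quoted above, assuming GCC-style inline asm. The helper name load_next_row is hypothetical and not taken from mmxfrag.c. It only shows why the cast matters: with a plain int operand GCC would pick a 32-bit register such as %esi, which cannot be combined with a 64-bit base pointer in an indexed address.

```c
#include <stddef.h>

/* Hypothetical helper (not from libtheora): returns the byte located
   `stride` bytes away from src, using an indexed addressing mode. */
static unsigned char load_next_row(const unsigned char *src, int stride) {
#if defined(__GNUC__) && defined(__x86_64__)
  unsigned int v;
  /* The (long) cast makes GCC place the stride in a 64-bit register, so
     the operand prints as e.g. %rsi and "(%rdi,%rsi)" assembles. The cast
     also sign-extends, so negative strides keep working. */
  __asm__ __volatile__(
    "movzbl (%1,%2),%0"
    : "=r"(v)
    : "r"(src), "r"((long)stride)
    : "memory"
  );
  return (unsigned char)v;
#else
  /* Portable fallback for other targets. */
  return src[stride];
#endif
}
```

Note that long is only 32 bits wide on 64-bit Windows, so a cast to ptrdiff_t may be the safer choice there; that is a side remark, not something discussed in this thread.
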
> The second was specific to x86-32: when -fPIC is used and
> -fomit-frame-pointer is not, 

I see... I have never used -fPIC on x86, which I guess is why I always 
assumed 6 free registers. On win32 you can even mess around with ESP if 
you want to, since interrupts have their own stack frame anyway.

I'm a bit sceptical about the inter2 loop though. Timothy, could you 
please email me a compiled object file of mmxfrag.c privately, so I can 
run my benchmarks and have a look at the generated code? I can only run 
GCC 3.2.2 on my machine, and it does horrible things when mixing C loops 
with asm.

I'd rather unroll via macros or let gas do the job. If a modern GCC gets 
things right I'm fine with it though. Inter2 is *the* performance hog at 
the moment, so each percent saved makes a difference to the total decode 
time.
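
As a rough illustration of unrolling via macros instead of a C loop, here is a portable C sketch. It is not the MMX code under discussion: the names AVG2_ROW and avg2_block8x8 are hypothetical, and the row body is a simplified two-reference average without residue or clamping. In the real code the macro would presumably expand to an asm string for one row.

```c
#include <stddef.h>

/* Hypothetical row body: average 8 pixels from two references, then step
   all three pointers down by one row. Simplified; not the libtheora code. */
#define AVG2_ROW(dst, s1, s2, stride)                            \
  do {                                                           \
    int x_;                                                      \
    for (x_ = 0; x_ < 8; x_++)                                   \
      (dst)[x_] = (unsigned char)(((s1)[x_] + (s2)[x_]) >> 1);   \
    (dst) += (stride); (s1) += (stride); (s2) += (stride);       \
  } while (0)

/* The preprocessor repeats the row body, so no C row loop is emitted. */
#define AVG2_ROWS2(d, a, b, s) AVG2_ROW(d, a, b, s); AVG2_ROW(d, a, b, s)
#define AVG2_ROWS4(d, a, b, s) AVG2_ROWS2(d, a, b, s); AVG2_ROWS2(d, a, b, s)
#define AVG2_ROWS8(d, a, b, s) AVG2_ROWS4(d, a, b, s); AVG2_ROWS4(d, a, b, s)

/* Processes one 8x8 block; a single stride is shared by all three
   pointers. */
static void avg2_block8x8(unsigned char *dst, const unsigned char *s1,
                          const unsigned char *s2, ptrdiff_t stride) {
  AVG2_ROWS8(dst, s1, s2, stride);
}
```

The row count is fixed at compile time, so the compiler has no loop counter to allocate a register for, which matters when the asm body already needs most of them.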

Btw - good to know that the different strides aren't required. For the 
source strides this is obvious, but I thought maybe Theora supports 
dynamic changes of the video size or something like that. I think the 
unused parameters should be removed from the function prototypes now. No 
need to pass arguments that aren't used, and it makes the code more 
readable as well.

I'll take a look at the dequant part in oc_state_frag_recon_mmx in the 
next few days. Need something new to chew on.
