[theora-dev] Patch: fragment reconstruction MMX for GCC
Nils Pipenbrinck
n.pipenbrinck at cubic.org
Mon Dec 31 00:37:32 PST 2007
Hi gentlemen,
> There were two primary problems with the code as it stood. The first was
> specific to x86-64: you have to cast the strides to longs so that they
> are placed in 64-bit registers instead of 32-bit registers, or you can't
> use them in indexing instructions with 64-bit pointers.
>
Ah - I see. *Light bulb goes on* That was the reason for the long
casts. Good to know.
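A minimal sketch of the x86-64 point, assuming GCC-style extended inline asm. The helper name `load_pixel` is hypothetical and not taken from mmxfrag.c. The stride is widened to `long` before it goes into the asm as an index operand. On x86-64, GCC then allocates a 64-bit register, which can be combined with the 64-bit base pointer in a `(base,index)` address. On non-x86 targets the sketch falls back to plain C.

```c
/* Hypothetical helper: return the byte at column `col` of row `row` in a
 * buffer whose rows are `stride` bytes apart. */
static unsigned char load_pixel(const unsigned char *src, int stride,
                                int row, int col)
{
  /* Widen before multiplying: on x86-64 an int operand would be placed in
   * a 32-bit register (e.g. %ecx), which cannot serve as the index next to
   * a 64-bit base pointer. A long operand gets a 64-bit register instead.
   * On x86-32, long and pointers are both 32 bits, so nothing changes. */
  long offset = (long)stride * row + col;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  unsigned int out;
  /* AT&T syntax: zero-extend the byte at src + offset into a 32-bit reg.
   * The "memory" clobber tells the compiler the asm reads memory. */
  __asm__("movzbl (%1,%2), %0"
          : "=r"(out)
          : "r"(src), "r"(offset)
          : "memory");
  return (unsigned char)out;
#else
  /* Portable fallback for non-x86 targets. */
  return src[offset];
#endif
}
```

The same cast applies to any asm operand that ends up in the index slot of an address, such as the row strides in the MMX fragment routines.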
> The second was specific to x86-32: when -fPIC is used and
> -fomit-frame-pointer is not,
I see... I never used -fPIC on x86. I guess that's why I always assume 6
free registers. On win32 you can even mess around with ESP if you want
to; interrupts have their own stack frame anyway.
I'm a bit sceptical about the inter2 loop, though. Timothy, could you
please email me a compiled object file of mmxfrag.c privately, so I can
run my benchmarks and have a look at the generated code? I can only run
GCC 3.2.2 on my machine, and it generates horrible code when C loops are
mixed with inline asm.
I'd rather unroll via macros or let gas do the job. If a modern GCC gets
things right, I'm fine with it, though. Inter2 is *the* performance hog at
the moment, so each percent saved makes a difference to the total decode
time.
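To illustrate what unrolling via macros (instead of a C loop wrapped around asm) might look like, here is a plain-C reference sketch of an inter2-style reconstruction. The arithmetic is an assumption: average the two predictors with truncation, add the residue, and clamp to 0..255. The names `recon_inter2_ref`, `INTER2_PIXEL` and `INTER2_ROW` are illustrative and do not come from mmxfrag.c. The preprocessor expands every row and pixel, so the compiler sees straight-line code with no loop counter.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static unsigned char clamp255(int v)
{
  return (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* One pixel: truncating average of both predictors plus the residue. */
#define INTER2_PIXEL(d, s1, s2, r, i) \
  ((d)[i] = clamp255((((int)(s1)[i] + (s2)[i]) >> 1) + (r)[i]))

/* One 8-pixel row, fully expanded by the preprocessor. */
#define INTER2_ROW(d, s1, s2, r)                                  \
  do {                                                            \
    INTER2_PIXEL(d, s1, s2, r, 0); INTER2_PIXEL(d, s1, s2, r, 1); \
    INTER2_PIXEL(d, s1, s2, r, 2); INTER2_PIXEL(d, s1, s2, r, 3); \
    INTER2_PIXEL(d, s1, s2, r, 4); INTER2_PIXEL(d, s1, s2, r, 5); \
    INTER2_PIXEL(d, s1, s2, r, 6); INTER2_PIXEL(d, s1, s2, r, 7); \
  } while (0)

/* Hypothetical reference: reconstruct an 8x8 fragment from two predictors
 * that share one row stride; the residue is a packed 8x8 array. */
static void recon_inter2_ref(unsigned char *dst, const unsigned char *src1,
                             const unsigned char *src2, int ystride,
                             const int16_t *residue)
{
/* Row k of every buffer; the residue rows are packed 8 entries apart. */
#define INTER2_STEP(k)                                   \
  INTER2_ROW(dst + (k) * ystride, src1 + (k) * ystride,  \
             src2 + (k) * ystride, residue + (k) * 8)
  INTER2_STEP(0); INTER2_STEP(1); INTER2_STEP(2); INTER2_STEP(3);
  INTER2_STEP(4); INTER2_STEP(5); INTER2_STEP(6); INTER2_STEP(7);
#undef INTER2_STEP
}
```

An MMX version would keep the same macro skeleton and replace the per-pixel body with an asm row, which keeps the unrolling out of GCC's hands entirely.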
Btw - good to know that the different strides aren't required. For the
source strides this is obvious, but I thought maybe theora supports
dynamic changes of the video size or something like that. I think the
unused parameters should be removed from the function prototypes now. No
need to pass arguments that aren't used, and it makes the code more
readable as well.
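As a rough illustration of the trimmed prototypes, here is a hypothetical 8x8 fragment copy that takes a single `ystride` shared by source and destination. The name `frag_copy` and the two-stride "before" form in the comment are assumptions for illustration, not the actual theora prototypes. The sketch relies on the point above that the buffers never differ in row stride.

```c
#include <string.h>

/* Before (hypothetical):
 *   void frag_copy(unsigned char *dst, int dst_ystride,
 *                  const unsigned char *src, int src_ystride);
 * After: source and destination always share a row stride, so one
 * parameter is enough. */
static void frag_copy(unsigned char *dst, const unsigned char *src,
                      int ystride)
{
  int row;
  for (row = 0; row < 8; row++) {
    /* Copy one 8-pixel row, then advance both pointers by the shared stride. */
    memcpy(dst, src, 8);
    dst += ystride;
    src += ystride;
  }
}
```

Dropping the second stride also frees a register in the asm versions, which matters on x86-32 where few are available.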
I'll take a look at the dequant part in oc_state_frag_recon_mmx in the
next few days. I need something new to chew on.