[theora-dev] Benchmarks Inline-ASM vs. Intrinsics

Timothy B. Terriberry tterribe at email.unc.edu
Wed Feb 11 06:39:18 PST 2009

Nils Pipenbrinck wrote:
> I've just applied the changes to the fragment reconstruction functions 
> as the IDCT and loopfilter have not been ported yet. 
> Nevertheless here are some numbers:

Keep in mind that oc_frag_recon_* together account for less than 6% of
decoding time, so a 2% overall slowdown means a 33% slowdown in those
functions (and similarly, about a 700% slowdown for the C version of
oc_frag_recon_inter_mmx). The cost of the iDCTs is somewhat larger (8%
of the total, or so), so a similar slowdown there will bring an even
larger drop in total performance (and there should not be any cache
misses to mask gcc's inefficiencies in the iDCTs, unlike the recon
functions).
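
The arithmetic above can be sketched in a few lines of C. This is only an
illustration, assuming that nothing but the functions in question gets
slower and that all percentages are measured as run time; the helper names
local_slowdown and overall_slowdown are hypothetical and do not exist in
libtheora.

```c
#include <assert.h>

/* Hypothetical helper: slowdown inside one group of functions, given the
 * overall slowdown it caused and the fraction of total decode time the
 * group took before the change. All values are fractions (0.02 == 2%). */
double local_slowdown(double overall, double time_fraction) {
    assert(time_fraction > 0.0 && time_fraction <= 1.0);
    return overall / time_fraction;
}

/* Hypothetical helper for the reverse direction: overall slowdown caused
 * by a given slowdown inside functions that took time_fraction of the
 * total decode time. */
double overall_slowdown(double local, double time_fraction) {
    assert(time_fraction > 0.0 && time_fraction <= 1.0);
    return local * time_fraction;
}
```

With these, local_slowdown(0.02, 0.06) comes out at about 0.33 (the 33%
figure), and overall_slowdown(0.33, 0.08) at about 0.026, so the same
relative slowdown in the iDCTs would cost roughly 2.6% overall instead of 2%.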

Still, even having said that, I was expecting on the order of a 100%
slowdown, so this is at least somewhat encouraging. What version of gcc
did you use?
