[theora-dev] Benchmarks Inline-ASM vs. Intrinsics

Nils Pipenbrinck n.pipenbrinck at cubic.org
Wed Feb 11 08:13:45 PST 2009

Timothy B. Terriberry wrote:
> Keep in mind that oc_frag_recon_* together account for less than 6% of
> decoding time, so a 2% overall slowdown means a 33% slowdown in those
> functions (and similarly, about a 700% slowdown for the C version of
> oc_frag_recon_inter_mmx). The cost of the iDCTs is somewhat larger (8%
> of the total, or so), so a similar slowdown there will bring an even
> larger drop in total performance (and there should not be any cache
> misses to mask gcc's inefficiencies in the iDCTs, unlike the recon
> functions).
True - the recon functions spend a lot of time waiting for the cache (at
least that's what they did when I wrote the asm code back in 2007). Porting
the iDCT is not as easy a task as porting the recon functions, but it
could still be
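
As a rough check of the percentages quoted above, here is a minimal sketch of
the arithmetic in C. The function names are made up for illustration and are
not part of libtheora; the only inputs are the figures from the quote (recon
functions at about 6% of decode time, iDCTs at about 8%, a 2% overall
slowdown). It assumes the rest of the decoder runs at unchanged speed.

```c
/* Hypothetical helpers, not libtheora code. All values are fractions,
   e.g. 0.06 means 6%. */

/* Slowdown inside a group of functions, given the group's share of total
   runtime and the slowdown measured for the whole program. */
static double local_slowdown(double share, double overall)
{
    return overall / share;
}

/* Slowdown of the whole program caused by a given local slowdown in a
   group of functions with the given share of total runtime. */
static double overall_slowdown(double share, double local)
{
    return share * local;
}
```

With the quoted figures, local_slowdown(0.06, 0.02) comes out at about 0.33,
which is the 33% mentioned above. Feeding that same local slowdown into
overall_slowdown(0.08, ...) for the iDCTs gives roughly 0.027, so a bit under
3% of total decoding time.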

> What version of gcc did you use?
GCC 4.3.1 on cygwin.

I'll try some benchmarking on Ubuntu one day or another.
