[theora-dev] Benchmarks Inline-ASM vs. Intrinsics
n.pipenbrinck at cubic.org
Wed Feb 11 08:13:45 PST 2009
Timothy B. Terriberry wrote:
> Keep in mind that oc_frag_recon_* together account for less than 6% of
> decoding time, so a 2% overall slowdown means a 33% slowdown in those
> functions (and similarly, about a 700% slowdown for the C version of
> oc_frag_recon_inter_mmx). The cost of the iDCTs are somewhat larger (8%
> of the total, or so), so a similar slowdown there will bring an even
> larger drop in total performance (and there should not be any cache
> misses to mask gcc's inefficiencies in the iDCTs, unlike the recon
> functions).
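
The arithmetic in the quoted paragraph can be sketched in a few lines of C.
This is only an illustration of the proportional reasoning, not code from
libtheora: the function names local_slowdown and overall_slowdown are
hypothetical, and the sketch assumes that everything outside the affected
functions runs at unchanged speed.

```c
#include <assert.h>

/* Slowdown inside a group of functions that is implied by a measured
 * overall slowdown.
 *   share   - fraction of total decode time spent in those functions
 *   overall - overall slowdown, as a fraction of total decode time
 * Assumes the rest of the decoder is unaffected. */
double local_slowdown(double share, double overall)
{
    assert(share > 0.0 && share <= 1.0);
    return overall / share;
}

/* The reverse direction: overall slowdown caused by slowing one group
 * of functions down by the fraction `local`. */
double overall_slowdown(double share, double local)
{
    assert(share > 0.0 && share <= 1.0);
    return share * local;
}
```

With the figures quoted above, local_slowdown(0.06, 0.02) gives roughly 0.33,
the 33% slowdown in the recon functions. Applying the same 33% to the iDCTs
at an 8% share, overall_slowdown(0.08, 1.0 / 3.0) gives roughly 0.027, so
about 2.7% overall, which is the "even larger drop" mentioned.
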
True - the recon functions spend a lot of time waiting for the cache (at
least that's what they did when I wrote the asm code back in 2007). Porting
the iDCT is not such an easy task as porting the recon functions, but it
could still be
> What version of gcc did you use?
GCC 4.3.1 on cygwin.
I'll try some benchmarking on Ubuntu one day or another.