[theora-dev] Benchmarks Inline-ASM vs. Intrinsics
Nils Pipenbrinck
n.pipenbrinck at cubic.org
Wed Feb 11 08:13:45 PST 2009
Timothy B. Terriberry wrote:
> Keep in mind that oc_frag_recon_* together account for less than 6% of
> decoding time, so a 2% overall slowdown means a 33% slowdown in those
> functions (and similarly, about a 700% slowdown for the C version of
> oc_frag_recon_inter_mmx). The cost of the iDCTs are somewhat larger (8%
> of the total, or so), so a similar slowdown there will bring an even
> larger drop in total performance (and there should not be any cache
> misses to mask gcc's inefficiencies in the iDCTs, unlike the recon
> functions).
>
True - the recon functions spend a lot of time waiting for the cache (at
least that's what they did when I wrote the asm code back in 2007). Porting
the iDCT is not as easy a task as porting the recon functions, but it could
still be interesting.
> What version of gcc did you use?
>
GCC 4.3.1 on cygwin.
I'll try some benchmarking on Ubuntu one day or another.