[theora-dev] SSE2 assembly support
Timothy B. Terriberry
tterribe at email.unc.edu
Wed Feb 10 15:30:31 PST 2010
There is some room for SSE2 optimizations (I just committed some earlier
today), but right now the slowest functions in the encoder are all in C.
A few of these could benefit from SIMD, but algorithmic optimizations
will both be easier and give bigger performance improvements. Many of
the existing SIMD functions operate on 8x8 blocks, and so MMX is
generally enough to extract the maximum amount of parallelism.
Restructuring things to operate on larger blocks when possible is a good
idea, but a lot more work.
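
To illustrate the register-width point, here is a minimal scalar sketch, assuming 8-bit samples. The function name sad_8x8 and its signature are hypothetical and are not taken from libtheora. Each row of an 8x8 block is 8 bytes, i.e. 64 bits, which is exactly the width of one MMX register, so a 128-bit SSE2 register only helps if two rows (or a wider block) are processed together.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper for illustration only (not libtheora's API):
   sum of absolute differences over one 8x8 block of 8-bit samples.
   stride is the distance in bytes between the starts of adjacent rows. */
unsigned sad_8x8(const uint8_t *src, const uint8_t *ref, int stride) {
  unsigned sad = 0;
  for (int y = 0; y < 8; y++) {
    /* One row = 8 bytes = 64 bits, the width of a single MMX register.
       A SIMD version would handle this whole inner loop in a few
       instructions; SSE2 would need two rows to fill its 128 bits. */
    for (int x = 0; x < 8; x++) {
      sad += (unsigned)abs(src[x] - ref[x]);
    }
    src += stride;
    ref += stride;
  }
  return sad;
}
```

With a block of all 10s compared against a block of all 7s, every one of the 64 samples contributes 3, so the result is 192.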
Finally, I am not generally a fan of intrinsics because a) their
portability is overrated and b) last I checked, compilers generate
horrible code from them. The current inline asm already works for 32-bit
and 64-bit platforms, except on Windows, but that is MSVC's fault.