[theora-dev] Blog post about Theora and MSVC assembly

Sat Dec 22 00:31:19 PST 2007

Ralph Giles wrote:
> Sebastian reminded me about Timothy's (other) objection which IIRC was 
> that with nasm you have to do all the function call stuff in asm instead 
> of C, which is tedious.

It's a bit more than just the function calls and stack management
(which, with inline asm, can actually be avoided completely in many
cases). There are also all the different conventions that gcc handles
for you. Do you have to preserve %ebp? %ebx? I.e., are you compiling
without -fomit-frame-pointer or with -fPIC? How are symbol names
mangled? That changes between Linux, Mac OS X, mingw32, etc., even for
the same compiler and hardware architecture. How do you address global
look-up tables? The list goes on.
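As a rough illustration of gcc taking care of those conventions (a sketch only, not code from the Theora tree; `bias_table` and `add_bias` are hypothetical names): the "m" constraint lets gcc form the table address itself, whether that means RIP-relative addressing on x86-64, PIC-relative addressing under -fPIC on 32-bit, or a leading underscore on the symbol. The "+r" constraint lets gcc pick a register, so the asm never has to know whether %ebx or %ebp is reserved. Non-x86 builds fall back to plain C.

```c
#include <stdint.h>

/* Hypothetical lookup table, for illustration only. */
static const int32_t bias_table[4] = {0, 1, 2, 4};

/* Add bias_table[idx] to x, letting gcc decide how to reach the table. */
int32_t add_bias(int32_t x, int idx) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  /* "m": gcc emits the table address (RIP-relative on x86-64,
     PIC-relative under -fPIC on 32-bit, with whatever symbol name the
     platform's assembler expects). "+r": gcc picks a free register for
     x, so no hand-written save/restore of %ebx or %ebp is needed. */
  __asm__("addl %1, %0" : "+r"(x) : "m"(bias_table[idx]) : "cc");
#else
  x += bias_table[idx];
#endif
  return x;
}
```

The same asm string works unchanged whether or not the build uses -fPIC or -fomit-frame-pointer.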

If you're clever, you can even write one set of asm for both 32-bit and
64-bit x86, and let gcc handle all the differences in pointer size,
etc., for you. The decoder asm has exactly one function that's different
for 64-bit, and that's only because the 32-bit version has a number of
register spills that are easily avoided with the larger register set of
the 64-bit architecture.
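A minimal sketch of how one asm string can serve both 32-bit and 64-bit x86, assuming gcc (`load_row_word` is a hypothetical helper, not a Theora function): because `src` and `stride` are pointer-sized C types, gcc substitutes 32-bit registers on i386 and 64-bit registers on x86-64 into the same addressing expression.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the 32-bit word that starts `stride` bytes past `src`. */
uint32_t load_row_word(const unsigned char *src, ptrdiff_t stride) {
  uint32_t v;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  /* src and stride are pointer-sized, so gcc fills in 32-bit registers
     on i386 and 64-bit registers on x86-64; the asm text is the same.
     The "memory" clobber tells gcc the asm reads memory that is not
     described by its operands. */
  __asm__("movl (%1,%2), %0" : "=r"(v) : "r"(src), "r"(stride) : "memory");
#else
  memcpy(&v, src + stride, sizeof v);
#endif
  return v;
}
```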

So, in other words, your "same codebase" is once again at least two
codebases, one for 32-bit and one for 64-bit, so you haven't actually
improved anything. To make matters worse, you now have to handle all the
niggling little details yourself, with garbage like the MANGLE macro in
the encoder asm, which I know is broken for one or more existing
platforms as it stands. In contrast, with one gcc codebase and one MSVC
codebase, the C compiler takes care of all of that for you.
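For comparison, a sketch of the two-codebase layout (`add_ints` is a hypothetical function, chosen only to keep it short): in both MSVC's `__asm` block and gcc's extended asm, the compiler resolves where `a`, `b`, and `r` live, so neither branch needs a MANGLE-style macro or hand-written stack offsets. MSVC inline asm is only available when targeting 32-bit x86, hence the `_M_IX86` check.

```c
/* One function, two inline-asm dialects, plus a C fallback. */
int add_ints(int a, int b) {
  int r;
#if defined(_MSC_VER) && defined(_M_IX86)
  /* MSVC: C variables are named directly; the compiler supplies
     their stack or register locations. */
  __asm {
    mov eax, a
    add eax, b
    mov r, eax
  }
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  /* gcc: operands are bound through constraints instead. */
  r = a;
  __asm__("addl %1, %0" : "+r"(r) : "r"(b) : "cc");
#else
  r = a + b;
#endif
  return r;
}
```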

You're still stuck with the main downside, which is that none of the
core developers can maintain the MSVC code, but nasm doesn't really
solve that either. Unless you have every combination of platforms,
flags, etc., listed above, you can't actually test that you got all
the niggling little details right. At least with the MSVC-specific route
you have the best chance of getting a random win32 developer to patch
things up for you when needed.

My understanding of intrinsics was that although you may be able to
work around the portability issues, the code gcc generates is very bad.
See http://lists.xiph.org/pipermail/theora-dev/2005-August/002851.html
Specifically:

> ...GCC often generates really bad code when they're used.  So I tend
> to end up writing inline assembly anyway.

Now, that was two years ago and perhaps things have gotten better in
that regard. But I've already got good gcc asm, so I'm not seeing a lot
of reason to swap it out for mediocre gcc asm just to gain mediocre
support for another platform, when with the same amount of work I could
have good asm on both. I might be convinced that intrinsics solve the
maintainability problem for MSVC, since in theory they could also be
tested with gcc, even if we wouldn't actually use them for real gcc builds.
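Roughly what the intrinsics route could look like, assuming SSE2 intrinsics from <emmintrin.h> (a swap for illustration; Theora itself targets MMX, and `avg_row16` is a hypothetical name): the same source compiles under both gcc and MSVC, which is what would let the MSVC path be tested with gcc. `_mm_avg_epu8` computes the rounded byte average (a + b + 1) >> 1, and the scalar loop is the reference behaviour it must match.

```c
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_INTRIN 1
#endif

/* Rounded average of two rows of 16 pixels: (a + b + 1) >> 1 per byte. */
void avg_row16(uint8_t *dst, const uint8_t *a, const uint8_t *b) {
#ifdef HAVE_SSE2_INTRIN
  __m128i va = _mm_loadu_si128((const __m128i *)a);
  __m128i vb = _mm_loadu_si128((const __m128i *)b);
  _mm_storeu_si128((__m128i *)dst, _mm_avg_epu8(va, vb));
#else
  for (int i = 0; i < 16; i++) dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
#endif
}
```

Whether gcc's output for code like this is any good is exactly the open question above; the sketch only shows that one source can build on both compilers.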

The remaining options are some kind of meta-language, or an automatic
conversion script. The latter seems to have been too hard for everyone
who's looked at it so far. I certainly did not see an easy way to make
MSVC even approximate the functionality gcc provides for merging C
expressions into asm, but I did not spend a lot of time on the problem.
You've already listed the drawbacks of using a meta-language, though I
don't see why you'd bother to do the assembling at run time. Theora does
not use enough beyond basic MMX to make it worthwhile to have a
separate version for every level of processor support, and even if it
did, we already have runtime CPU detection, which needs no dynamic
compilation. Although free PPC support is intriguing, I don't
see why that couldn't just be done at compile time as well.
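A minimal sketch of runtime detection without dynamic compilation, assuming gcc's <cpuid.h> on x86 (all names here are hypothetical; libtheora has its own detection code, which this does not reproduce): CPUID leaf 1 reports MMX in EDX bit 23, and a function pointer selects the MMX or C routine once, with all code compiled ahead of time. The MMX asm is only built when the compiler has MMX enabled (`__MMX__`).

```c
#include <stdint.h>
#include <string.h>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_GCC 1
#if defined(__MMX__)
#define HAVE_MMX_ASM 1
#endif
#endif

#define CPU_FLAG_MMX (1u << 0) /* hypothetical flag bit */

/* Query the CPU; returns a bitmask of CPU_FLAG_* values. */
unsigned cpu_flags_get(void) {
  unsigned flags = 0;
#ifdef HAVE_X86_GCC
  unsigned eax, ebx, ecx, edx;
  /* CPUID leaf 1: EDX bit 23 is set when MMX is supported. */
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 23)))
    flags |= CPU_FLAG_MMX;
#endif
  return flags;
}

/* Portable reference version: copy 8 bytes. */
static void copy8_c(uint8_t *dst, const uint8_t *src) {
  memcpy(dst, src, 8);
}

#ifdef HAVE_MMX_ASM
/* MMX version: one 64-bit load and store, then emms so later x87
   floating-point code sees a clean register state. */
static void copy8_mmx(uint8_t *dst, const uint8_t *src) {
  __asm__ volatile("movq (%1), %%mm0\n\t"
                   "movq %%mm0, (%0)\n\t"
                   "emms"
                   : /* no outputs */
                   : "r"(dst), "r"(src)
                   : "memory", "mm0");
}
#endif

typedef void (*copy8_fn)(uint8_t *dst, const uint8_t *src);

/* Pick an implementation from the detected flags. */
copy8_fn select_copy8(unsigned flags) {
#ifdef HAVE_MMX_ASM
  if (flags & CPU_FLAG_MMX) return copy8_mmx;
#else
  (void)flags;
#endif
  return copy8_c;
}
```

A caller would do the selection once, e.g. `copy8_fn copy8 = select_copy8(cpu_flags_get());`, and call through the pointer afterwards.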

So that's my view on the subject. As the one who would actually be doing
the work, Nils, what's your take?