[Vorbis-dev] Low level optimization

Tuomo Latto djv at mbnet.fi
Fri Feb 11 13:53:09 PST 2005


dean gaudet wrote:
> Tuomo Latto wrote:
>>getting the performance benefits requires extra effort from developers
>>(=coding stuff in asm)...
> 
> i disagree.
> 
> gcc, microsoft, intel and pathscale compilers will generate SSE/SSE2 fp
> code if you ask them to -- directly from C float/doubles.  for example
> compiling with gcc -mfpmath=sse -msse2 is sufficient to get unvectorized
> sse/sse2 -- which is frequently faster than x87 code.  the intel compiler
> will vectorize when it can (and i think the same is true for pathscale).

I stand corrected.
However, this also means that there are actually fewer reasons for having
asm optimizations as compilers can do this on their own.
These optimizations are therefore already available.
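
To make that concrete, here is a minimal sketch of the kind of plain C
loop this applies to. The function name apply_window is made up for this
example and is not taken from libvorbis. The flags are the ones Dean
mentioned; the -O2 and the file name are my own assumptions.

```c
/* Build (assumed command line):
 *   gcc -O2 -msse2 -mfpmath=sse -c window.c
 * No asm and no intrinsics here; whether SSE is used for the float
 * math is decided by the compiler flags alone. */
#include <stddef.h>

/* Hypothetical helper: scale each input sample by a window coefficient. */
void apply_window(float *out, const float *in, const float *window, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = in[i] * window[i];
}
```

The same source still builds unchanged for x87 or for a non-x86 target,
which is the whole attraction.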


> furthermore, if you study the IA32 ISA
> <http://www.intel.com/design/pentium4/manuals/index_new.htm> you'll
> see that for every mmx/sse/2/3 instruction intel has defined various
> "intrinsics".  these intrinsics are C function calls which access the
> specified instruction.  you'll be happy to know that this same set of
> intrinsics is supported across most x86 compilers.  that is to say:  you
> don't have to write assembly, you need only write to the x86 intrinsics
> and it should port across gcc, microsoft, intel and pathscale compilers.
> 
> for example:
> 
> 	__m128 _mm_add_ps(__m128 a, __m128 b);
> 
> becomes an addps (packed singles).

Once again, I bow before your greater wisdom.

I think the keyword here is x86 (or actually "pentium4" in that url path).
And furthermore, even if the compiler supports an instruction set, that
support won't magically appear on one's CPU...
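
A rough sketch of what I mean: the intrinsic path has to be fenced off
and a plain C path kept around for everything else. The helper name
add_floats is made up for this example. I am assuming GCC's predefined
__SSE__ macro here; other compilers signal SSE support differently.

```c
#include <stddef.h>

#if defined(__SSE__)   /* assumption: defined by GCC when SSE is enabled */
#include <xmmintrin.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Four floats per iteration; unaligned loads and stores so the
     * caller does not have to guarantee 16-byte alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif

    /* Scalar path: handles the tail, or everything when SSE is off. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Note that this only checks what the compiler was told to target. A binary
built with SSE enabled will still fail on a CPU without it, so a real
build would also need some kind of runtime check or separate binaries.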


>>>Seriously though, using asm would probably reduce portability.
>>>GCC (=cygwin, mingw, ..?) uses AT&T syntax.
> 
> the x86 assembler of choice for portable assembly is nasm.  it is
> open source, well maintained, cross-platform, and generates object
> files compatible with microsoft and gcc compilers.  you'll find it
> used in various packages requiring portable windows/unix assembly
> (i.e. mjpegtools).  there are other methods available as well -- such
> as the perl wrapped assembly in openssl.

Nasm is quite nice, yes.
It also happens to be yet another build requirement.
What I like about (Xiph's) current Ogg/Vorbis stuff is that it
specifically _does_not_ require a lot of other stuff. You can build
it OOTB and not worry about getting a bunch of requirements and
dependencies first.


>>You're probably right. Thank you and thanks to all for your attention.
> 
> no, please don't let them deter you from optimizing the codecs.

Dean's right. These things are open source, so you are free to play
with them regardless of what other people say.
(Some might even say that you are required to play with them.)
My main point was to show you some of the possible problems associated
with your undertaking. The main problems being portability and
upkeep (=prevention of bit rot).

I don't have any authority on this but I would think that Xiph people would
want to keep the reference encoder as free of clutter (#ifdefs, asm, ...)
as possible.
Unless, of course, someone is willing to take the responsibility for
keeping the optimizations up-to-date. Even then _I_ don't know, since
one of the points of the reference encoder and decoder is to have them
be as readable a reference of operation as possible.
All optimizations and extra requirements should probably at least be
non-default build options.
But please do ask Xiph people, as I really wouldn't know.
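
Just to sketch what a non-default option could look like (all names here,
including the EXAMPLE_USE_OPTIMIZED switch, are made up by me and are not
anything from the Xiph sources): the readable version is what you get
unless you explicitly ask for something else at build time.

```c
#include <stddef.h>

/* Reference version: always compiled, and the default. */
static float dot_reference(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

#ifdef EXAMPLE_USE_OPTIMIZED   /* hypothetical opt-in, e.g. -DEXAMPLE_USE_OPTIMIZED */
/* Stand-in for a tuned variant: manually unrolled by two. */
static float dot_optimized(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)
        s0 += a[i] * b[i];
    return s0 + s1;
}
#define dot_product dot_optimized
#else
#define dot_product dot_reference
#endif
```

Even this toy case shows the upkeep problem: the unrolled version adds
things up in a different order, so its floating point results can differ
slightly from the reference and somebody has to keep checking that.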

If the Xiph encoder is a no-go then you can always get involved
with some other encoder or even fork your own source tree.
Or maybe there could be an official performance branch or patch set
(or just plain source/build/configure/... files) on the repository
for the encoder and decoder that would contain all the optimized
versions for different platforms.
That should also be kept as portable as possible and would therefore
contain the enormous amount of #ifdef/asm/external requirement clutter
needed. That way the actual reference code could stay readable and thus
be kept as "the final authority", but there would also be "official",
albeit somewhat less reliable (and less up-to-date?), optimized versions.


-- 
Tuomo

... People who think they know it all are especially annoying


