[vorbis-dev] Optimizations, simple but effective

Monty xiphmont at xiph.org
Fri Apr 28 20:16:32 PDT 2000



> I found one alarming thing -- 37% of the CPU time was being
> spent in hypot(), in _vlpc_de_helper in lpc.c.  Apparently the
> Visual C++ version of hypot is hideously slow... it was spending
> all its time setting up rounding modes and calling all kinds of
> funky math functions, ensuring some form of numeric
> stability that we don't need.  So I changed the hypot to
> a simple sqrt(a*a + b*b) and the code became drastically 
> faster (as the compiler then just uses the chip's fsqrt 
> instruction; hypot was calling some software sqrt thing).
> I don't know whether gcc has similar issues but someone should
> check this.

Amusing.  In GCC (more likely just glibc), the opposite is true; sqrt() is much
slower than hypot().

> It also seemed like a lot of time was being spent in vorbisfile
> converting float to int (as seen in the earlier discussion) and 
> writing the samples to the output buffer.

Yes, that bit of code is not well optimized either :-)  I think the general
agreement there is just to use (int)(float_var+.5)
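
A minimal sketch of that kind of conversion for a single sample follows. The function name float_to_s16, the 32768 scale factor, and the clipping are assumptions for illustration, not the vorbisfile code. Note that a bare (int)(x+.5) truncates toward zero, so it rounds negative values the wrong way; the sketch subtracts .5 for negative input instead.

```c
/* Hypothetical helper: convert one float sample, nominally in
   [-1.0, 1.0], to a signed 16-bit value with rounding and clipping. */
static short float_to_s16(float sample)
{
    float scaled = sample * 32768.f;   /* assumed full-scale factor */
    /* The cast truncates toward zero, so bias by +/-0.5 according to
       sign to get round-to-nearest. */
    int val = (int)(scaled >= 0.f ? scaled + .5f : scaled - .5f);

    /* Clip to the 16-bit range so out-of-range input cannot wrap. */
    if (val > 32767)
        val = 32767;
    if (val < -32768)
        val = -32768;
    return (short)val;
}
```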

> I made two basic changes here.  One was using the fast integer 
> conversion code that I posted last week (but inserted into
> vorbisfile, not in the dct code as was being pondered earlier).
> The other change was, I checked whether the endianness of the 
> machine was the same as the output endianness that the user 
> asks for.  If so, I use a loop that outputs each sample as a 
> single 'short', which eliminates the bit shifting and masking
> and half the write operations.  Because the user is almost 
> always going to want the numbers back in their native
> endianness, this is an effective optimization.

I actually have similar code from older Ogg implementations that does this too.
I'll compare it to what you've done.
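
For reference, the endianness idea can be sketched roughly as below. This is not the posted patch or the older Ogg code; host_is_bigendian and write_s16 are hypothetical names, short is assumed to be 16 bits, and the fast path uses memcpy rather than a per-sample store of a 'short' so that the output buffer need not be aligned.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if this machine stores the most significant byte first. */
static int host_is_bigendian(void)
{
    const unsigned short probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Hypothetical helper: write n 16-bit samples to out in the byte
   order the caller asks for (want_bigendian nonzero = big-endian). */
static void write_s16(unsigned char *out, const short *in, size_t n,
                      int want_bigendian)
{
    size_t i;

    if (!!want_bigendian == host_is_bigendian()) {
        /* Fast path: requested order matches the host, so whole
           samples can be copied with no shifting or masking. */
        memcpy(out, in, n * sizeof(short));
        return;
    }

    /* Slow path: assemble each sample byte by byte. */
    for (i = 0; i < n; i++) {
        unsigned short v = (unsigned short)in[i];
        unsigned char hi = (unsigned char)(v >> 8);
        unsigned char lo = (unsigned char)(v & 0xff);
        out[2 * i]     = want_bigendian ? hi : lo;
        out[2 * i + 1] = want_bigendian ? lo : hi;
    }
}
```

Since callers almost always ask for native byte order, the slow path should rarely run.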

> With these changes in place, my app runs almost twice as fast
> as before.  And looking at the way things are set up, I'm 
> pretty confident I could get at least another factor of 2 or 3
> speed improvement out of the existing code when it comes time
> to Optimize For Real.

Heh.  No one *ever* believes me when I say I haven't done *any* optimization
(except on an algorithmic level) yet :-)

> There doesn't seem to be an official vc6 project checked into the
> tree.  I offer to check this in and maintain it if that is
> acceptable (I'm going to be maintaining a vc6 project, so I might
> as well share the effort).

You are correct.  If you've gotten it to work (seems you've done that well) and
are willing to maintain it, you're welcome to commit access. All I need from
you is an ssh public key and all you'll need is ssh/CVS.

> Should I consider submitting a patch with these optimizations in
> it?  The sqrt(a*a + b*b) seems like a no-brainer to include;
> the other ones are more questionable, it just depends on how
> much we care about performance at this stage.  What is the
> procedure for submitting a patch, do I just email it to Monty?

If it's a portable patch of clean, tested code, it goes on the mainline.  I'll
do it myself, but I prefer to give out CVS commit access once someone has
demonstrated sufficient clue to make appropriate modifications to the mainline
(congrats).  In that case, commit as you see fit, but do be sure to alert the
list when you do so.  I'll set up CVS to mail log entries of commits out, but I
haven't done so yet.

> Also, someone was talking earlier about assembly-language 
> optimization of the code at some point in future.  One thing we 
> have learned in game development is that there is not much point 
> to writing assembly code for modern processors; you gain hardly
> anything beyond what you can achieve from C.  Even in the cases
> of special instruction sets (like Katmai or the 3DNow instructions)
> there are C macros that you can use to do the right thing, most of
> the time.  Speaking of which, 3DNow has an excellent fast inverse
> square root function that would be just the sauce for replacing
> that 1./hypot in _vlpc_de_helper.

Oooo. :-)  I have no objection to clean use of assembly as appropriate; I just
don't want to see the *mainline* being re-written in large swaths of assembly
(a la GOGO).  That sort of work belongs, at the very least, on a parallel branch.
I agree with your belief that going below C is appropriate only in limited ways.

Monty

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/


