[vorbis-dev] optimizing float to int conversions
Monty
xiphmont at xiph.org
Wed May 5 15:37:47 PDT 2004
On Thu, May 06, 2004 at 12:19:31AM +0200, Tal wrote:
> In general I agree with you...
> I immediately saw that this is a problem with the compiler.
> I'm ashamed of some of my questions.
No worries, I didn't mean to come across as quite so scary...
Really the only point I was actually trying to make is that you need
not make educated guesses about what the compiler is doing, and it's
probably faster and more reliable to inspect the assembly than to test
theories the hard way via process of elimination in the C.
The last time I looked at this problem (and you are right; casts are
generally done very poorly/conservatively by x86 C compilers) I found
quickly from looking at the assembly that many of my theories about
what was wrong were incorrect.
> Still, the code I proposed isn't machine dependent (it is written in C).
> It is as if the compiler itself had put that code there...
> I agree that this is the compiler's job, but for now it seems like overkill
> to buy a serious compiler that costs a lot of money for such a small
> optimization.
> Is there a compiler that does a good job with it?
> Moreover, this kind of code may ruin some other optimizations that I
> plan, so I'll use any compiler that does a good job.
What we have resorted to is assembly macros in an #ifdef jail that
special-case conversions on some platform/compiler combinations.
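
As a rough illustration of what such an #ifdef jail can look like, here
is a minimal sketch. The name ftoi_sketch and the exact set of platform
checks are assumptions made for this example, not a copy of the
project's header. The idea is only that every platform-specific
conversion lives in one guarded place, with a portable C fallback.

```c
/* Hypothetical "ifdef jail": every compiler/platform special case for
 * float-to-int conversion lives in this one place. The name ftoi_sketch
 * is illustrative only. */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>
/* SSE2 conversion: rounds using the current MXCSR rounding mode
 * (round to nearest by default). */
static inline int ftoi_sketch(double f) {
    return _mm_cvtsd_si32(_mm_set_sd(f));
}
#elif defined(__GNUC__) && defined(__i386__)
/* x87 store-as-integer: rounds using the current FPU rounding mode
 * (round to nearest by default). */
static inline int ftoi_sketch(double f) {
    int i;
    __asm__("fistl %0" : "=m"(i) : "t"(f));
    return i;
}
#else
/* Portable fallback in plain C: add or subtract 0.5 by sign, then let
 * the cast truncate toward zero. */
static inline int ftoi_sketch(double f) {
    return (int)(f < 0.0 ? f - 0.5 : f + 0.5);
}
#endif
```

Callers just write ftoi_sketch(x) and never see an #ifdef. Note that
the hardware paths and the fallback can disagree on exact .5 ties
(nearest-even versus away from zero), so a real version would have to
pick one behavior and document it.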
> I have another optimization depending on this macro.
> Consider rint:
> The function is using floor which has poor performance too.
> An alternative is:
> (x is a float)
> #define rint(x) ((int)((x)+0.5))
> Where the conversion is done with a macro.
>
> All together it gave 6% speedup.
> Can we leave it just because the compiler is conservative?
Also depends on compiler; floor is a fast macro on x86/gcc, faster
than add+cast (but then, so is rint IIRC). I think we used floor()
only because MSVC and some other compilers don't have rint().
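
One caveat with the add+cast form is worth spelling out: a C cast
truncates toward zero, so adding 0.5 before the cast only rounds
correctly for non-negative inputs. The sketch below shows the
difference. ROUND_NAIVE and round_signed are hypothetical names used
for this example only, and the sign-aware variant rounds ties away from
zero, which is not the same tie rule as rint() in the default rounding
mode.

```c
/* The quoted add+cast idea, fully parenthesized. The cast truncates
 * toward zero, so this is only a correct round for x >= 0. */
#define ROUND_NAIVE(x) ((int)((x) + 0.5))

/* Hypothetical sign-aware variant: moves 0.5 away from zero before the
 * truncating cast, so negative inputs round to the nearest integer too.
 * Ties (exact .5) round away from zero. */
static inline int round_signed(double x) {
    return (int)(x < 0.0 ? x - 0.5 : x + 0.5);
}
```

For example, ROUND_NAIVE(-2.7) evaluates to -2 (since -2.2 truncates to
-2), while round_signed(-2.7) gives -3. Whether that matters depends on
whether the values being rounded can be negative at that point in the
code.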
One other thing is that in many places the current encoder structure
is simply inefficient; avoiding the need for a cast (or similar) is
generally superior to finding a way to make it fast. We're talking
right now about 5%-15% speedups when a real code reorg could trim 80%
or more off execution time.
But for fast, temporary gains (where a quick macro makes a big
difference until a larger, more efficient change can be made) there's
no problem with spot fixes via macro. The only real thing I object to
is #ifdef anywhere outside of the ifdef jails :-) I also generally
want to know why changes make things faster (analysis of assembly)
unless the improvement is self-evident (eg, replacing bubble sort with
quicksort).
Monty