[Speex-dev] Resampler experimental speedups
thorvald at natvig.com
Thu Apr 3 17:15:23 PDT 2008
The attached patch (which is not in any way finished) optimizes the
resampler. (For those following the discussion on IRC: this version
includes optimizations for both the direct and interpolate cases.)
Setup: GCC 4.3 on x86_64, with Valgrind used to count instructions,
resampling 10 frames of 320 floats at quality 3. The direct case was
measured with 16 kHz => 48 kHz resampling, and the interpolate case with
16 kHz => 44.1 kHz resampling.
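Which of the two code paths a ratio ends up in depends on how many distinct filter phases it needs. A minimal sketch with hypothetical helper names (these are not Speex functions): reducing the two rates by their GCD gives num:den, and den is the number of phases a polyphase resampler must cover.

```c
/* Hypothetical helpers for illustration; not part of the Speex API. */

/* Greatest common divisor (Euclid's algorithm). */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce in_rate:out_rate to num:den in lowest terms. Output sample n sits
 * at input position n * num / den, whose fractional part is k/den, so den
 * is the number of distinct filter phases that must be handled. */
static void reduce_ratio(unsigned in_rate, unsigned out_rate,
                         unsigned *num, unsigned *den)
{
    unsigned g = gcd_u(in_rate, out_rate);
    *num = in_rate / g;
    *den = out_rate / g;
}

/* Coefficients needed to store every phase explicitly (a "direct" table). */
static unsigned long direct_table_len(unsigned den, unsigned filt_len)
{
    return (unsigned long)den * filt_len;
}
```

For 16 kHz => 48 kHz this gives 1:3, i.e. 3 phases (144 coefficients with an assumed 48-tap filter). 16 kHz => 44.1 kHz reduces to 160:441, i.e. 441 phases and 21168 coefficients, which is presumably why that ratio goes through the interpolating path instead.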
Using just '-O2':
Original:     Direct 4548k, Interpolate 9657k
This version: Direct 2992k, Interpolate 9003k
So in the direct case this version uses only about 66% of the
instructions of the SVN version, which I think is a decent speedup :)
For interpolate, there's so much work in each loop iteration that my
tricks give only a marginal improvement (about 7%). Note that no loop
unrolling has been done yet; for the direct case, unrolling by 4 should
reduce the instruction count noticeably.
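For reference, unrolling the direct-case dot product by 4 could look roughly like this. It is a hypothetical sketch of the general technique, not code from the patch; the function name and signature are made up for illustration.

```c
#include <stddef.h>

/* Dot product of one filter phase with the input window, unrolled by 4.
 * Four independent accumulators shorten the loop-carried dependency chain
 * and spread the compare/branch/increment overhead over four taps. */
static float dot_unrolled4(const float *restrict x, const float *restrict h,
                           size_t n)
{
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t j = 0;

    for (; j + 4 <= n; j += 4) {
        a0 += x[j]     * h[j];
        a1 += x[j + 1] * h[j + 1];
        a2 += x[j + 2] * h[j + 2];
        a3 += x[j + 3] * h[j + 3];
    }
    /* Remainder when n is not a multiple of 4. */
    for (; j < n; j++)
        a0 += x[j] * h[j];

    return (a0 + a1) + (a2 + a3);
}
```

Because the partial sums are added in a different order, float results can differ in the last bits from a straight loop; that is the same reassociation that '-ffast-math' permits the compiler to do on its own.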
Using '-ftree-vectorize -ffast-math -O3' and a profile run:
Original:     Direct 3419k, Interpolate 9255k
This version: Direct 1629k, Interpolate 8588k
My loop transformations let GCC recognize the direct-case inner loop as
vectorizable, which gives a very nice speedup (to about 48% of the
original instruction count). For interpolate, we're again hurt by the
loop doing too much work. Note, though, that GCC currently does not
vectorize the inner loop for interpolate, as it's unable to recognize
that the same operations are applied to every element of accum.
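To make the accum remark concrete, here is a hypothetical interpolate-style kernel of roughly that shape. The table layout, offset, and weights are assumptions for illustration; the real Speex loop indexes its oversampled table differently.

```c
#include <stddef.h>

/* Hypothetical interpolate-case kernel (illustration only, not the patch).
 * Each input sample x[j] is multiplied by four neighbouring entries of an
 * oversampled filter table and added into four accumulators; the partial
 * sums are then blended with weights w[0..3] for the fractional phase.
 * Requires: table holds at least (n - 1) * oversample + offset + 4 floats
 * when n > 0. */
static float interp_kernel(const float *x, size_t n,
                           const float *table, size_t oversample,
                           size_t offset, const float w[4])
{
    float accum[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    for (size_t j = 0; j < n; j++) {
        const float *t = table + j * oversample + offset;
        /* The same multiply-add on every element of accum: a 4-wide
         * pattern that a vectorizer would have to recognize. */
        for (size_t k = 0; k < 4; k++)
            accum[k] += x[j] * t[k];
    }

    return w[0] * accum[0] + w[1] * accum[1]
         + w[2] * accum[2] + w[3] * accum[3];
}
```

Each output needs four table reads and four multiply-adds per tap plus the final blend, versus one read and one multiply-add per tap in the direct case, which fits the observation that there is much more work per iteration here.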
On the downside, this allocates a temporary array of in_len +
st->filt_len elements on the stack to hold the input. In my test case
that is 1472 bytes; with larger frames it grows accordingly.
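The stack cost comes from building one contiguous copy of history plus new input, so each output becomes a plain unit-stride dot product. A minimal sketch of that idea, assuming a filt_len-sample history and a precomputed table of den phases; the function and parameter names are hypothetical and the actual patch may differ. With 320-sample frames, the 1472-byte figure is consistent with filt_len = 48 ((320 + 48) × 4 bytes), though that filter length is inferred rather than stated in the post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the actual patch. */

/* Stack usage of the scratch buffer: in_len + filt_len floats. */
static size_t scratch_bytes(size_t in_len, size_t filt_len)
{
    return (in_len + filt_len) * sizeof(float);
}

/* Direct-case resampling of one block. hist holds filt_len (> 0) samples
 * of previous input; phase p of the filter starts at phases + p * filt_len.
 * num:den is the reduced rate ratio. Returns the number of outputs. */
static size_t direct_block(const float *hist, const float *in, size_t in_len,
                           const float *phases, size_t filt_len,
                           unsigned num, unsigned den,
                           float *out, size_t out_cap)
{
    /* C99 variable-length array on the stack, sized per call. */
    float buf[in_len + filt_len];
    memcpy(buf, hist, filt_len * sizeof(float));
    memcpy(buf + filt_len, in, in_len * sizeof(float));

    const unsigned int_adv = num / den;  /* whole input samples per output */
    const unsigned frac_adv = num % den; /* fractional advance, in 1/den */
    size_t pos = 0, n_out = 0;
    unsigned frac = 0;

    while (pos < in_len && n_out < out_cap) {
        const float *x = buf + pos;
        const float *h = phases + (size_t)frac * filt_len;
        float acc = 0.0f;
        /* Branch-free unit-stride dot product; GCC still needs
         * -ffast-math (or similar) to reorder the float reduction. */
        for (size_t j = 0; j < filt_len; j++)
            acc += x[j] * h[j];
        out[n_out++] = acc;

        pos += int_adv;
        frac += frac_adv;
        if (frac >= den) {
            frac -= den;
            pos++;
        }
    }
    return n_out;
}
```

The trade-off is the one described in the post: copying into buf costs stack space and a memcpy per block, but removes the history/input split from the inner loop so it can be vectorized.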
Unless anyone spots any glaring mistakes I've made, the plan is to fix
the double versions, correct the int->float (and float->int)
conversions, and make sure the magic bytes work. Then it's time for some
unrolling and _USE_SSE improvements ;)
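On the float->int side, one common safe approach is to round and saturate rather than cast directly. A hedged sketch with hypothetical helper names, assuming the float path works on the same ±32768 scale as 16-bit PCM (no 1/32768 scaling); this is not a description of what the patch does.

```c
#include <stdint.h>

/* Hypothetical conversion helpers; assumes float samples use the same
 * +/-32768 scale as 16-bit PCM. */

static float int16_to_float(int16_t s)
{
    return (float)s; /* exact: every int16 value is representable */
}

/* Round to nearest (halves away from zero) and saturate to int16 instead
 * of letting out-of-range values overflow. The rounding is done in double
 * so that adding 0.5 cannot itself round up. NaN input is not handled. */
static int16_t float_to_int16(float x)
{
    if (x < -32767.5f)
        return INT16_MIN;
    if (x > 32766.5f)
        return INT16_MAX;
    double d = (double)x;
    return (int16_t)(d >= 0.0 ? d + 0.5 : d - 0.5);
}
```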
-------------- next part --------------
Attached patch (9068 bytes):
http://lists.xiph.org/pipermail/speex-dev/attachments/20080404/eedc0f5a/attachment.patch