[Speex-dev] Resampler experimental speedups

Thorvald Natvig thorvald at natvig.com
Thu Apr 3 17:15:23 PDT 2008


Hello :)

The attached patch (which is not in any way finished) optimizes the 
resampler. (For those following the discussions on IRC; this version 
includes optimizations for both direct and interpolate cases).

Measurements were taken with GCC 4.3 on x86_64, using Valgrind to count
instructions while resampling 10 frames of 320 floats at quality 3.
Direct was measured with a 16=>48 resampling, and interpolate with a
16=>44.1 resampling.

Using just '-O2':
Original: Direct 4548k, Interpolate 9657k
This version: Direct 2992k, Interpolate 9003k

So this version uses only about 66% of the instructions of the one in SVN
for the direct case (2992k vs 4548k), which I think is a decent speedup :)
For interpolate, there's so much to do in each loop iteration that my
tricks only give a marginal improvement (7% or so). Note that no loop
unrolling has been done; for the direct case, unrolling 4 times will
reduce the instruction count noticeably.
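
To illustrate what "unrolling 4 times" means for a direct-case inner
product, here is a minimal sketch. It is not taken from the attached patch;
the function names are hypothetical, and it assumes the filter length is a
multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

/* Plain inner product: one multiply-add per loop iteration. */
static float dot_simple(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t j = 0; j < n; j++)
        sum += a[j] * b[j];
    return sum;
}

/* Same inner product, unrolled 4 times with four independent accumulators.
 * Assumption for this sketch: n is a multiple of 4. */
static float dot_unrolled4(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    assert(n % 4 == 0);
    for (size_t j = 0; j < n; j += 4) {
        s0 += a[j]     * b[j];
        s1 += a[j + 1] * b[j + 1];
        s2 += a[j + 2] * b[j + 2];
        s3 += a[j + 3] * b[j + 3];
    }
    /* Combine the partial sums once, after the loop. */
    return (s0 + s1) + (s2 + s3);
}
```

The unrolled loop does the loop-counter update and branch once per four
multiply-adds instead of once per multiply-add. Because the summation order
changes, floating-point results may differ slightly from the simple loop.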

Using '-ftree-vectorize -ffast-math -O3' and a profile run:
Original: Direct 3419k, Interpolate 9255k
This version: Direct 1629k, Interpolate 8588k

My loop transformations allow GCC to recognize the direct-case loop as
vectorizable, giving a very nice speedup. For interpolate, we're
again hurt by the loop doing too much work. Note though that GCC
currently does not vectorize the inner loop for interpolate as it's 
unable to recognize that the operations are applied equally to all 
elements in accum[].
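
As a rough sketch of the kind of loop shape a vectorizer can usually
handle (this is a plain FIR filter over a contiguous buffer, not the
resampler code from the patch; all names and the TMP_MAX limit are
hypothetical): the history and the new input are copied into one temporary
array, so every output sample becomes a straight inner product over
adjacent elements.

```c
#include <assert.h>
#include <string.h>

enum { TMP_MAX = 512 };  /* hypothetical cap on the stack buffer */

/* Filter in[0..in_len-1] with coef[0..filt_len-1].
 * hist holds the filt_len-1 samples that preceded in[]. */
static void fir_contiguous(const float *hist, const float *in, int in_len,
                           const float *coef, int filt_len, float *out)
{
    float tmp[TMP_MAX];
    int hist_len = filt_len - 1;

    assert(filt_len >= 1 && in_len >= 0);
    assert(hist_len + in_len <= TMP_MAX);

    /* One contiguous array: old samples first, then the new frame. */
    memcpy(tmp, hist, (size_t)hist_len * sizeof(float));
    memcpy(tmp + hist_len, in, (size_t)in_len * sizeof(float));

    for (int i = 0; i < in_len; i++) {
        float sum = 0.0f;
        /* No branches and unit-stride accesses in the inner loop. */
        for (int j = 0; j < filt_len; j++)
            sum += coef[j] * tmp[i + j];
        out[i] = sum;
    }
}
```

Whether a given compiler actually vectorizes the inner loop depends on its
version and flags; reordering a floating-point sum generally requires
something like -ffast-math.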

On the downside, this will allocate, on the stack, in_len + st->filt_len 
elements to hold a temporary array for the input. In my testcase, this 
means 1472 bytes. If you use larger frames, this will scale accordingly.
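
The 1472-byte figure can be checked with a small helper. This is only a
sketch: the function name is made up, and the filter length of 48 is
inferred from the numbers above (320-sample frames, 4-byte floats), not
stated in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a temporary array of in_len + filt_len elements. */
static size_t temp_buffer_bytes(size_t in_len, size_t filt_len,
                                size_t elem_size)
{
    assert(elem_size > 0);
    return (in_len + filt_len) * elem_size;
}
```

With in_len = 320, filt_len = 48 and 4-byte elements this gives
(320 + 48) * 4 = 1472 bytes; doubling the frame size to 640 samples would
give (640 + 48) * 4 = 2752 bytes.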

Unless anyone can spot any glaring mistakes I've made, the plan is to 
fix the double versions, correct the int->float (and vice versa) 
conversions and make sure the magic bytes work. Then it's time for some 
unrolling and _USE_SSE improvements ;)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: resample-both-test.patch
Type: text/x-diff
Size: 9068 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/speex-dev/attachments/20080404/eedc0f5a/attachment.patch 

