[speex-dev] [PATCH] Make SSE Run Time option.
Jean-Marc Valin
Jean-Marc.Valin at USherbrooke.ca
Thu Jan 15 22:35:34 PST 2004
On Thu 15/01/2004 at 15:30, Daniel Vogel wrote:
> Unrelated, but please use SSE/MMX/... intrinsics on Windows instead of using
> inline assembly so you also get the speed benefit on Win64.
OK, so here's a first start. I've translated the asm I sent 1-2 days ago
to intrinsics. The result is about 5% slower than the pure asm approach,
so it's not too bad (SSE asm is 2x faster than x87). Note that, unlike
the previous version, which had a kludge to work with order 8 (required
for wideband), this version only works with order 10, so it will only
work for narrowband.
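
For comparison, the recursion that the intrinsics code vectorizes can be sketched in plain scalar C. This is only an illustration under two assumptions read off the routine that follows: num[0] is taken to be 1 (the input sample is added to mem[0] unscaled), and the name filter_mem2_scalar is hypothetical, not something from the Speex sources.

```c
#include <assert.h>

/* Scalar sketch of the filter recursion (hypothetical helper).
 * Assumes num[0] == 1, as the SSE routine implicitly does.
 * num and den hold ord+1 coefficients, mem holds ord state values. */
void filter_mem2_scalar(const float *x, const float *num, const float *den,
                        float *y, int N, int ord, float *mem)
{
    int i, j;
    assert(N >= 0 && ord >= 1);
    for (i = 0; i < N; i++) {
        float xi = x[i];
        float yi = xi + mem[0];      /* next output; num[0] assumed 1 */
        y[i] = yi;
        /* Shift the memory down by one and add the new contributions */
        for (j = 0; j < ord - 1; j++)
            mem[j] = mem[j + 1] + num[j + 1] * xi - den[j + 1] * yi;
        mem[ord - 1] = num[ord] * xi - den[ord] * yi;
    }
}
```

With ord set to 10, this should track the SSE routine up to floating-point rounding. The ten memory values map onto three xmm registers as 4 + 4 + 2, which is why the packed version is tied to order 10.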
#include <xmmintrin.h>

/* Order-10 filter with memory. ord is not used: order 10 is assumed. */
void filter_mem2(float *x, float *_num, float *_den, float *y, int N,
                 int ord, float *_mem)
{
   __m128 num[3], den[3], mem[3];
   int i;

   /* Copy numerator, denominator and memory to aligned xmm */
   for (i=0;i<2;i++)
   {
      mem[i] = _mm_loadu_ps(_mem+4*i);
      num[i] = _mm_loadu_ps(_num+4*i+1);
      den[i] = _mm_loadu_ps(_den+4*i+1);
   }
   /* Last two taps go in the low lanes, upper lanes are zero */
   mem[2] = _mm_setr_ps(_mem[8], _mem[9], 0, 0);
   num[2] = _mm_setr_ps(_num[9], _num[10], 0, 0);
   den[2] = _mm_setr_ps(_den[9], _den[10], 0, 0);

   for (i=0;i<N;i++)
   {
      __m128 xx;
      __m128 yy;
      /* Compute next filter result */
      xx = _mm_load_ps1(x+i);
      yy = _mm_add_ss(xx, mem[0]);
      _mm_store_ss(y+i, yy);
      yy = _mm_shuffle_ps(yy, yy, 0);

      /* Update memory */
      mem[0] = _mm_move_ss(mem[0], mem[1]);
      mem[0] = _mm_shuffle_ps(mem[0], mem[0], 0x39);
      mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));
      mem[0] = _mm_sub_ps(mem[0], _mm_mul_ps(yy, den[0]));

      mem[1] = _mm_move_ss(mem[1], mem[2]);
      mem[1] = _mm_shuffle_ps(mem[1], mem[1], 0x39);
      mem[1] = _mm_add_ps(mem[1], _mm_mul_ps(xx, num[1]));
      mem[1] = _mm_sub_ps(mem[1], _mm_mul_ps(yy, den[1]));

      mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0xfd);
      mem[2] = _mm_add_ps(mem[2], _mm_mul_ps(xx, num[2]));
      mem[2] = _mm_sub_ps(mem[2], _mm_mul_ps(yy, den[2]));
   }

   /* Put memory back in its place */
   _mm_storeu_ps(_mem, mem[0]);
   _mm_storeu_ps(_mem+4, mem[1]);
   _mm_store_ss(_mem+8, mem[2]);
   mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0x55);
   _mm_store_ss(_mem+9, mem[2]);
}
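
The shuffle immediates in the routine above (0, 0x39, 0xfd, 0x55) are easier to read once decoded as four two-bit lane selectors. A minimal portable sketch of that decoding follows. It is a scalar model of `_mm_shuffle_ps`, written for illustration only, and the name shuffle_ps_model is hypothetical.

```c
#include <assert.h>

/* Scalar model of _mm_shuffle_ps(a, b, imm) (hypothetical helper).
 * Result lanes 0 and 1 are selected from a, lanes 2 and 3 from b,
 * with two selector bits per lane, lowest bits first. */
void shuffle_ps_model(const float a[4], const float b[4], unsigned imm,
                      float out[4])
{
    float r[4];
    int k;
    assert(imm <= 0xffu);
    r[0] = a[imm & 3];
    r[1] = a[(imm >> 2) & 3];
    r[2] = b[(imm >> 4) & 3];
    r[3] = b[(imm >> 6) & 3];
    for (k = 0; k < 4; k++)      /* copy via r so out may alias a or b */
        out[k] = r[k];
}
```

With both inputs set to the same register, 0x39 selects lanes (1, 2, 3, 0), a rotation by one lane. Applied after `_mm_move_ss`, this shifts the memory down and brings the first value of the next register into the top lane. 0xfd selects (1, 3, 3, 3), which turns (mem8, mem9, 0, 0) into (mem9, 0, 0, 0). 0x55 and 0 broadcast lane 1 and lane 0 respectively.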
 Jean-Marc
--
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada