[speex-dev] [PATCH] Make SSE Run Time option.

Jean-Marc Valin Jean-Marc.Valin at USherbrooke.ca
Thu Jan 15 22:35:34 PST 2004


On Thu 15/01/2004 at 15:30, Daniel Vogel wrote:
> Unrelated, but please use SSE/MMX/... intrinsics on Windows instead of using
> inline assembly so you also get the speed benefit on Win64.

OK, so here's a first attempt. I've translated the asm I sent a day or
two ago into intrinsics. The result is about 5% slower than the pure asm
version, which isn't too bad (the SSE asm is 2x faster than x87). Note
that, unlike the previous version, which had a kludge to work with order 8
(required for wideband), this version only works with order 10, so it
will only work for narrowband.
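
Since the routine below is hard-wired to order 10, a caller would need to
check the order before using it. Here is a minimal sketch of such a check.
The enum, the function name and the have_sse flag are all hypothetical
(nothing like this exists in the patch), and how have_sse gets set is left
to whatever run-time detection ends up being used.

```c
/* Hypothetical tags for the two code paths; not part of Speex. */
enum filter_impl {
   FILTER_GENERIC = 0,      /* plain C filter, any order */
   FILTER_SSE_ORDER10 = 1   /* intrinsics filter, order 10 only */
};

/* Pick the SSE path only when SSE is usable and the order is exactly 10.
   have_sse is assumed to come from some run-time CPU check. */
enum filter_impl choose_filter_impl(int ord, int have_sse)
{
   if (have_sse && ord == 10)
      return FILTER_SSE_ORDER10;
   return FILTER_GENERIC;
}
```

With this, an order 8 (wideband) call falls back to the generic filter even
when SSE is available.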

void filter_mem2(float *x, float *_num, float *_den, float *y, int N,
int ord, float *_mem)
{
   __m128 num[3], den[3], mem[3];
   int i;

   /* Copy numerator, denominator and memory to aligned xmm */
   for (i=0;i<2;i++)
   {
      mem[i] = _mm_loadu_ps(_mem+4*i);
      num[i] = _mm_loadu_ps(_num+4*i+1);
      den[i] = _mm_loadu_ps(_den+4*i+1);
   }
   mem[2] = _mm_setr_ps(_mem[8], _mem[9], 0, 0);
   num[2] = _mm_setr_ps(_num[9], _num[10], 0, 0);
   den[2] = _mm_setr_ps(_den[9], _den[10], 0, 0);
   
   for (i=0;i<N;i++)
   {
      __m128 xx;
      __m128 yy;
      /* Compute next filter result: y[i] = x[i] + mem[0]
         (_num[0] is not applied, so it is taken to be 1) */
      xx = _mm_load_ps1(x+i);
      yy = _mm_add_ss(xx, mem[0]);
      _mm_store_ss(y+i, yy);
      yy = _mm_shuffle_ps(yy, yy, 0);
      
      /* Update memory: move_ss pulls in lane 0 of the next register,
         then shuffle 0x39 rotates the lanes down by one */
      mem[0] = _mm_move_ss(mem[0], mem[1]);
      mem[0] = _mm_shuffle_ps(mem[0], mem[0], 0x39);

      mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));
      mem[0] = _mm_sub_ps(mem[0], _mm_mul_ps(yy, den[0]));

      mem[1] = _mm_move_ss(mem[1], mem[2]);
      mem[1] = _mm_shuffle_ps(mem[1], mem[1], 0x39);

      mem[1] = _mm_add_ps(mem[1], _mm_mul_ps(xx, num[1]));
      mem[1] = _mm_sub_ps(mem[1], _mm_mul_ps(yy, den[1]));

      mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0xfd);

      mem[2] = _mm_add_ps(mem[2], _mm_mul_ps(xx, num[2]));
      mem[2] = _mm_sub_ps(mem[2], _mm_mul_ps(yy, den[2]));
   }
   /* Put memory back in its place */
   _mm_storeu_ps(_mem, mem[0]);
   _mm_storeu_ps(_mem+4, mem[1]);
   _mm_store_ss(_mem+8, mem[2]);
   mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0x55);
   _mm_store_ss(_mem+9, mem[2]);
}
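
For checking the intrinsics output, here is a plain C sketch of the same
recursion for any order. It is not the existing Speex function: the name
filter_mem2_scalar is made up, and it assumes _num[0] and _den[0] are both 1,
since the code above never reads either of them.

```c
#include <assert.h>

/* Scalar sketch of the recursion used by the intrinsics version.
   Assumption: num[0] == 1 and den[0] == 1 (neither is read here).
   num and den hold ord+1 coefficients, mem holds ord state values. */
void filter_mem2_scalar(const float *x, const float *num, const float *den,
                        float *y, int N, int ord, float *mem)
{
   int i, j;
   assert(ord >= 1);
   for (i = 0; i < N; i++)
   {
      float xi = x[i];
      float yi = xi + mem[0];   /* same as the _mm_add_ss step */
      y[i] = yi;
      /* Shift the memory down by one and add the new contributions */
      for (j = 0; j < ord - 1; j++)
         mem[j] = mem[j+1] + num[j+1]*xi - den[j+1]*yi;
      mem[ord-1] = num[ord]*xi - den[ord]*yi;
   }
}
```

Running both versions on the same order 10 input, with separate copies of
the memory, should give matching output up to rounding.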

        Jean-Marc


-- 
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada

