[speex-dev] [PATCH] Make SSE Run Time option.

Aron Rosenberg aron at sightspeed.com
Thu Jan 15 00:45:43 PST 2004



So we ran the code on a Windows XP based Athlon XP system and the XMM
registers work just fine there, so it appears that it is Windows 2000 and
below that does not support them.

We agree on not supporting the non-FP version; however, the run-time flags
need to allow a non-FP SSE mode to be selected so that exceptions are
avoided on systems where the XMM registers are unavailable.

I thus propose a set of defines like this instead of the ones in our 
initial patch:

#define CPU_MODE_NONE      0
#define CPU_MODE_MMX       1   // Base Intel MMX x86
#define CPU_MODE_3DNOW     2   // Base AMD 3DNow! extensions
#define CPU_MODE_SSE       4   // Intel integer SSE instructions
#define CPU_MODE_3DNOWEXT  8   // AMD 3DNow! extended instructions
#define CPU_MODE_SSEFP    16   // SSE FP mode, mainly support for XMM registers
#define CPU_MODE_SSE2     32   // Intel SSE2 instructions
#define CPU_MODE_ALTIVEC  64   // PowerPC AltiVec support

Potential additions include flags for some of the ASM modes.
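
For illustration, here is roughly what the detection step could look
like.  This is only a sketch: it uses GCC's __builtin_cpu_supports()
instead of raw CPUID, and cpu_mode_detect and os_saves_xmm_state are
placeholder names, not part of the patch:

int global_use_mmx_sse = CPU_MODE_NONE;

/* Placeholder OS probe: on Windows this could check the OS version, or
   execute one SSE instruction under __try/__except and see whether it
   faults (the Windows 2000 problem described above). */
static int os_saves_xmm_state(void)
{
    return 1;  /* assume a modern OS for this sketch */
}

static void cpu_mode_detect(void)
{
    if (__builtin_cpu_supports("mmx"))
        global_use_mmx_sse |= CPU_MODE_MMX;
    if (__builtin_cpu_supports("sse")) {
        global_use_mmx_sse |= CPU_MODE_SSE;
        /* The XMM registers need OS support as well as CPU support. */
        if (os_saves_xmm_state()) {
            global_use_mmx_sse |= CPU_MODE_SSEFP;
            if (__builtin_cpu_supports("sse2"))
                global_use_mmx_sse |= CPU_MODE_SSE2;
        }
    }
}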

With the results we found, the relationship between the flags looks like
this: 3DNOW implies MMX; 3DNOWEXT implies SSE; SSE2 implies SSEFP; and
SSEFP implies SSE.  Either way, all of the current Speex SSE code should
be checked against the SSEFP flag.
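
To make those implications mechanical, the flag word can be normalized
once after detection so that each code path only ever tests a single
bit.  A minimal sketch, where cpu_mode_normalize is a placeholder name:

static int cpu_mode_normalize(int flags)
{
    /* Apply the implication rules above; SSE2 -> SSEFP must run before
       SSEFP -> SSE so that the chain cascades. */
    if (flags & CPU_MODE_SSE2)     flags |= CPU_MODE_SSEFP;
    if (flags & CPU_MODE_3DNOWEXT) flags |= CPU_MODE_SSE;
    if (flags & CPU_MODE_SSEFP)    flags |= CPU_MODE_SSE;
    if (flags & CPU_MODE_3DNOW)    flags |= CPU_MODE_MMX;
    return flags;
}

Each of the current Speex SSE float routines would then be wrapped in a
single SSEFP check, for example (inner_prod_sse and inner_prod_c are
placeholders for the SSE and plain C versions):

static float inner_prod_dispatch(float *a, float *b, int len)
{
    if (global_use_mmx_sse & CPU_MODE_SSEFP)
        return inner_prod_sse(a, b, len);
    return inner_prod_c(a, b, len);
}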

>Do you already have that implemented? I know it's possible, but the code
>will likely be really ugly.

We already have it implemented for the inner_prod function. After it is
stable and fully tested, we will send you a patch. If you have never done
AltiVec coding, it is quite simple, since it is all C macros/functions --
not nearly as nasty as inline asm code, although the 16-byte alignment
issues can be quite a pain. Our current working code is below:

Aron Rosenberg
SightSpeed Inc.

static float inner_prod(float *a, float *b, int len)
{
#ifdef _USE_ALTIVEC
         /* Take the AltiVec path only when the run-time flag says it is
            available; otherwise fall through to the scalar loop below. */
         if (global_use_mmx_sse & CPU_MODE_ALTIVEC)
         {
         int i;
         float sum;

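         /* A pointer is 16-byte aligned when the low four bits of its
            address are zero.  The loops below consume eight floats (two
            vectors) per iteration, so len is assumed to be a multiple
            of 8. */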
         int a_aligned = (((unsigned long)a) & 15) ? 0 : 1;
         int b_aligned = (((unsigned long)b) & 15) ? 0 : 1;

         __vector float MSQa, LSQa, MSQb, LSQb;
         __vector unsigned char maska, maskb;
         __vector float vec_a, vec_b;
         __vector float vec_result;

         vec_result = (__vector float)vec_splat_u8(0);

         if ((!a_aligned) && (!b_aligned)) {
             // This (unfortunately) is the common case.
             maska = vec_lvsl(0, a);
             maskb = vec_lvsl(0, b);

             MSQa = vec_ld(0, a);
             MSQb = vec_ld(0, b);

             for (i = 0; i < len; i+=8) {

                 a += 4;
                 LSQa = vec_ld(0, a);
                 vec_a = vec_perm(MSQa, LSQa, maska);

                 b += 4;
                 LSQb = vec_ld(0, b);
                 vec_b = vec_perm(MSQb, LSQb, maskb);

                 vec_result = vec_madd(vec_a, vec_b, vec_result);

                 a += 4;
                 MSQa = vec_ld(0, a);
                 vec_a = vec_perm(LSQa, MSQa, maska);

                 b += 4;
                 MSQb = vec_ld(0, b);
                 vec_b = vec_perm(LSQb, MSQb, maskb);

                 vec_result = vec_madd(vec_a, vec_b, vec_result);

             }
         } else if (a_aligned && b_aligned) {
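             /* Both inputs aligned: straight vec_ld loads, no permutes
                needed. */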

             for (i = 0; i < len; i+=8) {
                 vec_a = vec_ld(0, a);
                 vec_b = vec_ld(0, b);
                 vec_result = vec_madd(vec_a, vec_b, vec_result);
                 a += 4; b += 4;
                 vec_a = vec_ld(0, a);
                 vec_b = vec_ld(0, b);
                 vec_result = vec_madd(vec_a, vec_b, vec_result);
                 a += 4; b += 4;
             }

         } else if (a_aligned) {
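             /* Only a is aligned: b still needs the vec_lvsl/vec_perm
                fix-up. */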
             maskb = vec_lvsl(0, b);
             MSQb = vec_ld(0, b);

             for (i = 0; i < len; i+=8) {

                 vec_a = vec_ld(0, a);
                 a += 4;

                 b += 4;
                 LSQb = vec_ld(0, b);
                 vec_b = vec_perm(MSQb, LSQb, maskb);

                 vec_result = vec_madd(vec_a, vec_b, vec_result);

                 vec_a = vec_ld(0, a);
                 a += 4;

                 b += 4;
                 MSQb = vec_ld(0, b);
                 vec_b = vec_perm(LSQb, MSQb, maskb);

                 vec_result = vec_madd(vec_a, vec_b, vec_result);
             }
         } else if (b_aligned) {
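             /* Only b is aligned: a still needs the vec_lvsl/vec_perm
                fix-up. */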
             maska = vec_lvsl(0, a);
             MSQa = vec_ld(0, a);

             for (i = 0; i < len; i+=8) {

                 a += 4;
                 LSQa = vec_ld(0, a);
                 vec_a = vec_perm(MSQa, LSQa, maska);

                 vec_b = vec_ld(0, b);
                 b += 4;

                 vec_result = vec_madd(vec_a, vec_b, vec_result);

                 a += 4;
                 MSQa = vec_ld(0, a);
                 vec_a = vec_perm(LSQa, MSQa, maska);

                 vec_b = vec_ld(0, b);
                 b += 4;

                 vec_result = vec_madd(vec_a, vec_b, vec_result);
             }
         }

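         /* Horizontal reduction: two rotate-and-add steps leave the full
            sum in every lane, so vec_ste can store any one element to
            sum. */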
         vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 8));
         vec_result = vec_add(vec_result, vec_sld(vec_result, vec_result, 4));
         vec_ste(vec_result, 0, &sum);

         return sum;
         }
#endif

         /* Plain C fallback for builds or CPUs without AltiVec. */
         {
             int i;
             float sum = 0;
             for (i = 0; i < len; i++)
                 sum += a[i] * b[i];
             return sum;
         }
}