[speex-dev] [PATCH] Make SSE Run Time option.

Thu Jan 15 01:10:54 PST 2004

On Thu, 15 Jan 2004, Aron Rosenberg wrote:

> So we ran the code on a Windows XP based Athlon XP system and the xmm
> registers work just fine so it appears that Windows 2000 and below do not
> support them.
>
> We agree on not supporting the non-FP version, however the run time flags
> need to be settable with a non FP SSE mode so that exceptions are avoided.
>
> I thus propose a set of defines like this instead of the ones in our
> initial patch:
>
> #define CPU_MODE_NONE     0
> #define CPU_MODE_MMX      1   // Base Intel MMX x86
> #define CPU_MODE_3DNOW    2 // Base AMD 3Dnow extensions
> #define CPU_MODE_SSE      4 // Intel Integer SSE instructions
> #define CPU_MODE_3DNOWEXT 8 // AMD 3Dnow extended instructions
> #define CPU_MODE_SSEFP 16 // SSE FP modes, mainly support for xmm registers
> #define CPU_MODE_SSE2     32 // Intel SSE2 instructions
> #define CPU_MODE_ALTIVEC  64 // PowerPC Altivec support.

You may wish to save space for PNI.

        http://cedar.intel.com/media/pdf/PNI_LEGAL3.pdf
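A minimal sketch of how such flags might be combined and tested at run time.
The flag values are the ones quoted above; CPU_MODE_PNI, cpu_mode_has() and
cpu_mode_mask_os() are hypothetical names made up for illustration and are not
part of the patch. It assumes detection returns a bitmask and that the OS's
ability to save the xmm registers is determined separately.

```c
#include <stdio.h>

/* Flag values as proposed in the quoted message. */
#define CPU_MODE_NONE      0
#define CPU_MODE_MMX       1
#define CPU_MODE_3DNOW     2
#define CPU_MODE_SSE       4
#define CPU_MODE_3DNOWEXT  8
#define CPU_MODE_SSEFP    16
#define CPU_MODE_SSE2     32
#define CPU_MODE_ALTIVEC  64
/* Hypothetical: the next free bit, reserved for PNI. Not in the patch. */
#define CPU_MODE_PNI     128

/* Hypothetical helper: nonzero if every bit in `wanted` is set in `modes`. */
static int cpu_mode_has(unsigned modes, unsigned wanted)
{
    return (modes & wanted) == wanted;
}

/* Hypothetical helper: drop the FP SSE flag when the OS does not save the
   xmm registers, so that only the non-FP SSE mode stays enabled. */
static unsigned cpu_mode_mask_os(unsigned modes, int os_saves_xmm)
{
    if (!os_saves_xmm)
        modes &= ~(unsigned)CPU_MODE_SSEFP;
    return modes;
}
```

Because each mode is a distinct bit, a caller can test for several features
at once, e.g. cpu_mode_has(modes, CPU_MODE_SSE | CPU_MODE_SSEFP).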

Likewise, all that branching is probably going to cause more trouble than
it saves.  Try this:

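        /* Load the aligned 16-byte blocks holding the first and last
           byte of each (possibly misaligned) 4-float input. */
        vector float a0 = vec_ld( 0, a );
        vector float a1 = vec_ld( 15, a );
        vector float b0 = vec_ld( 0, b );
        vector float b1 = vec_ld( 15, b );

        /* Shift the wanted 16 bytes into place; no branch on alignment. */
        a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) );
        b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) );

        /* Multiply elementwise, then add across by rotating 8 and 4 bytes. */
        a0 = vec_madd( a0, b0, (vector float) vec_splat_u32(0) ) ;
        a0 = vec_add( a0, vec_sld( a0, a0, 8 ) );
        a0 = vec_add( a0, vec_sld( a0, a0, 4 ) );

        /* Every element now holds the sum; store one of them. */
        vec_ste( a0, 0, &sum );
        return sum;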

Please note that dot products of simple vector floats are usually faster
in the scalar units. The add across and transfer to scalar is just too
expensive. It's generally only worthwhile if the data starts and ends in
the vector units, and it is inlined so that latencies can be covered with
other work, e.g.:

        inline vector float DotProduct( vector float a, vector float b )
        {
                a = vec_madd( a, b, (vector float) vec_splat_u32(0) ) ;
                a = vec_add( a, vec_sld( a, a, 8 ) );
                a = vec_add( a, vec_sld( a, a, 4 ) );
                return a;
        }
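For anyone without AltiVec hardware at hand, here is a rough portable C model
of the add across step, next to the plain scalar loop it competes with. This
is only a sketch: dot_scalar(), rotate_left() and dot4_splat() are names made
up for illustration, and rotate_left() assumes that vec_sld( v, v, 8 ) and
vec_sld( v, v, 4 ) behave as rotations by two and one float elements.

```c
#include <stddef.h>

/* Scalar baseline: dot product of n floats. */
static float dot_scalar(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Model of vec_sld(v, v, 4 * elems) on a 4-float vector:
   rotate the elements left by `elems` positions. */
static void rotate_left(const float in[4], float out[4], int elems)
{
    for (int i = 0; i < 4; i++)
        out[i] = in[(i + elems) % 4];
}

/* Model of the inlined DotProduct: multiply elementwise, then two
   rotate-and-add steps leave the full sum in all four elements. */
static void dot4_splat(const float a[4], const float b[4], float out[4])
{
    float p[4], r[4];
    int i;

    for (i = 0; i < 4; i++)      /* vec_madd( a, b, 0 ) */
        p[i] = a[i] * b[i];

    rotate_left(p, r, 2);        /* vec_sld( a, a, 8 ) */
    for (i = 0; i < 4; i++)
        p[i] += r[i];

    rotate_left(p, r, 1);        /* vec_sld( a, a, 4 ) */
    for (i = 0; i < 4; i++)
        p[i] += r[i];

    for (i = 0; i < 4; i++)
        out[i] = p[i];
}
```

The model shows why the result can stay in the vector unit: every element
ends up holding the same sum, so later vector code can use it without a
store and reload through memory.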

Ian
---------------------------------------------------
   Ian Ollmann, Ph.D.       iano at cco.caltech.edu
---------------------------------------------------
