[speex-dev] Notes on 1.1.4 Windows. Testing of SSE Intrinsics Code and others

Aron Rosenberg aron at sightspeed.com
Wed Jan 21 20:26:10 PST 2004



Here are our notes on testing 1.1.4 on Windows:

1. Compile error in regular mode (FIXED_POINT undefined) at lsp.c line 104:
        static inline spx_word16_t spx_cos(spx_word16_t x)
    VS6 does not accept the inline keyword here. Removing it allows the 
    file to compile.

    The same applies to cb_search_sse.h line 34.

2. Compile error at quant_lsp.c line 55: M_PI is undefined. It needs to be 
defined either in that file or in a shared header.

3. denoise.c doesn't seem to be in the tar.gz, although the Visual Studio 
project file references it.
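
For items 1 and 2, one possible workaround is a small compatibility 
block like the sketch below. The names SPX_INLINE and deg_to_rad are 
made up for illustration and are not part of Speex. The sketch relies on 
MSVC accepting __inline in C files, and defines M_PI only when math.h 
has not already done so.

```c
#include <math.h>

/* Hypothetical macro (not from Speex): MSVC rejects `inline` in C files
 * but accepts its own `__inline` spelling. */
#if defined(_MSC_VER) && !defined(__cplusplus)
#define SPX_INLINE __inline
#else
#define SPX_INLINE inline
#endif

/* M_PI is not part of standard C, so some math.h headers omit it. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Illustrative helper that uses both the macro and the constant. */
static SPX_INLINE double deg_to_rad(double deg)
{
    return deg * M_PI / 180.0;
}
```

With something like this in a shared header, the declarations in lsp.c 
and cb_search_sse.h could use SPX_INLINE in place of inline, and 
quant_lsp.c would see M_PI regardless of the compiler.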

Now onto the actual SSE tests.

We ran the SSE intrinsics code through some tests on Windows over here, and 
all I can say is: it sucks. A room filled with monkeys could generate 
better SSE code. Having stated that, let me describe why.

We use Visual Studio 6 SP5 with the Processor Pack as the main development 
platform. For some unknown reason, it only ever wants to use XMM0 for its 
SSE operations. If it is dealing with a two-parameter SSE call, it will 
also use XMM1, but that's it. Between successive calls, it won't keep 
values in an XMM register, even if the next call uses them.

To check this, I converted some of the MMX code in our regular application 
to intrinsics, and it does the same thing: it only uses mm0 and mm1. It 
actually runs slower than a plain C version of the same function.

Now, this could be different on Visual Studio .NET and .NET 2003, but that 
is what happens with Visual Studio 6. To show what I mean, I am pasting 
below some of the generated SSE code for the fir_mem2_10 function. I got 
this by compiling speexenc and loading it up in the debugger.

I skipped a bit of the initial function setup; the block starts inside the 
for loop. For those who don't know, Win32 asm operand order is backwards 
from GCC's: it is    OPERATION DEST, SOURCE

254:     for (i=0;i<N;i++)
255:     {
256:        __m128 xx;
257:        __m128 yy;
258:        /* Compute next filter result */
259:        xx = _mm_load_ps1(x+i);
00413483   mov         eax,dword ptr [ebp-64h]
00413486   mov         ecx,dword ptr [ebx+8]
00413489   lea         edx,[ecx+eax*4]
0041348C   movss       xmm0,dword ptr [edx]
00413490   shufps      xmm0,xmm0,0
00413494   movaps      xmmword ptr [xx],xmm0
260:        yy = _mm_add_ss(xx, mem[0]);
00413498   movaps      xmm0,xmmword ptr [ebp-60h]
0041349C   movaps      xmm1,xmmword ptr [xx]
004134A0   addss       xmm1,xmm0
004134A4   movaps      xmmword ptr [yy],xmm1
261:        _mm_store_ss(y+i, yy);
004134AB   movaps      xmm0,xmmword ptr [yy]
004134B2   mov         eax,dword ptr [ebp-64h]
004134B5   mov         ecx,dword ptr [ebx+10h]
004134B8   lea         edx,[ecx+eax*4]
004134BB   movss       dword ptr [edx],xmm0
262:        yy = _mm_shuffle_ps(yy, yy, 0);
004134BF   movaps      xmm0,xmmword ptr [yy]
004134C6   movaps      xmm1,xmmword ptr [yy]
004134CD   shufps      xmm1,xmm0,0
004134D1   movaps      xmmword ptr [yy],xmm1
263:
264:        /* Update memory */
265:        mem[0] = _mm_move_ss(mem[0], mem[1]);
004134D8   movaps      xmm0,xmmword ptr [ebp-50h]
004134DC   movaps      xmm1,xmmword ptr [ebp-60h]
004134E0   movss       xmm1,xmm0
004134E4   movaps      xmmword ptr [ebp-60h],xmm1
266:        mem[0] = _mm_shuffle_ps(mem[0], mem[0], 0x39);
004134E8   movaps      xmm0,xmmword ptr [ebp-60h]
004134EC   movaps      xmm1,xmmword ptr [ebp-60h]
004134F0   shufps      xmm1,xmm0,39h
004134F4   movaps      xmmword ptr [ebp-60h],xmm1
267:
268:        mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));
004134F8   movaps      xmm0,xmmword ptr [ebp-30h]
004134FC   movaps      xmm1,xmmword ptr [xx]
00413500   mulps       xmm1,xmm0
00413503   movaps      xmm0,xmmword ptr [ebp-60h]
00413507   addps       xmm0,xmm1
0041350A   movaps      xmmword ptr [ebp-60h],xmm0
269:
270:        mem[1] = _mm_move_ss(mem[1], mem[2]);
0041350E   movaps      xmm0,xmmword ptr [ebp-40h]
00413512   movaps      xmm1,xmmword ptr [ebp-50h]
00413516   movss       xmm1,xmm0
0041351A   movaps      xmmword ptr [ebp-50h],xmm1
271:        mem[1] = _mm_shuffle_ps(mem[1], mem[1], 0x39);
0041351E   movaps      xmm0,xmmword ptr [ebp-50h]
00413522   movaps      xmm1,xmmword ptr [ebp-50h]
00413526   shufps      xmm1,xmm0,39h
0041352A   movaps      xmmword ptr [ebp-50h],xmm1
272:
273:        mem[1] = _mm_add_ps(mem[1], _mm_mul_ps(xx, num[1]));
0041352E   movaps      xmm0,xmmword ptr [ebp-20h]
00413532   movaps      xmm1,xmmword ptr [xx]
00413536   mulps       xmm1,xmm0
00413539   movaps      xmm0,xmmword ptr [ebp-50h]
0041353D   addps       xmm0,xmm1
00413540   movaps      xmmword ptr [ebp-50h],xmm0
274:
275:        mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0xfd);
00413544   movaps      xmm0,xmmword ptr [ebp-40h]
00413548   movaps      xmm1,xmmword ptr [ebp-40h]
0041354C   shufps      xmm1,xmm0,0FDh
00413550   movaps      xmmword ptr [ebp-40h],xmm1
276:
277:        mem[2] = _mm_add_ps(mem[2], _mm_mul_ps(xx, num[2]));
00413554   movaps      xmm0,xmmword ptr [ebp-10h]
00413558   movaps      xmm1,xmmword ptr [xx]
0041355C   mulps       xmm1,xmm0
0041355F   movaps      xmm0,xmmword ptr [ebp-40h]
00413563   addps       xmm0,xmm1
00413566   movaps      xmmword ptr [ebp-40h],xmm0
278:     }
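
For reference, the recurrence that the intrinsics in the listing appear 
to implement can be written in plain C roughly as follows. This is a 
sketch under assumptions, not the Speex source: the name fir_mem2_scalar 
is made up, the filter order is taken to be 10 from the function name, 
and num[0] is assumed to be an implicit 1 with the remaining ten 
coefficients being what the SSE version packs into num[0..2].

```c
#define FIR_ORD 10  /* assumed filter order, from the "_10" suffix */

/* Hypothetical scalar counterpart of the loop in the listing.
 * x: input, y: output, N: number of samples,
 * num: FIR_ORD + 1 coefficients (num[0] assumed to be 1 and unused),
 * mem: FIR_ORD floats of filter state carried between calls. */
static void fir_mem2_scalar(const float *x, const float *num, float *y,
                            int N, float *mem)
{
    int i, j;
    for (i = 0; i < N; i++) {
        float xi = x[i];
        /* Output is the input plus the oldest state element. */
        y[i] = xi + mem[0];
        /* Shift the state down by one and add the new contribution. */
        for (j = 0; j < FIR_ORD - 1; j++)
            mem[j] = mem[j + 1] + num[j + 1] * xi;
        mem[FIR_ORD - 1] = num[FIR_ORD] * xi;
    }
}
```

Per sample this is one add for the output plus ten multiply-adds, which 
a compiler can keep in registers. The generated code above instead 
reloads mem[0], mem[1], mem[2] and xx from the stack for every intrinsic 
and stores each result straight back.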
