[speex-dev] Notes on 1.1.4 Windows. Testing of SSE Intrinsics Code and others
Aron Rosenberg
aron at sightspeed.com
Wed Jan 21 20:26:10 PST 2004
Here are our notes from testing 1.1.4 on Windows.
1. Compile error in regular mode (FIXED_POINT undefined) at lsp.c line 104:
static inline spx_word16_t spx_cos(spx_word16_t x). VS6 does not accept
the inline keyword here; removing it lets the file compile.
The same applies to cb_search_sse.h line 34.
2. Compile error at quant_lsp.c line 55: M_PI is undefined. It needs to be
defined either in that file or in a shared header.
3. denoise.c does not seem to be in the tar.gz, although the Visual Studio
project file references it.
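For items 1 and 2, something along these lines in a common header might be enough. This is only a sketch, not a tested patch against the Speex tree: deg_to_rad is a made-up function used to exercise both macros, and the guard assumes the MSVC-specific __inline spelling is acceptable for C builds.

```c
#include <math.h>

/* VS6 compiles .c files as C89, which has no 'inline' keyword.
   MSVC offers __inline instead, so map one onto the other for C builds.
   Keep this after the system headers. */
#if defined(_MSC_VER) && !defined(__cplusplus)
#define inline __inline
#endif

/* M_PI is not part of ISO C, so <math.h> is not required to define it. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Hypothetical helper (not from Speex) that uses both 'inline' and M_PI. */
static inline float deg_to_rad(float deg)
{
    return (float)(deg * M_PI / 180.0);
}
```

With this in place the existing static inline declarations would not need to be edited by hand.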
Now on to the actual SSE tests.
We ran the SSE intrinsics code through some tests on Windows over here, and
all I can say is: it sucks. A room filled with monkeys could generate
better SSE code. Having stated that, let me describe why.
We use Visual Studio 6, SP5 with the Processor Pack as the main development
platform. For some unknown reason, it only ever wants to use XMM0 for its
SSE operations. For a two-parameter SSE call it will also use XMM1, but
that's it. Between successive calls it won't keep a value in an XMM
register, even if the next call uses it.
To check this, I converted some of the MMX code in our regular application
to intrinsics and it does the same thing: it only uses mm0 and mm1. It
actually runs slower than a C version of the same function.
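If anyone wants to check their own compiler, a tiny test case like the one below should show the behaviour. It is a sketch only: mul_add4 is a made-up name, not Speex code, and the scalar branch is just there so the file builds on targets without SSE. Compile it with an assembly listing (for example cl /FAs on MSVC or gcc -O2 -S) and see whether the product stays in a register between the multiply and the add.

```c
/* Use SSE intrinsics where the compiler advertises them; otherwise fall
   back to plain C so the example still builds. */
#if defined(__SSE__) || \
    (defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64)))
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

/* Hypothetical probe function: out[k] = a[k] * k_scale + b[k], k = 0..3. */
static void mul_add4(const float *a, const float *b, float k_scale, float *out)
{
#ifdef SKETCH_HAVE_SSE
    __m128 va   = _mm_loadu_ps(a);        /* unaligned loads, no alignment assumed */
    __m128 vb   = _mm_loadu_ps(b);
    __m128 vk   = _mm_set1_ps(k_scale);   /* broadcast the scalar to all 4 lanes */
    __m128 prod = _mm_mul_ps(va, vk);     /* ideally this stays in a register... */
    __m128 sum  = _mm_add_ps(prod, vb);   /* ...and is consumed here directly */
    _mm_storeu_ps(out, sum);
#else
    int k;
    for (k = 0; k < 4; k++)
        out[k] = a[k] * k_scale + b[k];
#endif
}
```

A compiler with a working register allocator should not need to store prod to the stack and reload it before the add.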
Now, this could be different on Visual Studio .NET and .NET 2003, but that
is what happens with Visual Studio 6. To illustrate, I am pasting below
some of the generated SSE code for the fir_mem2_10 function. I got this by
compiling speexenc and loading it in the debugger.
I skipped the initial function setup, so the block starts inside the for
loop. For those who don't know, Win32 asm is backwards from GCC: it
is OPERATION DEST, SOURCE.
254: for (i=0;i<N;i++)
255: {
256: __m128 xx;
257: __m128 yy;
258: /* Compute next filter result */
259: xx = _mm_load_ps1(x+i);
00413483 mov eax,dword ptr [ebp-64h]
00413486 mov ecx,dword ptr [ebx+8]
00413489 lea edx,[ecx+eax*4]
0041348C movss xmm0,dword ptr [edx]
00413490 shufps xmm0,xmm0,0
00413494 movaps xmmword ptr [xx],xmm0
260: yy = _mm_add_ss(xx, mem[0]);
00413498 movaps xmm0,xmmword ptr [ebp-60h]
0041349C movaps xmm1,xmmword ptr [xx]
004134A0 addss xmm1,xmm0
004134A4 movaps xmmword ptr [yy],xmm1
261: _mm_store_ss(y+i, yy);
004134AB movaps xmm0,xmmword ptr [yy]
004134B2 mov eax,dword ptr [ebp-64h]
004134B5 mov ecx,dword ptr [ebx+10h]
004134B8 lea edx,[ecx+eax*4]
004134BB movss dword ptr [edx],xmm0
262: yy = _mm_shuffle_ps(yy, yy, 0);
004134BF movaps xmm0,xmmword ptr [yy]
004134C6 movaps xmm1,xmmword ptr [yy]
004134CD shufps xmm1,xmm0,0
004134D1 movaps xmmword ptr [yy],xmm1
263:
264: /* Update memory */
265: mem[0] = _mm_move_ss(mem[0], mem[1]);
004134D8 movaps xmm0,xmmword ptr [ebp-50h]
004134DC movaps xmm1,xmmword ptr [ebp-60h]
004134E0 movss xmm1,xmm0
004134E4 movaps xmmword ptr [ebp-60h],xmm1
266: mem[0] = _mm_shuffle_ps(mem[0], mem[0], 0x39);
004134E8 movaps xmm0,xmmword ptr [ebp-60h]
004134EC movaps xmm1,xmmword ptr [ebp-60h]
004134F0 shufps xmm1,xmm0,39h
004134F4 movaps xmmword ptr [ebp-60h],xmm1
267:
268: mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));
004134F8 movaps xmm0,xmmword ptr [ebp-30h]
004134FC movaps xmm1,xmmword ptr [xx]
00413500 mulps xmm1,xmm0
00413503 movaps xmm0,xmmword ptr [ebp-60h]
00413507 addps xmm0,xmm1
0041350A movaps xmmword ptr [ebp-60h],xmm0
269:
270: mem[1] = _mm_move_ss(mem[1], mem[2]);
0041350E movaps xmm0,xmmword ptr [ebp-40h]
00413512 movaps xmm1,xmmword ptr [ebp-50h]
00413516 movss xmm1,xmm0
0041351A movaps xmmword ptr [ebp-50h],xmm1
271: mem[1] = _mm_shuffle_ps(mem[1], mem[1], 0x39);
0041351E movaps xmm0,xmmword ptr [ebp-50h]
00413522 movaps xmm1,xmmword ptr [ebp-50h]
00413526 shufps xmm1,xmm0,39h
0041352A movaps xmmword ptr [ebp-50h],xmm1
272:
273: mem[1] = _mm_add_ps(mem[1], _mm_mul_ps(xx, num[1]));
0041352E movaps xmm0,xmmword ptr [ebp-20h]
00413532 movaps xmm1,xmmword ptr [xx]
00413536 mulps xmm1,xmm0
00413539 movaps xmm0,xmmword ptr [ebp-50h]
0041353D addps xmm0,xmm1
00413540 movaps xmmword ptr [ebp-50h],xmm0
274:
275: mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0xfd);
00413544 movaps xmm0,xmmword ptr [ebp-40h]
00413548 movaps xmm1,xmmword ptr [ebp-40h]
0041354C shufps xmm1,xmm0,0FDh
00413550 movaps xmmword ptr [ebp-40h],xmm1
276:
277: mem[2] = _mm_add_ps(mem[2], _mm_mul_ps(xx, num[2]));
00413554 movaps xmm0,xmmword ptr [ebp-10h]
00413558 movaps xmm1,xmmword ptr [xx]
0041355C mulps xmm1,xmm0
0041355F movaps xmm0,xmmword ptr [ebp-40h]
00413563 addps xmm0,xmm1
00413566 movaps xmmword ptr [ebp-40h],xmm0
278: }
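For reference, here is roughly what that loop computes, written as plain C. This is my reading of the intrinsics in the listing, not the actual scalar code from the Speex tree: the function name is made up, the order of 10 is assumed from the name fir_mem2_10, and it assumes the two unused lanes of mem[2] and num[2] hold zeros.

```c
#define FIR_ORDER 10  /* assumed from the function name fir_mem2_10 */

/* Hypothetical scalar equivalent of the listed loop.
   mem[] holds FIR_ORDER floats of filter state; num[] holds the FIR_ORDER
   coefficients in the same order they were packed into the SSE registers. */
static void fir_mem_scalar_sketch(const float *x, const float *num,
                                  float *y, int N, float *mem)
{
    int i, j;
    for (i = 0; i < N; i++) {
        float xi = x[i];

        /* Compute next filter result: _mm_add_ss(xx, mem[0]) */
        y[i] = xi + mem[0];

        /* Update memory: shift the state down one slot and accumulate
           xi * num[j], which is what the move_ss/shuffle/mul/add
           sequences do four lanes at a time. */
        for (j = 0; j < FIR_ORDER - 1; j++)
            mem[j] = mem[j + 1] + xi * num[j];
        mem[FIR_ORDER - 1] = xi * num[FIR_ORDER - 1];
    }
}
```

Each iteration is one add plus ten multiply-adds, which is the baseline the load/store-heavy SSE code above has to beat.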