[speex-dev] [PATCH] Make SSE Run Time option. Add Win32 SSE code

Aron Rosenberg aron at sightspeed.com
Tue Jan 13 23:13:09 PST 2004



Jean-Marc,

 >I'm still not sure I get it. On an Athlon XP, I can do something like
 >"mulps xmm0, xmm1", which means that the xmm registers are indeed
 >supported. Besides, without the xmm registers, you can't use much of
 >SSE.

On the Athlon XP 2400+ that we have in our QA lab (Windows 2000), running
that code generates an Illegal Instruction error. An AMD Duron (Windows ME)
does the same thing. There are two possible reasons: either those
processors do not support the XMM registers, or the operating system does
not support them. In the morning we will check the code on Windows XP. This
may be a Windows-specific thing; either way, you still need to support the
non-FP versions of the SSE set.

If you read through AMD's processor detection guide (PDF)
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/20734.pdf

and go to the section that shows the sample code for dealing with CPUID
support (it starts at about page 37), it talks about the FEATURE_SSEFP
support, which you have to query for. On the Athlon XP 2400+ that we have
here, that code does not detect it when run under Windows. The same code on
a Pentium 4 detects it just fine.
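
As a rough illustration of the kind of CPUID query discussed here, below is
a minimal sketch. It is not the code from AMD's guide. It decodes the
standard feature bits that CPUID function 1 returns in EDX (bit 23 MMX,
bit 24 FXSR, bit 25 SSE). All sketch_ names are hypothetical, and
__get_cpuid from <cpuid.h> assumes GCC or Clang on x86; an MSVC build would
need another way to issue CPUID.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>          /* GCC/Clang helper for the CPUID instruction */
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

/* Hypothetical result type for this sketch; not a Speex or AMD structure. */
struct sketch_cpu_features {
    int mmx;   /* CPUID function 1, EDX bit 23 */
    int fxsr;  /* CPUID function 1, EDX bit 24 (FXSAVE/FXRSTOR) */
    int sse;   /* CPUID function 1, EDX bit 25 */
};

/* Decode the EDX value returned by CPUID function 1. */
static void sketch_decode_edx(unsigned int edx, struct sketch_cpu_features *out)
{
    assert(out != NULL);
    out->mmx  = (int)((edx >> 23) & 1u);
    out->fxsr = (int)((edx >> 24) & 1u);
    out->sse  = (int)((edx >> 25) & 1u);
}

/* Query the running CPU. Returns 1 if CPUID function 1 was available,
   0 otherwise (all flags are then reported as absent). */
static int sketch_query_cpu(struct sketch_cpu_features *out)
{
    sketch_decode_edx(0u, out);
#if SKETCH_HAVE_CPUID
    {
        unsigned int eax, ebx, ecx, edx;
        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            sketch_decode_edx(edx, out);
            return 1;
        }
    }
#endif
    return 0;
}
```

The CPU flag on its own says nothing about whether the operating system has
enabled the XMM registers, so a caller would still need a separate OS-level
check before executing SSE floating-point instructions.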

Here is an article which describes the K8 (Opteron and Athlon 64) as
including the XMM registers:
http://sysopt.earthweb.com/articles/k8/index2.html . All the stuff I could
google seems to indicate that XMM register support is not included in the
current Athlon XP series or below.

With any machine you are not guaranteed to get support for the XMM 
registers (the 128 bit wide ones), since the OS has to support it as well.
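
To make the run-time idea concrete, here is a minimal sketch of choosing an
implementation once at start-up, based on what the OS reports. The sketch_
names are hypothetical and nothing here is taken from the Speex patch. On
Win32 it assumes IsProcessorFeaturePresent(PF_XMMI_INSTRUCTIONS_AVAILABLE)
is an acceptable test; on other platforms the sketch simply reports
"unknown" and falls back to the scalar routine.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 only when the OS reports SSE as usable; 0 means "no or unknown". */
static int sketch_os_reports_sse(void)
{
#ifdef _WIN32
    return IsProcessorFeaturePresent(PF_XMMI_INSTRUCTIONS_AVAILABLE) ? 1 : 0;
#else
    return 0;   /* assumption: other platforms would need their own check */
#endif
}

typedef float (*sketch_dot_fn)(const float *, const float *, size_t);

/* Portable reference implementation. */
static float sketch_dot_scalar(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i;
    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Placeholder only: a real build would put the SSE routine here. */
static float sketch_dot_sse(const float *a, const float *b, size_t n)
{
    return sketch_dot_scalar(a, b, n);
}

/* Pick the implementation once; callers then go through the pointer. */
static sketch_dot_fn sketch_select_dot(int have_sse)
{
    return have_sse ? sketch_dot_sse : sketch_dot_scalar;
}
```

A caller would do something like
sketch_dot_fn dot = sketch_select_dot(sketch_os_reports_sse());
once during initialisation, so the SSE path is never entered on a machine
where the check fails.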

Have you or anybody else successfully run the current SSE code on an Athlon
XP system?

>Actually, SSE also requires 16-byte alignment for most instructions
>(except movups, which is slow anyway). That's why I have those kludges
>with the pointer masks in the current code. I think we should find a
>general solution for the problem. Also, there's one place (inner_prod,
>called by the open-loop pitch estimator) where non 16-byte-aligned loads
>are really required. It's probably possible to work around that, but it
>might require 4 copies of the data (with 4-byte offsets).

Agreed, although inner_prod isn't that big a deal, since you can do clever
vector swaps in Altivec to reduce the amount of shuffling needed. In our
current Altivec version we have four blocks, handling the cases where
certain things are aligned and certain things aren't. It's ugly to read,
but it works quite nicely.
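
For readers following the alignment discussion, here is a minimal sketch of
an inner product that uses aligned loads when both inputs sit on 16-byte
boundaries and unaligned loads (movups) otherwise. It is neither the Speex
inner_prod nor our Altivec code: sketch_inner_prod is a hypothetical name,
the length is assumed to be a multiple of 4, and the SSE path relies on the
__SSE__ macro that GCC and Clang define when SSE is enabled, with a scalar
fallback elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>      /* SSE intrinsics */
#endif

/* Hypothetical helper: inner product of two float vectors.
   len must be a multiple of 4. */
static float sketch_inner_prod(const float *a, const float *b, size_t len)
{
    size_t i;
    assert(len % 4 == 0);
#if defined(__SSE__)
    {
        __m128 acc = _mm_setzero_ps();
        float lanes[4];
        /* Both pointers must be 16-byte aligned to use movaps loads. */
        int aligned = ((((uintptr_t)a) | ((uintptr_t)b)) & 15u) == 0;

        if (aligned) {
            for (i = 0; i < len; i += 4)
                acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(a + i),
                                                 _mm_load_ps(b + i)));
        } else {
            /* Slower path: movups tolerates any alignment. */
            for (i = 0; i < len; i += 4)
                acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i),
                                                 _mm_loadu_ps(b + i)));
        }
        /* Horizontal sum of the four partial sums. */
        _mm_storeu_ps(lanes, acc);
        return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    }
#else
    {
        float sum = 0.0f;
        for (i = 0; i < len; i++)
            sum += a[i] * b[i];
        return sum;
    }
#endif
}
```

The sketch only distinguishes two cases. Splitting further (one input
aligned, the other not) would give the four-block structure mentioned
above, at the cost of readability.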

>I think the ALIGN macros I currently have should do the job. If it's
>possible to use them, the advantage is that they are
>platform-independent.

For the alignment part, my feeling is that the compiler-generated way is
better than a run-time cast. The compiler-native code will not be
cross-platform, but it should generate much faster code, since you don't
have to perform the cast at run time, which is what your ALIGN macros
appear to be doing in stack-alloc.h.
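
To show the two approaches side by side, here is a minimal sketch. It is
not the ALIGN macro from stack-alloc.h; the sketch_ names are hypothetical.
The run-time version rounds a pointer up to the next 16-byte boundary. The
compile-time version uses the C11 _Alignas specifier, which is an
assumption about the toolchain; the compiler-specific spellings are
__declspec(align(16)) for MSVC and __attribute__((aligned(16))) for GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when p sits on a 16-byte boundary. */
static int sketch_is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Run-time approach: round a pointer up to the next 16-byte boundary.
   The caller must over-allocate the buffer by at least 15 bytes. */
static void *sketch_align16(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + 15u) & ~(uintptr_t)15u);
}

/* Compile-time approach: the compiler places the buffer on a 16-byte
   boundary, so no pointer arithmetic is needed at run time. */
static float *sketch_static_buffer(void)
{
    static _Alignas(16) float buf[12];
    assert(sketch_is_aligned16(buf));
    return buf;
}
```

The run-time version costs an add and a mask per buffer plus up to 15 bytes
of padding; the declared-alignment version costs nothing at run time but
depends on compiler support.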

One other thing we noticed is that you tend to do a lot of for-loop-based
copies. From your new filters_sse.h, around the asm code:

    for (i=0;i<12;i++)
       num[i]=den[i]=0;
    for (i=0;i<12;i++)
       mem[i]=0;

    for (i=0;i<ord;i++)
    {
       num[i]=_num[i+1];
       den[i]=_den[i+1];
    }
    for (i=0;i<ord;i++)
       mem[i]=_mem[i];

<<<asm code>>>

    for (i=0;i<ord;i++)
       _mem[i]=mem[i];

could easily be reduced to

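/* memset and memcpy count bytes, so scale by the element size */
memset(num,0,12*sizeof(*num));
memset(den,0,12*sizeof(*den));
memset(mem,0,12*sizeof(*mem));
memcpy(num,_num+1,ord*sizeof(*num));
memcpy(den,_den+1,ord*sizeof(*den));
memcpy(mem,_mem,ord*sizeof(*mem));   /* the loop reads _mem[i], not _mem[i+1] */

<<<asm code>>>

memcpy(_mem,mem,ord*sizeof(*mem));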

Do you not like to use memcpy or memset? Or am I missing something like
overlapping memory spaces?
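
For anyone who wants to check the equivalence, here is a minimal sketch
comparing a loop-based load with a memset/memcpy one for a single float
array. The sketch_ function names and SKETCH_MAX_ORD are hypothetical, and
the element type is assumed to be float. The thing to watch is that memset
and memcpy take byte counts, so the element count has to be scaled by
sizeof.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_ORD 12   /* matches the 12-element buffers in the loops */

/* Loop version, in the shape of the loops quoted from filters_sse.h. */
static void sketch_load_loop(float *num, const float *_num, int ord)
{
    int i;
    assert(ord >= 0 && ord <= SKETCH_MAX_ORD);
    for (i = 0; i < SKETCH_MAX_ORD; i++)
        num[i] = 0;
    for (i = 0; i < ord; i++)
        num[i] = _num[i + 1];
}

/* memset/memcpy version. Sizes are in bytes, hence the sizeof factor.
   All-zero bytes represent 0.0f on IEEE 754 platforms. */
static void sketch_load_mem(float *num, const float *_num, int ord)
{
    assert(ord >= 0 && ord <= SKETCH_MAX_ORD);
    memset(num, 0, SKETCH_MAX_ORD * sizeof *num);
    memcpy(num, _num + 1, (size_t)ord * sizeof *num);
}
```

On the overlap question: memcpy requires that source and destination do
not overlap, so this only applies when the local buffers are distinct from
the caller's arrays. If they could overlap, memmove would be the safe
substitute.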

Aron Rosenberg
SightSpeed 
