[speex-dev] [PATCH] Make SSE Run Time option. Add Win32 SSE code
Aron Rosenberg
aron at sightspeed.com
Tue Jan 13 23:13:09 PST 2004
Jean-Marc,
>I'm still not sure I get it. On an Athlon XP, I can do something like
>"mulps xmm0, xmm1", which means that the xmm registers are indeed
>supported. Besides, without the xmm registers, you can't use much of
>SSE.
On the Athlon XP 2400+ that we have in our QA lab (Windows 2000), running
that code generates an Illegal Instruction error. An AMD Duron (Windows ME)
does the same thing. There are two possible reasons: either those processors
do not support the XMM registers, or the operating system does not support
them. In the morning we will check the code on Windows XP. This may be a
Windows-specific thing; either way, you still need to support the non-FP
versions of the SSE set.
If you read through AMD's processor detection guide (PDF)
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/20734.pdf
and go to the section that shows the sample code for dealing with CPUID
support (it starts at about page 37), it talks about the FEATURE_SSEFP
support, which you have to query for. On the Athlon XP 2400+ that we have
here, that code does not detect that feature when run under Windows. The
same code on a Pentium 4 detects it just fine.
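To make the detection step concrete, here is a minimal sketch in C. It is not
the sample code from the AMD guide; the helper names edx_reports_sse and
cpu_reports_sse are made up for illustration, and it assumes a GCC or Clang
toolchain that ships <cpuid.h>. It only covers what the CPU advertises in
CPUID leaf 1 (EDX bit 25 for SSE, bit 24 for FXSAVE/FXRSTOR). Whether the OS
has enabled the XMM state is a separate question that user-mode code cannot
read directly, so a real run-time check would also have to probe an SSE
instruction under an exception or signal handler.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>              /* GCC/Clang helper header (assumed available) */
#define SKETCH_HAVE_CPUID 1
#endif

/* CPUID leaf 1, EDX feature bits. */
#define CPUID_EDX_FXSR (1u << 24)   /* FXSAVE/FXRSTOR supported */
#define CPUID_EDX_SSE  (1u << 25)   /* SSE supported by the CPU */

/* Hypothetical helper: decode the EDX value returned by CPUID leaf 1.
 * Returns 1 only if both SSE and FXSR are reported. */
static int edx_reports_sse(uint32_t edx)
{
    return (edx & CPUID_EDX_SSE) != 0 && (edx & CPUID_EDX_FXSR) != 0;
}

/* Hypothetical helper: 1 if the CPU reports SSE, 0 if it does not,
 * -1 if this build cannot query CPUID at all.
 * This says nothing about OS support for the XMM registers. */
static int cpu_reports_sse(void)
{
#ifdef SKETCH_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 1 not available */
    return edx_reports_sse(edx);
#else
    return -1;
#endif
}
```

A caller would pick the SSE code path only when cpu_reports_sse() returns 1
and the OS probe also succeeds, and fall back to the plain C path otherwise.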
Here is an article which describes the K8 (Opteron and Athlon 64) as
including the XMM registers:
http://sysopt.earthweb.com/articles/k8/index2.html . Everything I could
find on Google seems to indicate that XMM register support is not included
in the current Athlon XP series or below.
On any machine you are not guaranteed to get support for the XMM
registers (the 128-bit-wide ones), since the OS has to support them as well.
Have you or anybody else successfully run the current SSE code on an Athlon
XP system?
>Actually, SSE also requires 16-byte alignment for most instructions
>(except movups, which is slow anyway). That's why I have those kludges
>with the pointer masks in the current code. I think we should find a
>general solution for the problem. Also, there's one place (inner_prod,
>called by the open-loop pitch estimator) where non 16-byte-aligned loads
>are really required. It's probably possible to work around that, but it
>might require 4 copies of the data (with 4-byte offsets).
Agreed, although the inner_prod isn't that big a deal since you can do
clever vector swaps in Altivec to reduce the amount of shuffling needed. In
our current Altivec version we have four blocks, dealing with when certain
things are aligned and certain things aren't. It's ugly to read, but works
quite nicely.
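For the SSE side of this, a rough sketch of an inner product that tolerates
unaligned input might look like the following. This is not the Speex
inner_prod; the name inner_prod_sketch is hypothetical, len is assumed to be
a multiple of 4, and the intrinsics come from <xmmintrin.h>. It uses
_mm_loadu_ps, the intrinsic form of the unaligned movups load mentioned in
the quoted text, and falls back to a scalar loop when the build has no SSE.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>          /* SSE intrinsics */
#define SKETCH_HAVE_SSE 1
#endif

/* Hypothetical inner product of two float vectors.
 * Assumption: len is a multiple of 4. Neither pointer needs to be
 * 16-byte aligned because only unaligned loads are used. */
static float inner_prod_sketch(const float *a, const float *b, int len)
{
#ifdef SKETCH_HAVE_SSE
    __m128 acc = _mm_setzero_ps();
    float lanes[4];
    int i;
    for (i = 0; i < len; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);    /* unaligned load (movups) */
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    /* Horizontal sum of the four partial sums. */
    _mm_storeu_ps(lanes, acc);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#else
    /* Scalar fallback for builds without SSE. */
    float sum = 0.0f;
    int i;
    for (i = 0; i < len; i++)
        sum += a[i] * b[i];
    return sum;
#endif
}
```

An aligned variant would swap in _mm_load_ps where the pointers are known to
be 16-byte aligned, which is the kind of case split described above for the
Altivec version.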
>I think the ALIGN macros I currently have should do the job. If it's
>possible to use them, the advantage is that they are
>platform-independent.
For the alignment part, my feeling is that the compiler-generated way is
better than a run-time cast. The compiler-native approach, while not
cross-platform, should generate much faster code, since you don't have to
perform the cast at run time, which is what your ALIGN macros appear to be
doing in stack-alloc.h.
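To illustrate the two approaches side by side, here is a small sketch. It is
not the code from stack-alloc.h; align16_runtime, is_aligned16 and
aligned_buf are names invented for this example. The compile-time variant
uses C11 _Alignas for portability of the sketch; the compiler-native
spellings would be __declspec(align(16)) on MSVC and
__attribute__((aligned(16))) on GCC.

```c
#include <stdint.h>

/* Run-time approach: round a pointer up to the next 16-byte boundary.
 * The caller must have over-allocated by at least 15 bytes. */
static void *align16_runtime(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + 15u) & ~(uintptr_t)15u);
}

/* Compile-time approach: the compiler places the buffer on a 16-byte
 * boundary, so no pointer arithmetic is needed at run time. */
static _Alignas(16) float aligned_buf[12];

/* Helper for checking either approach. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

The run-time version costs an add and a mask per buffer plus the wasted
padding; the compile-time version costs nothing at run time but depends on
compiler support.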
One other thing we noticed is that you tend to do a lot of for-loop-based
copies. From your new filters_sse.h, around the asm code:
   for (i=0;i<12;i++)
      num[i]=den[i]=0;
   for (i=0;i<12;i++)
      mem[i]=0;
   for (i=0;i<ord;i++)
   {
      num[i]=_num[i+1];
      den[i]=_den[i+1];
   }
   for (i=0;i<ord;i++)
      mem[i]=_mem[i];
   <<< asm code >>>
   for (i=0;i<ord;i++)
      _mem[i]=mem[i];
could easily be reduced to
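   /* sizes are in bytes, so scale by the element size */
   memset(num,0,12*sizeof(*num));
   memset(den,0,12*sizeof(*den));
   memset(mem,0,12*sizeof(*mem));
   memcpy(num,_num+1,ord*sizeof(*num));
   memcpy(den,_den+1,ord*sizeof(*den));
   memcpy(mem,_mem,ord*sizeof(*mem));
   <<< asm code >>>
   memcpy(_mem,mem,ord*sizeof(*mem));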
Do you not like to use memcpy or memset? Or am I missing something like
overlapping memory spaces?
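As a self-contained check of that equivalence, here is a sketch that puts
both forms into functions so they can be compared. The function names and
the in_* parameter names are made up for this example, and the float element
type is an assumption. Note that memset and memcpy take sizes in bytes, and
that zeroing floats with memset relies on all-bits-zero being 0.0f, which
holds for IEEE 754.

```c
#include <string.h>

enum { PAD = 12 };  /* padded length used in the quoted loops */

/* Loop-based setup, following the quoted code. */
static void load_with_loops(float *num, float *den, float *mem,
                            const float *in_num, const float *in_den,
                            const float *in_mem, int ord)
{
    int i;
    for (i = 0; i < PAD; i++)
        num[i] = den[i] = 0;
    for (i = 0; i < PAD; i++)
        mem[i] = 0;
    for (i = 0; i < ord; i++) {
        num[i] = in_num[i + 1];
        den[i] = in_den[i + 1];
    }
    for (i = 0; i < ord; i++)
        mem[i] = in_mem[i];
}

/* memset/memcpy-based setup; all sizes are given in bytes. */
static void load_with_memcpy(float *num, float *den, float *mem,
                             const float *in_num, const float *in_den,
                             const float *in_mem, int ord)
{
    size_t n = (size_t)ord * sizeof(float);
    memset(num, 0, PAD * sizeof(float));
    memset(den, 0, PAD * sizeof(float));
    memset(mem, 0, PAD * sizeof(float));
    memcpy(num, in_num + 1, n);   /* skip coefficient 0, as the loop does */
    memcpy(den, in_den + 1, n);
    memcpy(mem, in_mem, n);       /* memory is copied without an offset */
}
```

Both versions assume the source and destination buffers do not overlap;
memmove would be the call to use if they could.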
Aron Rosenberg
SightSpeed