[speex-dev] [PATCH] Make SSE Run Time option. Add Win32 SSE code

Jean-Marc Valin Jean-Marc.Valin at USherbrooke.ca
Tue Jan 13 21:10:52 PST 2004


>          There is a big difference between SSE and SSEFP. The SSEFP means 
> that the CPU supports the xmm registers. All Intel chips with SSE support 
> do, however no current 32-bit AMD chips support the XMM registers. They 
> will support the SSE instructions but not those registers. You are right 
> about the SSE2 not being used.

I'm still not sure I get it. On an Athlon XP, I can do something like
"mulps xmm0, xmm1", which means that the xmm registers are indeed
supported. Besides, without the xmm registers, you can't use much of
SSE. You can use the prefetch instructions that were in the Athlon
T-Bird, but that's about it (and I don't think that makes it SSE1).
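For comparison, the same `mulps` can be written with the SSE intrinsics from `<xmmintrin.h>`: `_mm_mul_ps` operates on `__m128` values held in xmm registers. This is a minimal sketch. The helper name `mul4` is hypothetical, it uses unaligned loads (`movups`) so the caller need not align its arrays, and it falls back to a scalar loop when the compiler is not targeting SSE.

```c
/* Use SSE intrinsics only when the compiler targets SSE
   (GCC/Clang define __SSE__; MSVC uses _M_X64 or _M_IX86_FP). */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_INTRINSICS 1
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] for i = 0..3. */
void mul4(const float *a, const float *b, float *out)
{
#ifdef HAVE_SSE_INTRINSICS
    __m128 va = _mm_loadu_ps(a);            /* movups: load 4 floats into an xmm register */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ps(va, vb)); /* mulps on two xmm registers, then store */
#else
    for (int i = 0; i < 4; i++)             /* scalar fallback when SSE is not enabled */
        out[i] = a[i] * b[i];
#endif
}
```

If this runs on a given machine with the SSE path compiled in, the xmm registers are evidently available there.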

> The AMD Opterons are the first AMD CPUs which support xmm registers. They 
> will have 16 of them while the current Pentium 3s and above have only 8.

Athlon XPs do, unless we have different ideas of what an xmm
register is.
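Since the patch makes SSE a run-time option, a CPUID query is one way to see what a particular CPU actually reports. A minimal sketch for GCC/Clang on x86, using `__get_cpuid` and the `bit_SSE`/`bit_SSE2` masks from `<cpuid.h>` (leaf 1, EDX bits 25 and 26). The function name `cpu_sse_level` is hypothetical; MSVC would need `__cpuid` from `<intrin.h>` instead. The CPUID bit only says the CPU implements SSE; a complete check would also confirm that the operating system saves the xmm registers on context switches, which this sketch does not do.

```c
/* Hypothetical helper: returns 2 if CPUID reports SSE2, 1 if only SSE, 0 otherwise. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

int cpu_sse_level(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                  /* CPUID leaf 1 not available */
    if (edx & bit_SSE2)
        return 2;                  /* EDX bit 26 */
    if (edx & bit_SSE)
        return 1;                  /* EDX bit 25 */
    return 0;
}
#else
int cpu_sse_level(void)
{
    return 0;                      /* not x86 or not GCC/Clang: assume no SSE */
}
#endif
```

The encoder would call this once at start-up and pick the SSE or plain C filter routines accordingly.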

> Sorry about the patch having those push pops commented out, they should be 
> in there.
> 
> If you check your new code into CVS, we can do all the converting needed. We 
> are working on an Altivec version right now based on the current code, but 
> if you have new code, it makes it easier for us since we won't have to 
> port it twice.

Fine, I'll put it in the 1.1.x branch, though, because it's still
experimental and very unclean in some parts (alignment, order forced to 10
even when we need 8). In the meantime, I'm attaching my modified
version of filter_mem2. The modifications I made remove the need for unaligned
moves and could also be applied to fir_mem2 and iir_mem2.
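For reference, a scalar sketch of a direct-form II transposed pole-zero filter in the style of `filter_mem2`. The name `filter_mem2_ref`, the signature and the coefficient layout (`ord+1` coefficients, `den[0]` taken as 1) are assumptions here, not copied from the attachment. The state update reads `mem[j+1]` and writes `mem[j]`, a one-float (4-byte) offset, which is why a straightforward SSE translation ends up with unaligned loads.

```c
/* Hypothetical scalar reference: filters N samples of x into y.
   num[0..ord] and den[0..ord] are the coefficients, den[0] assumed to be 1.
   mem[0..ord-1] holds the filter state between calls. */
void filter_mem2_ref(const float *x, const float *num, const float *den,
                     float *y, int N, int ord, float *mem)
{
    for (int i = 0; i < N; i++) {
        float xi = x[i];
        float yi = num[0] * xi + mem[0];
        y[i] = yi;
        /* Each state element depends on its neighbour mem[j+1]: with SSE,
           loading mem+1 is not 16-byte aligned even if mem is. */
        for (int j = 0; j < ord - 1; j++)
            mem[j] = mem[j + 1] + num[j + 1] * xi - den[j + 1] * yi;
        mem[ord - 1] = num[ord] * xi - den[ord] * yi;
    }
}
```

With `num = {1, 0}` and `den = {1, -0.5}` this reduces to y[n] = x[n] + 0.5 y[n-1], so an impulse produces 1, 0.5, 0.25.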

> One major thing to note - In Altivec everything needs to be 16 byte aligned 
> for it to work efficiently. A number of the starting points right now are 
> only 4-byte aligned. If you can add the following macro to the variables 
> that get passed in, it will make everything easier. Use it as such:

Actually, SSE also requires 16-byte alignment for most instructions
(except movups, which is slow anyway). That's why I have those kludges
with the pointer masks in the current code. I think we should find a
general solution for the problem. Also, there's one place (inner_prod,
called by the open-loop pitch estimator) where non-16-byte-aligned loads
are really required. It's probably possible to work around that, but it
might require 4 copies of the data (with 4-byte offsets).
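A rough sketch of the four-copies workaround, assuming `float` samples and using a pointer mask to get 16-byte alignment out of `malloc`. The names `align16`, `shifted_copies`, `make_shifted_copies` and `aligned_view` are hypothetical; this is not the actual `inner_prod` code. Copy `s` holds `x[s..]`, so an arbitrary offset `k` maps to copy `k % 4` at index `k - k % 4`, which is a multiple of four floats, and every load from the returned pointer stays 16-byte aligned.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round p up to the next 16-byte boundary with a pointer mask. */
static float *align16(void *p)
{
    return (float *)(((uintptr_t)p + 15) & ~(uintptr_t)15);
}

/* Hypothetical container: copy[s][i] == x[i + s], each copy 16-byte aligned. */
typedef struct {
    void  *raw;      /* block returned by malloc, kept for free() */
    float *copy[4];
} shifted_copies;

int make_shifted_copies(shifted_copies *sc, const float *x, int len)
{
    size_t stride = ((size_t)len + 3) & ~(size_t)3; /* floats per copy, multiple of 4 */
    sc->raw = malloc(4 * stride * sizeof(float) + 15);
    if (!sc->raw)
        return -1;
    float *base = align16(sc->raw);
    for (int s = 0; s < 4; s++) {
        sc->copy[s] = base + s * stride;            /* multiple of 16 bytes from base */
        memset(sc->copy[s], 0, stride * sizeof(float)); /* zero-pad the tail */
        if (len > s)
            memcpy(sc->copy[s], x + s, (size_t)(len - s) * sizeof(float));
    }
    return 0;
}

/* Aligned pointer p such that p[i] == x[k + i] while k + i < len. */
const float *aligned_view(const shifted_copies *sc, int k)
{
    int s = k & 3;                  /* k mod 4 selects the copy */
    return sc->copy[s] + (k - s);   /* k - s floats is a multiple of 16 bytes */
}
```

The cost is four times the memory plus the copying, which is the trade-off mentioned above; whether it beats `movups` would have to be measured.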

>          ALIGN(16) unsigned int myVar;
> or
>          static ALIGN(16) float myArray[16];

I think the ALIGN macros I currently have should do the job. If it's
possible to use them, the advantage is that they are
platform-independent.
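For contrast, the `ALIGN(16)` declaration form in the quoted request needs a definition that expands to a compiler-specific attribute. A minimal sketch of one possible definition (hypothetical; not the `ALIGN` macro currently in Speex), using the C11 `_Alignas` specifier when available and falling back to the MSVC and GCC spellings. `is_aligned16` is a hypothetical helper for checking the result.

```c
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define ALIGN(n) _Alignas(n)                  /* standard C11 alignment specifier */
#elif defined(_MSC_VER)
#define ALIGN(n) __declspec(align(n))         /* MSVC */
#elif defined(__GNUC__)
#define ALIGN(n) __attribute__((aligned(n)))  /* GCC / Clang */
#else
#define ALIGN(n)                              /* unknown compiler: no alignment guarantee */
#endif

static ALIGN(16) float myArray[16];

/* Returns 1 if p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}
```

Because every branch relies on a C11 or compiler-specific feature, this form only works where one of them is supported, whereas aligning pointers at run time works with any C compiler.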

        Jean-Marc


-- 
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada


-------------- next part --------------
A non-text attachment was scrubbed...
Name: filters_sse.h
Type: text/x-c-header
Size: 3628 bytes
Desc: filters_sse.h
Url : http://lists.xiph.org/pipermail/speex-dev/attachments/20040114/7a9bfc3e/filters_sse.bin