[speex-dev] [PATCH] Make SSE Run Time option.

Ian Ollmann iano at cco.caltech.edu
Thu Jan 15 19:24:05 PST 2004



On Thu, 15 Jan 2004, Jean-Marc Valin wrote:

> Actually, I'm not denying you can do pretty fast complex multiplies by
> separating real from imaginary. What I'm saying is that with addsubps,
> you can do a better job when you have the complex numbers packed, than
> you can do with SSE1 only. I still think AMD got it better with its
> pfpnacc instruction and Intel should have gone much further.

I find it amazing that they would spend effort introducing new hardware
designed to facilitate programming in inefficient ways. The existence of
the instruction encourages people to use that data layout, thereby
shooting themselves in the foot. Furthermore, if they are going to help,
they could at least do so intelligently. The addsubps instruction would
have saved two permutes if it added across rather than vertically.

The way PNI is right now, given a strategy of hobbling ourselves with an
interleaved data layout, we arrive at the following implementation:

        Cr = Ar * Br - Ai * Bi
        Ci = Ai * Br + Ar * Bi

In vector notation:

        C = A * Br +- swap(A) * Bi

which comes to three permutes and three ALU instructions. Given a dispatch
limitation of one SIMD instruction per cycle (everything goes through port 1,
as I read page 1-17 of the Intel Pentium 4 / Xeon Processor Optimization
manual), it appears to me that you could do equally well without permutes,
using the scalar SSE2 instructions or maybe even x87, because all we really
need to accomplish here is 6 scalar ALU ops (four multiplies, one subtract
and one add)!
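
To make the instruction count above concrete, here is a minimal sketch in
portable C that emulates the interleaved approach lane by lane. It is not
real SSE code: the type vec4f and the helpers dup_even, dup_odd, swap_pairs,
mul4 and addsub4 are hypothetical scalar stand-ins, assumed here to behave
like movsldup, movshdup, a shufps pair swap, mulps and addsubps
respectively. cmul_scalar is the plain reference with 6 scalar ALU ops per
complex product.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar stand-in for one 4-lane register holding two interleaved
   complex floats: { re0, im0, re1, im1 }. Hypothetical type. */
typedef struct { float f[4]; } vec4f;

/* Permute 1: broadcast the real parts (assumed to mirror movsldup). */
static vec4f dup_even(vec4f v) {
    vec4f r = {{ v.f[0], v.f[0], v.f[2], v.f[2] }};
    return r;
}

/* Permute 2: broadcast the imaginary parts (assumed to mirror movshdup). */
static vec4f dup_odd(vec4f v) {
    vec4f r = {{ v.f[1], v.f[1], v.f[3], v.f[3] }};
    return r;
}

/* Permute 3: swap real and imaginary within each pair. */
static vec4f swap_pairs(vec4f v) {
    vec4f r = {{ v.f[1], v.f[0], v.f[3], v.f[2] }};
    return r;
}

/* ALU op: lane-wise multiply. */
static vec4f mul4(vec4f a, vec4f b) {
    vec4f r;
    for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] * b.f[i];
    return r;
}

/* ALU op: subtract in even lanes, add in odd lanes (addsubps behaviour). */
static vec4f addsub4(vec4f a, vec4f b) {
    vec4f r = {{ a.f[0] - b.f[0], a.f[1] + b.f[1],
                 a.f[2] - b.f[2], a.f[3] + b.f[3] }};
    return r;
}

/* C = A * Br +- swap(A) * Bi : three permutes, three ALU ops. */
vec4f cmul_interleaved(vec4f a, vec4f b) {
    vec4f br = dup_even(b);
    vec4f bi = dup_odd(b);
    vec4f as = swap_pairs(a);
    return addsub4(mul4(a, br), mul4(as, bi));
}

/* Scalar reference: 4 multiplies, 1 subtract, 1 add per complex product.
   a, b and c each hold n interleaved complex numbers (2 * n floats). */
void cmul_scalar(const float *a, const float *b, float *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t k = 0; k < n; ++k) {
        float ar = a[2 * k], ai = a[2 * k + 1];
        float br = b[2 * k], bi = b[2 * k + 1];
        c[2 * k]     = ar * br - ai * bi;   /* Cr */
        c[2 * k + 1] = ai * br + ar * bi;   /* Ci */
    }
}
```

Calling cmul_interleaved on { 1, 2, 3, 4 } and { 5, 6, 7, 8 } multiplies
(1+2i)(5+6i) and (3+4i)(7+8i) in one pass, and cmul_scalar on the same
arrays with n = 2 should give the same four floats.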

If addsubps did its thing horizontally, then we could write it this way:

        real = A * B
        imag = swap(A) * B
        result = { sub_across( real ), add_across( imag ) }

which is three ALU operations and one permute. There is some chance that the
SSE2 implementation would beat double precision scalar code!*
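
The horizontal variant described above can be sketched the same way. This
is again plain C emulation, here with two double lanes per vector so that
one vector holds one complex number. The helper hsubadd2 stands for the
wished-for instruction and is purely hypothetical: no such SSE instruction
is claimed to exist. vec2d, swap2, mul2 and check_cmul_horizontal are
likewise names invented for this illustration.

```c
#include <assert.h>

/* Scalar stand-in for a 2-lane double register: { re, im }. Hypothetical. */
typedef struct { double d[2]; } vec2d;

/* The single permute: swap real and imaginary parts. */
static vec2d swap2(vec2d v) {
    vec2d r = {{ v.d[1], v.d[0] }};
    return r;
}

/* ALU op: lane-wise multiply. */
static vec2d mul2(vec2d a, vec2d b) {
    vec2d r = {{ a.d[0] * b.d[0], a.d[1] * b.d[1] }};
    return r;
}

/* HYPOTHETICAL ALU op: subtract across the first operand, add across the
   second, i.e. { sub_across(real), add_across(imag) }. */
static vec2d hsubadd2(vec2d real, vec2d imag) {
    vec2d r = {{ real.d[0] - real.d[1], imag.d[0] + imag.d[1] }};
    return r;
}

/* Three ALU ops (two multiplies, one hsubadd2) and one permute. */
vec2d cmul_horizontal(vec2d a, vec2d b) {
    vec2d real = mul2(a, b);          /* { Ar*Br, Ai*Bi } */
    vec2d imag = mul2(swap2(a), b);   /* { Ai*Br, Ar*Bi } */
    return hsubadd2(real, imag);      /* { Cr, Ci } */
}

/* Small self-check: (1+2i)(5+6i) = -7+16i. */
void check_cmul_horizontal(void) {
    vec2d a = {{ 1.0, 2.0 }};
    vec2d b = {{ 5.0, 6.0 }};
    vec2d c = cmul_horizontal(a, b);
    assert(c.d[0] == -7.0);
    assert(c.d[1] == 16.0);
}
```

The sketch only shows the data flow and the operation count. It says
nothing about real throughput, which depends on how the hardware would
execute such an instruction.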

Ian

*provided Intel processed more than one double per cycle for packed SSE2
instructions, which it does not.

---------------------------------------------------
   Ian Ollmann, Ph.D.       iano at cco.caltech.edu
---------------------------------------------------
