[speex-dev] [PATCH] Make SSE Run Time option.
Jean-Marc.Valin at USherbrooke.ca
Thu Jan 15 01:29:52 PST 2004
> You may wish to save space for PNI.
There seem to be some interesting instructions for complex arithmetic there (thus
helping FFTs). I'm not sure there's anything useful for Speex, though.
We'll see. What I think is much more promising is the x86-64 version of
SSE with 16 registers. That could speed up the filters a lot.
> Please note that dot products of simple vector floats are usually faster
> in the scalar units. The add across and transfer to scalar is just too
> expensive. It's generally only worthwhile if the data starts and ends in
> the vector units, and it is inlined so that latencies can be covered with
> other work. e.g:
Actually, even with a scalar unit, the best code is implicitly
vectorized. If you look at the original code I had, there are 4 partial
sums that prevent some stalling due to dependencies. From there, it's
easy to vectorize by 4 and add at the end. Note that for Speex the
vectors are either 40 or 160 samples long. The whole process is also
repeated 128 times in a row, so I think a vector unit will do much
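
To illustrate the point about partial sums, here is a minimal sketch in C. It is not the actual Speex code: the names inner_prod4 and inner_prod_sse are made up for this example, and both functions assume len is a multiple of 4 (true for the 40- and 160-sample vectors mentioned above). The scalar version keeps 4 independent accumulators so that consecutive additions do not wait on each other. The SSE version, compiled only when the compiler defines __SSE__, holds the same 4 partial sums in a single register and adds across only once at the end.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper (not from the Speex sources).
 * Scalar dot product with 4 independent partial sums.
 * Assumes len is a multiple of 4. */
float inner_prod4(const float *x, const float *y, int len)
{
    float sum0 = 0.0f, sum1 = 0.0f, sum2 = 0.0f, sum3 = 0.0f;
    int i;

    for (i = 0; i < len; i += 4) {
        /* The four accumulators have no dependency on each other. */
        sum0 += x[i]     * y[i];
        sum1 += x[i + 1] * y[i + 1];
        sum2 += x[i + 2] * y[i + 2];
        sum3 += x[i + 3] * y[i + 3];
    }
    /* Combine the partial sums once, at the end. */
    return (sum0 + sum1) + (sum2 + sum3);
}

#if defined(__SSE__)
/* Hypothetical helper (not from the Speex sources).
 * Same structure, with the 4 partial sums held in one SSE register.
 * Unaligned loads are used so no alignment is assumed.
 * Assumes len is a multiple of 4. */
float inner_prod_sse(const float *x, const float *y, int len)
{
    __m128 acc = _mm_setzero_ps();
    float part[4];
    int i;

    for (i = 0; i < len; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(x + i), _mm_loadu_ps(y + i));
        acc = _mm_add_ps(acc, prod);
    }
    /* The add across and transfer to scalar happens only once per call. */
    _mm_storeu_ps(part, acc);
    return (part[0] + part[1]) + (part[2] + part[3]);
}
#endif
```

Both versions add the products in the same order, so they should give the same result as each other, though not necessarily the same rounding as a single-accumulator loop.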
Jean-Marc Valin, M.Sc.A., ing. jr.
Université de Sherbrooke, Québec, Canada