[speex-dev] [PATCH] Make SSE Run Time option.

Tom Harper tharper at sightspeed.com
Thu Jan 15 11:43:59 PST 2004



Hi Jean Marc,

I think there is just some confusion over terminology here. I agree that
support for the base version of 3DNow! may not be relevant. However, even
though extended 3DNow! is a bastardized version of SSE, it still supports
the same instructions, and that is what is important. I don't think we
intend to add any AMD-specific code.

The real issue is cross-CPU SSE support, and whether, in addition, there is
access to the XMM registers, that is, whether the OS actually supports XMM as
well. We have a fair amount of other stuff we do in assembler, much of which
requires the SSE instruction set but *not* the XMM registers, and some of
which is MMX only. In Speex, I can see how you would always want to use the
widest register possible for all of the FP ops on longish vectors. However,
the more integer stuff you do, and just as time goes on, the more likely it
is that someone will want to do some type of optimization to accommodate
lowest-common-denominator machines.

So I guess what I am saying is that it would be good to have at minimum:

NONE
MMX (lowest-common-denominator machines)
SSE (i.e. 3dnow ext with no OS XMM support)
SSE_XMM (i.e. 3dnow ext, sse, with OS XMM support)
ALTIVEC
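
As a rough sketch of what I mean, run-time selection over these five levels
could look something like the C below. All of the names are made up for
illustration, and the detection step that fills in the capability flags
(CPUID, an OS probe, and so on) is assumed to exist elsewhere and is not shown.

```c
/* Hypothetical names throughout; only the five levels come from the list. */
typedef enum {
    CPU_LEVEL_NONE = 0,
    CPU_LEVEL_MMX,
    CPU_LEVEL_SSE,      /* SSE instructions, but no usable XMM registers */
    CPU_LEVEL_SSE_XMM,  /* SSE instructions plus OS-supported XMM registers */
    CPU_LEVEL_ALTIVEC
} cpu_level;

/* Assumed output of a separate detection step, which is not shown here. */
typedef struct {
    int has_mmx;        /* CPU implements MMX */
    int has_sse_insns;  /* CPU implements the SSE instructions that do not need XMM */
    int has_xmm_regs;   /* CPU implements the XMM registers */
    int os_saves_xmm;   /* OS saves and restores XMM state */
    int has_altivec;    /* PowerPC AltiVec unit present */
} cpu_caps;

/* Pick the most capable level that is safe to run on this machine. */
cpu_level pick_cpu_level(const cpu_caps *caps)
{
    if (caps->has_altivec)
        return CPU_LEVEL_ALTIVEC;
    if (caps->has_sse_insns && caps->has_xmm_regs && caps->os_saves_xmm)
        return CPU_LEVEL_SSE_XMM;
    if (caps->has_sse_insns)
        return CPU_LEVEL_SSE;
    if (caps->has_mmx)
        return CPU_LEVEL_MMX;
    return CPU_LEVEL_NONE;
}
```

The point of keeping the OS flag separate is that a CPU with XMM registers
on an OS that does not support them drops back to SSE rather than faulting.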

my 2p,
Tom

At 01:09 AM 1/15/2004, Jean-Marc Valin wrote:
> > We agree on not supporting the non-FP version, however the run time flags
> > need to be settable with a non FP SSE mode so that exceptions are avoided.
>
>I think we should keep the more "official" naming and not AMD's, which
>is more confusing. SSE means SSE1: all the SSE instructions (including
>the ones using xmm registers). What AMD calls SSE is not SSE at all.
>Basically, it's a bunch of "extra instructions" borrowed from SSE and
>that are part of the extended 3DNow!.
>
> > I thus propose a set of defines like this instead of the ones in our
> > initial patch:
> >
> > #define CPU_MODE_NONE      0
> > #define CPU_MODE_MMX       1   // Base Intel MMX x86
> > #define CPU_MODE_3DNOW     2   // Base AMD 3Dnow extensions
> > #define CPU_MODE_SSE       4   // Intel Integer SSE instructions
> > #define CPU_MODE_3DNOWEXT  8   // AMD 3Dnow extended instructions
> > #define CPU_MODE_SSEFP     16  // SSE FP modes, mainly support for xmm registers
> > #define CPU_MODE_SSE2      32  // Intel SSE2 instructions
> > #define CPU_MODE_ALTIVEC   64  // PowerPC Altivec support.
>
>If you really want to define stuff like that, you could have simply
>NONE
>MMX
>3DNOW
>3DNOWEXT
>SSE1
>SSE2
>ALTIVEC
>
>Even then, MMX is completely useless for Speex IMO and I doubt it's
>worth writing 3DNow non-ext code (or even 3DNow! at all). Same for SSE2:
>Speex simply doesn't use doubles at all. That's why I think only
>defining NONE, SSE and ALTIVEC (maybe 3DNow?) would be enough.
>
> > We already have it implemented for the inner_prod function. After it is
> > stable and fully tested, we will send you a patch. If you have never done
> > Altivec coding, it is quite simple since it is all C macros / functions.
> > Not nearly as nasty as inline asm code, although the 16 byte alignment
> > issues can be quite a pain. Our current working code is below:
>
>You can do the same with SSE intrinsics. I just got used to writing
>assembly before they were available for gcc. I had a quick look at your
>inner_prod implementation. I think that if you really want to make that
>fast (there's a big possible gain there), you need to consider the
>optimization at a higher level: from open_loop_nbest_pitch. The function
>calls inner_prod for a continuous range of offsets. With that in mind,
>it would probably be simpler to just take 4 copies (with different
>offsets) of one of the vectors and then compute everything with simple,
>aligned loads.
>
>         Jean-Marc
>
>--
>Jean-Marc Valin, M.Sc.A., ing. jr.
>LABORIUS (http://www.gel.usherb.ca/laborius)
>Université de Sherbrooke, Québec, Canada


-- 

Tom Harper - tharper at sightspeed.com
Lead Software Engineer
SightSpeed - A Roda Group Affiliated Company

918 Parker St, Suite A14
Berkeley, CA 94710

Phone: 510.665.2920
Cell: 415.378.3779

http://www.sightspeed.com
