From: Piotr Wyderski on 23 Feb 2010 08:59

Hello,

how exactly do SSE2 functional units operate in mixed data type mode on modern processors, i.e. Core2+/Phenom? For instance, it is much more convenient to use pshufd instead of shufps to shuffle single-precision floating-point data vectors, as it saves one movaps instruction, but should I expect a penalty for crossing the floating-point/integer boundary? If yes, then how big?

Best regards
Piotr Wyderski

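To make the question concrete, here is a minimal sketch in C intrinsics of the two shuffles being compared. The function names `reverse_shufps` and `reverse_pshufd` are made up for illustration and do not come from the thread. The sketch only shows that both forms produce the same bits; it says nothing about timing, and the compiler decides whether an extra movaps is actually emitted.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */

/* Reverse the four floats of v with shufps (floating-point shuffle).
   shufps takes two sources and overwrites the first one, so keeping
   the original value around may cost an extra register copy. */
static inline __m128 reverse_shufps(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
}

/* Same reversal with pshufd (integer shuffle) applied to float data.
   The casts are free reinterpretations; they generate no instructions. */
static inline __m128 reverse_pshufd(__m128 v)
{
    __m128i bits = _mm_castps_si128(v);
    bits = _mm_shuffle_epi32(bits, _MM_SHUFFLE(0, 1, 2, 3));
    return _mm_castsi128_ps(bits);
}
```

Loading {1, 2, 3, 4} with `_mm_loadu_ps` and storing either result with `_mm_storeu_ps` should give the same reversed vector in both cases.
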
From: Terje Mathisen on 23 Feb 2010 10:32

Piotr Wyderski wrote:
> [...] it is much more convenient to use pshufd instead of shufps to
> shuffle single-precision floating-point data vectors, as it saves one
> movaps instruction, but should I expect a penalty for crossing the
> floating-point/integer boundary? If yes, then how big?

Afaik all implementations up to now have used the same storage for both types, so there hasn't been any penalty, so far. (I might be wrong though.)

However, the fact that Intel/AMD have implemented separate opcodes for these instructions, even when the effect is identical, seems to indicate that they expect they will need to separate them at some point in the future, even if they haven't done so by now.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Piotr Wyderski on 23 Feb 2010 11:13

Terje Mathisen wrote:
> Afaik all implementations up to now have used the same storage for both

Yes, the storage is the same, but I wonder if the XMM registers are just dynamic aliases to internal int_XMM and float_XMM sets and thus mixing them is wrong.

> so there hasn't been any penalty

My tests confirm that, but it's always better to ask. :-)

Best regards
Piotr Wyderski

From: Niels Fröhling on 24 Feb 2010 12:33

Terje Mathisen wrote:
> However, the fact that Intel/AMD have implemented separate opcodes for
> these instructions, even when the effect is identical, seems to indicate
> that they expect they will need to separate them at some point in the
> future, even if they haven't done so by now.

As I see it, it's the reverse. Those instructions (pshufw) were MMX instructions, which were pure integer. The other instruction (shufps) was an XMMX instruction, which was pure floating-point. There was no way to mix and match.

When AMD was offering 3DNow, mix-and-match of int/float was _intended_. Some instructions (pswapd) were introduced to help float movement but had an integer (pswap"d") identifier.

When Intel was 5 years late to the party with SSE2, they mapped all MMX instructions onto XMMX registers, which created a multitude of identically behaving opcodes.

Mix-and-match is intended and has severe (positive) performance implications. Nobody will split this in the future again.

My personal opinion about the why (not mapping the pshufd mnemonic onto the shufps opcode) is that you can make a processor which has no floating-point support at all by simply removing all floating-point-implied instructions, making the instruction decoder easier and so on.

Ciao
Niels

From: Piotr Wyderski on 1 Mar 2010 03:09

Terje Mathisen wrote:
> However, the fact that Intel/AMD have implemented separate opcodes for
> these instructions, even when the effect is identical, seems to indicate
> that they expect they will need to separate them at some point in the
> future, even if they haven't done so by now.

Intel seems to recommend mixed-mode calculations, as they use it in their dot product code here:

http://www.intel.com/technology/itj/2008/v12i3/3-paper/6-examples.htm

    haddps xmm0, xmm0
    movaps xmm1, xmm0
    psrlq  xmm0, 32

So IMHO mixing can be considered blessed :-)

Best regards,
Piotr Wyderski
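
For readers who want to try the idea, here is a rough sketch of a dot product whose final reduction uses the integer shift psrlq on floating-point data, as in the quoted snippet. It is not Intel's code: the names `hsum_mixed` and `dot4_mixed` are hypothetical, and the SSE3 haddps step is swapped for movhlps + addps so that the sketch needs only SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */

/* Horizontal sum of four floats, ending with an integer shift.
   Assumption: movhlps + addps stands in for the haddps of the quoted
   snippet, to avoid requiring SSE3. */
static inline float hsum_mixed(__m128 v)
{
    /* t = [v0+v2, v1+v3, (unused), (unused)] */
    __m128 t = _mm_add_ps(v, _mm_movehl_ps(v, v));

    /* psrlq by 32 on the float bits: lane 1 moves down into lane 0. */
    __m128i bits = _mm_srli_epi64(_mm_castps_si128(t), 32);
    __m128 hi = _mm_castsi128_ps(bits);

    /* Scalar add of lane 0: (v0+v2) + (v1+v3). */
    return _mm_cvtss_f32(_mm_add_ss(t, hi));
}

/* Dot product of two 4-float vectors using the reduction above. */
static inline float dot4_mixed(__m128 a, __m128 b)
{
    return hsum_mixed(_mm_mul_ps(a, b));
}
```

Note that the summation order here differs from the haddps version, so results may differ in the last bit for inputs that do not add exactly.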