From: H. Peter Anvin on 10 Nov 2009 15:40

On 11/10/2009 12:16 PM, Willy Tarreau wrote:
>
> Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> on one side and [sse*] on the other : distros are built assuming the
> former are always available while they are not always. And the distro
> which make the difference have to provide an dedicated build for earlier
> systems just for compatibility. SSE*, 3dnow* etc... are only used by a
> handful of media players/converters/encoders which are able to detect
> themselves what to use and already have the necessary fallbacks because
> these instruction sets vary too much between processors and vendors.
>

That is increasingly not true since gcc is now doing autovectorization.

> One could argue that cmpxchg/bswap/xadd are supported by 486 and that
> implementing them for 386 is almost useless now (though it costs almost
> nothing to provide them, I did a few years ago).
>
> CMOV/NOPL are rarely used, thus have no reason to cause a massive
> performance drop, but are frequent enough (at least cmov) for almost
> any program to have at least one or two inside, making it incompatible
> with a given processor, and are almost obvious to implement too.

I count 970 cmovs in libc out of 322660 instructions. That is one in
333 instructions. In other words, a trap-and-emulate of some 500 cycles
would add some two cycles *per instruction* during execution -- hardly
an insignificant number.

All in all, any of this is really only useful as a limp.

> SSE*/3dnow* would be much much harder and would only serve very few
> programs, and serve them badly because when they're used, it would
> be intensive.
>
> I personally am not against being able to emulate every optional
> instruction, quite the opposite instead. It's just that if in order
> to do this, we add cost to the other obvious ones, we lose what we
> expected to win (simplicity and efficiency).

I don't see any particular subset as being more obvious than the other,
with the *possible* exception of NOPL, simply because NOPL was
undocumented for so long.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
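The per-instruction estimate in the message above can be reproduced with a few lines of arithmetic. This is only a sketch: the counts (970 cmovs, 322660 instructions) and the 500-cycle trap cost are taken from the message, while the function name and the assumption that the executed instruction mix matches the static mix in libc are illustrative and not from the thread.

```python
def amortized_trap_cost(trapping_insns, total_insns, trap_cycles):
    """Average extra cycles per executed instruction.

    Assumption (not from the thread): the dynamic instruction mix
    matches the static one, so the static cmov fraction can be used
    as the fraction of executed instructions that trap.
    """
    if total_insns <= 0:
        raise ValueError("total_insns must be positive")
    return trap_cycles * trapping_insns / total_insns


# Figures quoted in the message.
CMOV_COUNT = 970
TOTAL_INSNS = 322660
TRAP_CYCLES = 500  # "a trap-and-emulate of some 500 cycles"

one_in = TOTAL_INSNS / CMOV_COUNT
extra = amortized_trap_cost(CMOV_COUNT, TOTAL_INSNS, TRAP_CYCLES)

print(f"one cmov per {one_in:.0f} instructions")
print(f"about {extra:.2f} extra cycles per instruction")
```

Under these assumptions the result is about 1.5 extra cycles per instruction, which the message rounds to "some two cycles". Real programs will deviate from this in either direction, depending on how often the cmov sites are actually executed.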
From: Pavel Machek on 10 Nov 2009 16:00

Hi!

> Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> on one side and [sse*] on the other : distros are built assuming the
> former are always available while they are not always. And the
> distro

Well, fix the distros...

> which make the difference have to provide an dedicated build for earlier
> systems just for compatibility. SSE*, 3dnow* etc... are only used by a
> handful of media players/converters/encoders which are able to detect
> themselves what to use and already have the necessary fallbacks because
> these instruction sets vary too much between processors and vendors.
>
> One could argue that cmpxchg/bswap/xadd are supported by 486 and that
> implementing them for 386 is almost useless now (though it costs almost
> nothing to provide them, I did a few years ago).
>
> CMOV/NOPL are rarely used, thus have no reason to cause a massive
> performance drop, but are frequent enough (at least cmov) for almost

*One* CMOV in the inner loop will make your performance go down 20x.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
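A simple cost model shows when a single trapped instruction per loop iteration yields a slowdown of that size. This is a sketch, not a measurement: it reuses the roughly 500-cycle trap cost quoted earlier in the thread, and the function names and the idea of a fixed per-iteration cycle count are assumptions made for illustration.

```python
def loop_slowdown(base_cycles, trap_cycles, traps_per_iter=1):
    """Slowdown factor of a loop whose iteration normally takes
    `base_cycles` cycles once `traps_per_iter` instructions per
    iteration each cost an extra `trap_cycles` cycles."""
    if base_cycles <= 0:
        raise ValueError("base_cycles must be positive")
    return (base_cycles + traps_per_iter * trap_cycles) / base_cycles


def base_cycles_for_slowdown(slowdown, trap_cycles, traps_per_iter=1):
    """Iteration length (in cycles) at which the given slowdown occurs."""
    if slowdown <= 1:
        raise ValueError("slowdown must be greater than 1")
    return traps_per_iter * trap_cycles / (slowdown - 1)


TRAP_CYCLES = 500  # figure quoted earlier in the thread

# Loop length at which one trap per iteration costs a factor of 20.
print(base_cycles_for_slowdown(20, TRAP_CYCLES))

# Slowdown for a few hypothetical loop lengths.
for cycles in (10, 100, 1000, 10000):
    print(cycles, round(loop_slowdown(cycles, TRAP_CYCLES), 2))
```

In this model a 20x slowdown corresponds to an inner loop of about 26 cycles per iteration. Tighter loops fare worse and long loop bodies barely notice the trap, which is consistent with both sides of the argument in the thread.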
From: Willy Tarreau on 10 Nov 2009 16:20

On Tue, Nov 10, 2009 at 09:54:45PM +0100, Pavel Machek wrote:
> Hi!
>
> > Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> > on one side and [sse*] on the other : distros are built assuming the
> > former are always available while they are not always. And the
> > distro
>
> Well, fix the distros...

You know as well as I do that pointing the finger at distros is as easy
as it is useless, because people running on low end want something that
works and people running on high end want something that runs fast. In
order to satisfy everyone, you would have to build with optimizations
for every CPU around, which does not make sense. Simply count the number
of CPU variants in the kernel, and imagine that many CDs/DVDs for a
single platform distro.

However, targeting the lowest common denominator of high end machines
(basically i686) and having the lower end systems experience a tiny
slowdown is not stupid at all, since performance is not what matters the
most there. The higher end systems will simply be able to run
CPU-specific optimizations per-program as they already do right now.

(...)

> > CMOV/NOPL are rarely used, thus have no reason to cause a massive
> > performance drop, but are frequent enough (at least cmov) for almost
>
> *One* CMOV in the inner loop will make your performance go down 20x.

yes, just like with emulated FPU or trapped unaligned accesses. It's
just like flying fishes. They exist but they aren't the most common
ones. If people encounter these cases on a specific program, then
they just have to recompile it if it is a problem. At least they
don't rebuild the whole distro. And once again, I've been using
cmpxchg/bswap emulation for years on my i386 without feeling any
need for a rebuild, and CMOV emulation for years now on my mini-itx
C3 without any problem either. These are real experiences, not just
fears of imaginary problems.

Yes I can design a program to run 400 times slower on these machines
if I want. I just don't feel the need to do so and apparently existing
programs' authors didn't either.

Regards,
Willy
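The per-program selection of CPU-specific code mentioned above can be sketched as a runtime check of the CPU feature flags. The example below is illustrative only: it assumes an x86 Linux system where /proc/cpuinfo contains a "flags" line, and the function names, the variant names and the sample flag strings are hypothetical, not taken from any program discussed in the thread.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text.

    Only the first "flags" line is used; an empty set is returned
    when no such line exists.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_variant(flags):
    """Choose the most capable (hypothetical) code variant the CPU supports."""
    if "sse2" in flags:
        return "sse2"
    if "cmov" in flags:
        return "i686"
    return "i386"  # baseline fallback that needs no optional instructions


# Hypothetical sample inputs; on a real system one would read
# the text from /proc/cpuinfo instead.
MODERN = "model name\t: example\nflags\t\t: fpu tsc cx8 cmov mmx sse sse2\n"
NO_CMOV = "model name\t: example\nflags\t\t: fpu tsc cx8 mmx\n"

print(pick_variant(parse_cpu_flags(MODERN)))
print(pick_variant(parse_cpu_flags(NO_CMOV)))
```

A program that dispatches this way never executes an instruction the CPU lacks, so it needs no kernel emulation at all. The emulation debate concerns binaries that were compiled for i686 without any such check.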
From: H. Peter Anvin on 10 Nov 2009 16:30

On 11/10/2009 01:12 PM, Willy Tarreau wrote:
> yes, just like with emulated FPU or trapped unaligned accesses. It's
> just like flying fishes. They exist but they aren't the most common
> ones. If people encounter these cases on a specific program, then
> they just have to recompile it if it is a problem. At least they
> don't rebuild the whole distro. And once again, I've been using
> cmpxchg/bswap emulation for years on my i386 without feeling any
> need for a rebuild, and CMOV emulation for years now on my mini-itx
> C3 without any problem either. These are real experiences, not just
> fears of imaginary problems. Yes I can design a program to run 400
> times slower on these machines if I want. I just don't feel the need
> to do so and apparently existing programs' authors didn't either.

Willy, perhaps you can come up with a list of features you think should
be emulated, together with an explanation of why you opted for that list
of features and *did not* opt for others.

Note: emulated FPU is a special subcase. The FPU operations are
heavyweight enough that the overhead of trapping versus library calls is
relatively insignificant.

-hpa
From: Matt Thrailkill on 10 Nov 2009 16:30

On Tue, Nov 10, 2009 at 12:54 PM, Pavel Machek <pavel(a)ucw.cz> wrote:
> *One* CMOV in the inner loop will make your performance go down 20x.

This is 20x slower than not running at all, right?