From: H. Peter Anvin on 10 Nov 2009 14:00

On 11/10/2009 09:24 AM, Alan Cox wrote:
>> In the short term, yes, of course. However, if we're going to do
>> emulation, we might as well do it right.
>
> Why is using KVM doing it right ? It sounds like it's doing it slowly,
> and hideously memory inefficiently. You are solving an uninteresting
> general case problem when you just need two tiny fixups (or perhaps 3 if
> you want to fix up early x86-64 prefetch)

Why do we only need "two tiny fixups"? Where do we draw the line in
terms of ISA compatibility? One could easily argue that the Right
Thing[TM] is to be able to process any optional instruction -- otherwise
one has a very difficult place to draw a line.

Consider SSE3, for example. Why should the same concept not apply to
SSE3 instructions as to CMOV?

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Avi Kivity on 10 Nov 2009 15:00

On 11/10/2009 08:49 PM, H. Peter Anvin wrote:
>> Why is using KVM doing it right ? It sounds like it's doing it slowly,
>> and hideously memory inefficiently. You are solving an uninteresting
>> general case problem when you just need two tiny fixups (or perhaps 3 if
>> you want to fix up early x86-64 prefetch)
>
> Why do we only need "two tiny fixups"? Where do we draw the line in
> terms of ISA compatibility? One could easily argue that the Right
> Thing[TM] is to be able to process any optional instruction -- otherwise
> one has a very difficult place to draw a line.
>
> Consider SSE3, for example. Why should the same concept not apply to
> SSE3 instructions as to CMOV?

Because then user programs would run 20x or more slower than the user
expects. Better to terminate early (and teach userspace how to choose
the instruction subset correctly).

--
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.
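
[Editorial sketch, not part of the original thread.] One way userspace can "choose the instruction subset correctly" is to check the CPU's advertised features before selecting a code path. The helper below is a minimal illustration, assuming the caller has already read a `flags` line in the style of /proc/cpuinfo (where SSE3 is reported as `pni`); the function name `cpu_flags_contain` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `flag` appears as a whole whitespace-separated word in
 * `flags`. `flags` is assumed to look like the "flags" line of
 * /proc/cpuinfo, e.g. "fpu vme cmov sse sse2 pni".
 * Hypothetical helper for illustration only.
 */
bool cpu_flags_contain(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);

    size_t len = strlen(flag);
    if (len == 0)
        return false;

    const char *p = flags;
    while ((p = strstr(p, flag)) != NULL) {
        /* Reject substring hits such as "sse" inside "sse2". */
        bool starts_word = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool ends_word = p[len] == '\0' || p[len] == ' ' ||
                         p[len] == '\t' || p[len] == '\n';
        if (starts_word && ends_word)
            return true;
        p += 1;
    }
    return false;
}
```

A program could call `cpu_flags_contain(line, "cmov")` or `cpu_flags_contain(line, "pni")` once at startup and fall back to a generic code path when the flag is absent, rather than relying on the kernel to emulate the missing instructions.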
From: H. Peter Anvin on 10 Nov 2009 15:10

On 11/10/2009 11:50 AM, Avi Kivity wrote:
>> Consider SSE3, for example. Why should the same concept not apply to
>> SSE3 instructions as to CMOV?
>
> Because then user programs would run 20x or more slower than the user
> expects. Better to terminate early (and teach userspace how to choose
> the instruction subset correctly).

I picked the example carefully: SSE3 is a small set of instructions
which probably aren't used very heavily. In that sense, it has
*exactly* the same properties as CMOV - if you have the source, you're
better off recompiling, but it *might* help you if you happen to only
have a binary.

What I want people to understand is that this is a *huge* rathole, and
it doesn't have any obvious bottom that I can see.

-hpa
From: Willy Tarreau on 10 Nov 2009 15:20

On Tue, Nov 10, 2009 at 12:01:47PM -0800, H. Peter Anvin wrote:
> On 11/10/2009 11:50 AM, Avi Kivity wrote:
> >> Consider SSE3, for example. Why should the same concept not apply to
> >> SSE3 instructions as to CMOV?
> >
> > Because then user programs would run 20x or more slower than the user
> > expects. Better to terminate early (and teach userspace how to choose
> > the instruction subset correctly).
>
> I picked the example carefully: SSE3 is a small set of instructions
> which probably aren't used very heavily. In that sense, it has
> *exactly* the same properties as CMOV - if you have the source, you're
> better off recompiling, but it *might* help you if you happen to only
> have a binary.
>
> What I want people to understand is that this is a *huge* rathole, and
> it doesn't have any obvious bottom that I can see.

Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
on one side and [sse*] on the other: distros are built assuming the
former are always available, while they are not always. And the distros
which make the difference have to provide a dedicated build for earlier
systems just for compatibility. SSE*, 3dnow* etc... are only used by a
handful of media players/converters/encoders which are able to detect
by themselves what to use and already have the necessary fallbacks,
because these instruction sets vary too much between processors and
vendors.

One could argue that cmpxchg/bswap/xadd are supported by the 486 and
that implementing them for the 386 is almost useless now (though it
costs almost nothing to provide them, I did it a few years ago).
CMOV/NOPL are rarely used, thus have no reason to cause a massive
performance drop, but are frequent enough (at least cmov) for almost
any program to have at least one or two inside, making it incompatible
with a given processor, and are almost obvious to implement too.

SSE*/3dnow* would be much, much harder and would only serve very few
programs, and serve them badly, because when they're used, the use
would be intensive.

I personally am not against being able to emulate every optional
instruction, quite the opposite. It's just that if in order to do
this, we add cost to the other obvious ones, we lose what we expected
to win (simplicity and efficiency).

Regards,
Willy
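
[Editorial sketch, not part of the original thread.] To illustrate why CMOV is described as "almost obvious to implement": the core of an emulation is deciding whether the condition encoded in the second opcode byte (0x40 to 0x4F after the 0x0F escape) holds for the saved EFLAGS. The code below is a minimal sketch of that decision only; it is not the patch discussed in the thread, and the name `cmov_condition_met` is hypothetical. Operand decoding and the actual move are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS bit positions as defined by the x86 architecture. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/*
 * Decide whether a CMOVcc instruction would perform its move.
 * `opcode2` is the byte following the 0x0F escape (0x40..0x4F).
 * The low nibble is the standard x86 condition code; bit 0 negates it.
 * Hypothetical helper for illustration only.
 */
bool cmov_condition_met(uint8_t opcode2, uint32_t eflags)
{
    assert((opcode2 & 0xF0) == 0x40);

    bool cf = (eflags & FLAG_CF) != 0;
    bool pf = (eflags & FLAG_PF) != 0;
    bool zf = (eflags & FLAG_ZF) != 0;
    bool sf = (eflags & FLAG_SF) != 0;
    bool of = (eflags & FLAG_OF) != 0;
    bool result;

    switch ((opcode2 >> 1) & 7) {
    case 0:  result = of;               break; /* O  / NO  */
    case 1:  result = cf;               break; /* B  / AE  */
    case 2:  result = zf;               break; /* E  / NE  */
    case 3:  result = cf || zf;         break; /* BE / A   */
    case 4:  result = sf;               break; /* S  / NS  */
    case 5:  result = pf;               break; /* P  / NP  */
    case 6:  result = (sf != of);       break; /* L  / GE  */
    default: result = zf || (sf != of); break; /* LE / G   */
    }

    /* Odd condition codes are the negated form of the even ones. */
    return (opcode2 & 1) ? !result : result;
}
```

An emulator would call this from its invalid-opcode handler, perform the register or memory load only when it returns true, and then advance the instruction pointer past the instruction in either case.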
From: Willy Tarreau on 10 Nov 2009 15:40

On Tue, Nov 10, 2009 at 12:25:02PM -0800, H. Peter Anvin wrote:
> On 11/10/2009 12:16 PM, Willy Tarreau wrote:
> >
> > Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> > on one side and [sse*] on the other: distros are built assuming the
> > former are always available, while they are not always. And the distros
> > which make the difference have to provide a dedicated build for earlier
> > systems just for compatibility. SSE*, 3dnow* etc... are only used by a
> > handful of media players/converters/encoders which are able to detect
> > by themselves what to use and already have the necessary fallbacks,
> > because these instruction sets vary too much between processors and
> > vendors.
> >
>
> That is increasingly not true since gcc is now doing autovectorization.

But programs have to be built to use that specific platform anyway;
this is different from all programs built with support for CMOV enabled
by default and which will work on 95% of the platforms.

(...)
> I could 970 cmovs in libc out of 322660 instructions. That is one in
> 333 instructions.

Not bad, I agree! But on the C3, CMOV from/to register is implemented.
It's only CMOV from/to memory which has to be emulated, which makes it
a lot less common. Anyway, that's why we need counters, so that the
user knows when he really ought to recompile.

(...)
> I don't see any particular subset as being more obvious than the other,
> with the *possible* exception of NOPL, simply because NOPL was
> undocumented for so long.

Well, simply the availability of binaries making use of them. I'm not
sure you would find SSE* instructions in your libc where you found the
970 cmov.

For NOPL, that's different: I first heard about it in this thread, and
my C3 running with the CMOV patch has never complained about missing
it :-)

Regards,
Willy
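
[Editorial sketch, not part of the original thread.] The distinction between register and memory forms of CMOV, and the counters mentioned above, can be illustrated as follows. In the x86 encoding, the ModRM byte that follows the opcode has a 2-bit `mod` field in its top bits; `mod == 3` means a register operand and any other value means a memory operand. The sketch assumes the instruction bytes start directly at the 0x0F escape (no prefixes), and the names `is_cmov_with_memory_operand`, `struct emu_counters` and `count_emulated` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process counters, for illustration only. */
struct emu_counters {
    unsigned long cmov_mem; /* CMOV with a memory operand */
    unsigned long other;    /* anything else seen by the handler */
};

/*
 * Return true if `insn` encodes CMOVcc (0x0F 0x40..0x4F /r) whose
 * ModRM byte selects a memory operand (mod != 3).
 * Assumes no instruction prefixes precede the 0x0F escape byte.
 */
bool is_cmov_with_memory_operand(const uint8_t *insn, size_t len)
{
    assert(insn != NULL);

    if (len < 3)
        return false;
    if (insn[0] != 0x0F || (insn[1] & 0xF0) != 0x40)
        return false;

    /* ModRM layout: mod (bits 7..6), reg (bits 5..3), rm (bits 2..0). */
    return (insn[2] >> 6) != 3;
}

/* Update the counters for one faulting instruction. */
void count_emulated(struct emu_counters *c, const uint8_t *insn, size_t len)
{
    assert(c != NULL);

    if (is_cmov_with_memory_operand(insn, len))
        c->cmov_mem++;
    else
        c->other++;
}
```

Exposing such counters to the user is one way to show when a binary hits the emulation path often enough that recompiling would be worthwhile.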