From: Tim McCaffrey on 1 Oct 2009 19:32

In article <ha2hmu$equ$1(a)smaug.linux.pwf.cam.ac.uk>, nmm1(a)cam.ac.uk says...
>
>In article <ha2fqp$o3s$1(a)USTR-NEWS.TR.UNISYS.COM>,
>Tim McCaffrey <timcaffrey(a)aol.com> wrote:
>>>
>>>Nope. You are thinking at FAR too low a level!
>>
>>The problem is that without understanding the lower level there is an
>>annoying tendency on the part of most programmers (except people on this
>>newsgroup, of course :) ) to view anything that fits on a single line of C
>>code as having an execution time of one cycle. Including any function calls
>>to functions they didn't write.
>>
>>The other annoying thing is that with OOO processors, large L1 caches, and
>>multiple processors on a chip, sometimes they are right.
>
>That's STILL talking at FAR too low a level!

Well, it is and it isn't (IMNSHO). Any really new architecture should account
for things like the latency of memory and I/O devices. For instance, the PCI
bus was architected in the days when a 100ns latency from an I/O device was
well within an order of magnitude of the processor's cycle time, so doing PCI
bus reads to get the device status was OK. With today's technology it would be
FAR better if talking to a PCI I/O device were strictly a push model, with
advance cache coherency/update enabled on the processor's side (I think the
Core i7 has this now). But too many hardware designers still assume a PCI bus
read is No Big Deal, and force the device driver writer to do multiple reads
across the bus to get anything done. Changing this requires, of course, OS
support (MSI/MSI-X interrupt support).

And, as you would say, that is only one example where low-level considerations
have a big impact on an architecture's performance. I've seen too many designs
where the designer(s) assume that if it is done in hardware it is
instantaneous.

I mean, pick apart all the things IA does wrong/badly/inefficiently and figure
out a better way, or a way to do without it. You probably will not use the
result, but it can show you some mistakes to avoid.

 - Tim
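A minimal sketch of the pull-versus-push contrast Tim describes, for a
hypothetical PCI device; every register name, offset, and structure here is
invented for illustration, not taken from any real driver:

    #include <stdint.h>

    #define STATUS_REG 0x10         /* hypothetical device register offset */

    /* Pull model: each status check is an uncached MMIO read, which stalls
     * the CPU for a full round trip across the bus (hundreds of ns). */
    static inline uint32_t mmio_read32(volatile void *bar, unsigned off)
    {
        return *(volatile uint32_t *)((volatile char *)bar + off);
    }

    uint32_t poll_status_pull(volatile void *bar)
    {
        return mmio_read32(bar, STATUS_REG); /* crosses the bus every poll */
    }

    /* Push model: at init the driver hands the device the physical address
     * of a status word in ordinary cacheable host RAM; the device DMA-writes
     * its status there whenever it changes.  Polling is then a cached load. */
    struct dev_shared {
        volatile uint32_t status;            /* written by the device via DMA */
    };

    uint32_t poll_status_push(const struct dev_shared *sh)
    {
        return sh->status;     /* L1 hit unless the device just updated it */
    }

The push variant is why an interrupt path such as MSI/MSI-X matters: the
device can announce "status changed" without the driver ever issuing a read
across the bus.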
From: Robert Myers on 2 Oct 2009 00:04

On Oct 1, 9:39 pm, Andrew Reilly <andrew-newsp...(a)areilly.bpc-users.org>
wrote:
>
> Fair enough. (Nice paraphrase, btw!)

Thank you.

> I suspect that our difference of opinion comes from the "level" at which
> one might like to be doing the experimentation/tuning. You seem to be
> arguing that we'll only make forward progress if we use languages/tools
> that expose the exact hardware semantics so that we can arrange our
> applications to suit. That may very well be the right answer, but it's not
> one that I like the sound of. I would vastly prefer to be able to describe
> application parallelism in something approaching a formalism, sufficiently
> abstract that it will be able both to withstand generations of hardware
> change and to be amenable to system tuning. Quite a bit of that sort of
> tuning is likely to be better in a VM or dynamic compilation environment,
> because there's some scope for tuning strategies and even (locking)
> algorithms at run-time.

We are in violent agreement. Nothing in the field ever seems to happen that
way.

If there were a plausible formalism that looked like it would stick, I think
it would make a big difference, but that's the kind of bias I have that Nick
snickers at. Short of that, I'd prefer that the tinkering be done before any
metal is cut. Fat chance, I think.

Robert.
From: "Andy "Krazy" Glew" on 2 Oct 2009 00:56 Robert Myers wrote: > On Sep 30, 10:22 am, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net> > wrote: > >> This is the ISA designer's equivalent of "First, do no harm." > > Some of these conversations really confuse me. > > In your "hardbound/softbound" thread, it was concluded (I thought) > that giving up a factor of two in execution speed was no big deal. Some of the responders to that thread said they would be happy to give up 2x performance to get a near-guarantee of no binary code injection via buffer overflow security holes. Others said they wouldn't. Some said they had no bugs. And Nick and Wilco said that it could not be done, or was of no value, or... whatever. Myself, I refer people to the HardBound and SoftBound papers, that quotes a much lower than 2x performance cost. More like 1.15x. I think that is a reasonable tradeoff. > How much do you have to give up to present almost any ISA you want by > way of a virtual machine? To present an ISA that is doing the same work - with good binary translation or cross compilation, you give up little. But, we aren't talking about doing the same work. We are talking about doing different work. E.g. an ISA feature that does something like creating cache snoopers for N memory addresses. E.g. transactional memory. Anything like that, that involves snooping, is considerably more hardware. E.g. doing AES encryption on every cache line going to main memory. It may look like very little extra cost - for the snoopers, because they are "in parallel"; for memory encryption, because big L3 caches reduce frequency of the operation, and because OOO tolerates the extra latency. But for "the simplest possible implementation" the costs are much higher. Yes, HardBound comes right up to the edge of violating this "First, do no harm" rule. HardBound is especially attractive on an OOO machine, because an extra latencies involved tend to get hidden. HardBound is less attractive on the simplest possible implementation, in-order, non-speculative. Less opportunity to latency hide. But, observe: HardBound is identical to doing the same work in software in the simplest possible implementation: in-order, non-pipelined, microcoded. For an unlimited width OOO superscalar, HardBound is again identical to software. But for a finite width OOO superscalar, HardBound is faster than software, since it requires less instruction bandwidth. And for a simple (but not simplest) implementation - in-order, 1-wide - HardBound again beats software. Thus, HardBound does no harm compared to software doing the same work, at the endpoints of the design space. And wins in the middle. And HardBound does almost no harm in the middle of the design space, compared to software that does LESS work, because it has no checking. That about as close as you come to a win in the "First, do no harm" contest.
From: Robert Myers on 2 Oct 2009 01:37

On Oct 2, 12:56 am, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net> wrote:
>
> That's about as close as you come to a win in the "First, do no harm"
> contest.

A virtual machine doesn't have to be stupid, though. Or, rather, the user of
the virtual machine doesn't have to be stupid. All the unnecessary software
cruft can be pushed off the stage like movable scenery when it's not helpful,
or at least when it's extremely harmful. That's one nice thing about virtual
machines: they can mimic reconfigurable hardware.

It sounds like the proposals are too expensive no matter how they are
implemented, hard or soft.

Robert.
From: nmm1 on 2 Oct 2009 03:09

In article <4AC587E5.2020600(a)patten-glew.net>,
Andy \"Krazy\" Glew <ag-news(a)patten-glew.net> wrote:
>Robert Myers wrote:
>>
>>> This is the ISA designer's equivalent of "First, do no harm."
>>
>> Some of these conversations really confuse me.
>>
>> In your "hardbound/softbound" thread, it was concluded (I thought)
>> that giving up a factor of two in execution speed was no big deal.
>
>Some of the responders to that thread said they would be happy to give
>up 2x performance to get a near-guarantee of no binary code injection
>via buffer overflow security holes.

That is true.

>And Nick and Wilco said that it could not be done, or was of no value,
>or... whatever.

That is not true. Yes, it can be done, and I said I had seen it done (for a
slightly higher factor, but with no hardware assistance). What we were saying
was that it can't be done in C, because there is no consistent specification
of C.

Regards,
Nick Maclaren.
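Nick's objection about C is concrete: common C idioms manufacture pointers
whose provenance no base/bound scheme can track with certainty. A
hypothetical illustration (my example, not one from the thread):

    #include <stdint.h>
    #include <stddef.h>

    struct node { int payload; struct node *next; };

    /* Round-tripping a pointer through an integer, as in XOR-linked lists
     * or tagged pointers.  The result has the same address, but which
     * object's bounds should apply to it is anyone's guess. */
    int *launder(int *p)
    {
        uintptr_t bits = (uintptr_t)p;
        bits ^= 0xdeadbeef;
        bits ^= 0xdeadbeef;
        return (int *)bits;
    }

    /* Recovering an enclosing struct from a pointer to one of its members
     * (the container_of pattern).  A checker saw a pointer bounded to a
     * single field; the subtraction steps outside those bounds on purpose. */
    struct node *node_of(struct node **next_field)
    {
        return (struct node *)((char *)next_field -
                               offsetof(struct node, next));
    }

    int main(void)
    {
        int x = 42;
        return *launder(&x) - 42;   /* 0 in practice, but a strict bounds
                                       checker cannot prove it is safe */
    }

As I understand it, SoftBound propagates metadata through such casts on a
best-effort basis; idioms like these are exactly why a hard guarantee needs
a tighter specification than C provides.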