From: Robert Myers on 24 Oct 2009 19:22

On Oct 24, 6:31 pm, Bernd Paysan <bernd.pay...(a)gmx.de> wrote:
> One interesting property of quantum mechanics is that for irreversible
> logic, there's a minimum amount of energy that is necessary to make it
> happen. Reversible logic does not have this drawback. Therefore,
> people investigate reversible logic, even though the actual
> components to get that benefit are not in sight (not even carbon nanotube
> switches have these properties, even though they are much closer to the
> physical limits for irreversible logic). Many people also forget that
> quantum mechanics does not properly take changes in the system into
> account, and that means that your reversible logic only works with the
> predicted low power when the inputs are not changing any more - and this
> is just the uninteresting case (the coherent one - changes in the system
> lead to decoherence, and thereby to classical physics).

Let's see. Quantum mechanics, properly applied, takes account of everything in the whole universe, which is, so far as I know, quantum mechanical and reversible in its entirety. If you could isolate parts of the system, like your computing apparatus, then it would be like a universe that is quantum mechanical and reversible in its entirety. Such a device would have little use to us, because we could neither give it new problems to work on nor read the results when it's done.

In order to give the device a new problem, we must disturb it, but the system can still retain enough coherence to function as a quantum mechanical device. Only the entropy involved in the process of giving the device input and reading the output has an irreducible cost in energy that we must put on the electric bill, as we will never get it back, except as waste heat. Thus, even though you can't do operations with *no* net cost in energy, we can still build and operate devices that act as quantum mechanical computers to an arbitrarily good approximation.
Writing to them and reading from them is always an irreversible process that, if repeated often enough, will eventually leave the device with no useful quantum mechanical coherence for us to exploit, as we will have destroyed it all through our reading and writing. In the interim, we can do an awful lot of computation. Otherwise, "quantum computers" would not be possible.

I'm having a hard time reconciling how I understand the problem with what you just said, which seems too sweeping and too black and white. Can you help me out?

Robert.
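The irreducible energy cost Robert attributes to the read/write (erasure) steps is what is usually called the Landauer limit. A minimal sketch of that floor follows; the Boltzmann constant and the k_B*T*ln(2) formula are standard physics, while the function name is just an illustrative choice:

```python
import math

# Landauer's principle: erasing one bit of information dissipates at
# least k_B * T * ln(2) of energy as heat. Reversible logic avoids
# this floor because it never destroys information.
K_B = 1.380649e-23  # Boltzmann constant, J/K (exact, 2019 SI)

def landauer_limit(temp_kelvin):
    """Minimum energy in joules to erase one bit at temperature T."""
    return K_B * temp_kelvin * math.log(2)

# At room temperature (300 K) the floor is a few zeptojoules,
# many orders of magnitude below what real gates dissipated in 2009.
print(landauer_limit(300.0))  # roughly 2.87e-21 J
```

The floor scales linearly with temperature, which is one reason cryogenic operation is attractive for computing near this limit.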
From: "Andy "Krazy" Glew" on 24 Oct 2009 21:17

nmm1(a)cam.ac.uk wrote:
> In article <4AE12FA9.1000706(a)patten-glew.net>,
> Andy "Krazy" Glew <ag-news(a)patten-glew.net> wrote:
>> Robert Myers wrote:
>>
>> I am not aware of an Itanium shipped or proposed that had an "x86 core
>> on the side".
>
> I am. I can't say how far the proposal got - it may never have got
> beyond the back of the envelope stage, and then got flattened as a
> project by management.

I am reasonably certain that you are misremembering or misunderstanding a presentation that may have oversimplified things.
From: Andrew Reilly on 24 Oct 2009 21:40

On Sat, 24 Oct 2009 12:25:40 -0700, Robert Myers wrote:
> I don't have any insight into what being architecture-naive on the other
> architectures might be, but, for Itanium, you have to start with deep
> insight into the code in order to get a payback on all the fancy bells
> and whistles. Itanium should be getting more instructions per clock,
> not significantly fewer (that *was* the idea, wasn't it?).

I've not used an Itanium, but it would seem to have quite a bit of practical similarity to the Texas Instruments TI C6000 series of VLIW DSP processors, in that it is essentially in-order VLIW with predicated instructions and some instruction encoding funkiness. That whole idea is *predicated* on being able to software-pipeline loop bodies and do enough iterations to make them a worthwhile fraction of your execution time. From memory, Anton's TeX benchmark is the exact opposite: strictly integer code of the twistiest non-loopy conditional nature. I would not expect even a heroic compiler to get *any* significant parallel issue going, at which point it falls back to being an in-order RISC-like machine: not dramatically unlike a pre-Cortex ARM, or SPARC, as you said.

Now, Texas' compilers for the C6000 *are* heroic, and I've seen them regularly schedule all eight possible instruction slots active per cycle, for appropriate DSP code. The interesting thing is that this process is *extremely* fragile. If the loop body contains too many instructions (for whatever reason), or hits some other limitation, the compiler seems to throw up its hands and give you essentially single-instruction-per-cycle code, which is (comparatively) hopeless. Smells like a box full of heuristics rather than reliable proof. The only way to proceed is to hack the source code into little pieces and try variations until the compiler behaves "well" again.
At least the TI parts *do* get low power consumption out of the deal, and since they clock more slowly they don't have quite so many cycles to wait for a cache miss. And no-one is trying to run TeX on them...

Cheers,

--
Andrew
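Andrew's point that the whole VLIW idea is predicated on enough loop iterations can be sketched with a toy cycle-count model. The three-stage loop body and the cycle formulas below are illustrative assumptions, not the C6000's (or Itanium's) actual schedule:

```python
# Toy model of software pipelining: a loop body with three dependent
# stages (load, multiply, store). A machine issuing one stage per
# cycle without overlap takes 3*N cycles. A software-pipelined
# schedule overlaps iterations so that, after a short prologue, one
# result completes every cycle.

STAGES = 3  # load, multiply, store (illustrative)

def cycles_naive(n_iters):
    """No overlap: every iteration pays for all of its stages."""
    return STAGES * n_iters

def cycles_pipelined(n_iters):
    """Prologue of STAGES-1 cycles, then one iteration per cycle."""
    return n_iters + (STAGES - 1)

for n in (2, 100):
    speedup = cycles_naive(n) / cycles_pipelined(n)
    print(n, speedup)
```

With 2 iterations the speedup is only 1.5x because the prologue dominates; with 100 iterations it approaches the full 3x. Twisty, non-loopy code like TeX never reaches the steady state, which is why it degrades toward one instruction per cycle.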
From: Robert Myers on 24 Oct 2009 21:59

On Oct 24, 9:40 pm, Andrew Reilly <andrew-newsp...(a)areilly.bpc-users.org> wrote:
> I've not used an Itanium, but it would seem to have quite a bit of
> practical similarity to the Texas Instruments TI C6000 series of VLIW DSP
> processors, in that it is essentially in-order VLIW with predicated
> instructions and some instruction encoding funkiness.
[snip]
> At least the TI parts *do* get low power consumption out of the deal, and
> since they clock more slowly they don't have quite so many cycles to wait
> for a cache miss. And no-one is trying to run TeX on them...

I get so tense here, trying to make sure I don't make a grotesque mistake. Your post made me chuckle. Thanks. I actually didn't even look at the TeX numbers, only the ones I had first relied upon. As a seventh-grade teacher remarked, my laziness might one day be my undoing.

Thanks for calling attention to the TI compiler. I've looked at the TI DSP chips, but never gotten further. You know just how heroic a heroic compiler really is. I don't know whether David Dinucci (did I get it right?) is still following.

Robert.
From: Robert Myers on 24 Oct 2009 22:32
On Oct 24, 9:59 pm, Robert Myers <rbmyers...(a)gmail.com> wrote:
[snip]
> Thanks for calling attention to the TI compiler. I've looked at the
> TI DSP chips, but never gotten further.
>
> You know just how heroic a heroic compiler really is. I don't know
> whether David Dinucci (did I get it right?) is still following.

Forgive me for responding to my own post.

It was right here, in this forum, that Linus Torvalds, the one, the only, declared the stupidity of software pipelining because he was, well, you know, used to OoO processors. This is an amazing place.

Kudos to Terje, who straightened me out. You can find him on David Kanter's forum, if you're still interested.

Robert.