From: "Andy "Krazy" Glew" on 25 Apr 2010 21:54 On 4/23/2010 3:51 PM, MitchAlsup wrote: > On Apr 23, 2:06 am, "Andy \"Krazy\" Glew"<ag-n...(a)patten-glew.net> > wrote: >> While I have worked on, and advocated, handling reg-reg move instructions efficiently, this introduces a whole new level >> of complexity. >> >> Specifically, MOVE elimination, changing >> >> lreg2 := MOVE lreg1 >> lreg3 := ADD lreg2 + 1 > > Ireg2 := MOV Ireg1 > Ireg2 := OP Ireg2,<const or reg or mem> > > Was a hardware optimization in K9 easily detecteed during trace > building. That is/was a local optimization. K9 R.I.P. Generic MOVE elimination works when there is an arbitrary separation between the MOVEr and the USEr. >> lreg2 := MOVE lreg1 ....lots of instructions, including branches and calls >> lreg3 := ADD lreg2 + 1 It also doesn't required a trace cache or a similar structure to hold the optimized code. Although I agree that it makes sense to hold the optmized code, so that you don;t constantly have to re-optimize.
From: MitchAlsup on 25 Apr 2010 22:15

I think there are a number of semi-fundamental issues to be resolved;
architects, microarchitects, and thinkers have already run into many of
the symptoms, but have not recognized the fundamentals.  Much like the
period between the discovery of the photo-electric effect and the
realization of quantum mechanics.  This kind of paradigm shift will
happen.  I just don't know whether the shift will happen in the OS,
languages, libraries, communications, or hardware.  Probably a little
of each.

The realization that "one can synchronize" a hundred thousand threads
running in a system the size of a basketball court
The realization that "there is exactly one notion of time" in a system
the size of a basketball court operating in the nano-second range
The realization that one cannot* specify "each and every step" in a
K*trillion step process and have the compiler recognize the inherent
parallelism
The realization that one cannot* specify "the large scale data-flow"
and simultaneously have each instruction able to take precise
interrupts

The first two correspond to the Heisenberg uncertainty principle in
physics.
The second two correspond to the difference between effects in the
micro-world and effects in the macro-world.

Perhaps, along with the notion of the "Memory Wall" and the "Power
Wall", we have (or are about to) run into the "Multi-Processing" Wall.
That is, we think we understand the problem of getting applications and
their necessary data and disk structures parallel-enough and
distributed-enough.  And we remain under the impression that we are
"expression limited" in applying our techniques to the machines that
have been built; but in reality we are limited by something entirely
more fundamental, and one we do not yet grasp or cannot yet enumerate.

Mitch

{"Other than that, Mrs. MultiProcessor, how did you like the Play?"}
From: Brett Davis on 26 Apr 2010 01:43

In article
<b24c8bb2-fcc3-4f4a-aa0d-0d18601b02eb(a)11g2000yqr.googlegroups.com>,
 MitchAlsup <MitchAlsup(a)aol.com> wrote:
> I think there are a number of semi-fundamental issues to be resolved;
>
> The realization that "one can synchronize" a hundred thousand threads
> running in a system the size of a basketball court
> The realization that "there is exactly one notion of time" in a system
> the size of a basketball court operating in the nano-second range
> The realization that one cannot* specify "each and every step" in a
> K*trillion step process and have the compiler recognize the inherent
> parallelism
> The realization that one cannot* specify "the large scale data-flow"
> and simultaneously have each instruction able to take precise
> interrupts
>
> The first two correspond to the Heisenberg uncertainty principle in
> physics
> The second two correspond to the difference between effects in the
> micro-world and effects in the macro-world
>
> Perhaps along with the notion of the "Memory Wall" and the "Power
> Wall" we have (or are about to) run into the "Multi-Processing" Wall.

ATI chips already have ~2000 processors; simple scaling suggests that
the monitor in your iMac a decade from now will have 100,000 CPUs.
Which means that a desktop server will have a million CPUs.  One for
each 10 pixels on your monitor.

A server room with the right software will have a higher IQ than you or
I do. ;)

Brett

> That is, we think we understand the problem of getting applications
> and their necessary data and disk structures parallel-enough and
> distributed-enough.  And we remain under the impression that we are
> "expression limited" in applying our techniques to the machines that
> have been built; but in reality we are limited by something entirely
> more fundamental, and one we do not yet grasp or cannot yet enumerate.
>
> Mitch
>
> {"Other than that Mrs. MultiProcessor, how did you like the Play"?}
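(Assuming "simple scaling" here means the core count doubles every 18 to
24 months, the arithmetic is: 2,000 x 2^5 ~= 64,000 up to
2,000 x 2^7 ~= 256,000 over ten years, i.e. of order 100,000 either way.)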
From: nmm1 on 26 Apr 2010 04:35

In article <8PydnQJVJtSKXEnWnZ2dnUVZ8r-dnZ2d(a)giganews.com>,
 <jgd(a)cix.compulink.co.uk> wrote:
>
>> >Oh, [the value of compatibility] can be challenged, all right.
>> >It's just that the required gains from doing so are steadily
>> >increasing as the sunk costs in the current methods grow.
>>
>> In my experience, that is almost always overstated, and very often
>> used as an excuse to avoid thinking out of the box.  In particular,
>> once software runs on two hardware architectures, porting it to a
>> third is usually easy.
>
>Perfectly true, provided that the architectures are as alike as, say,
>x86, MIPS, SPARC and PowerPC are.  Which is really quite a lot alike.
>
>Porting to something like Cell (using the SPEs), or MPI clustering, or
>something else based on different system-architecture principles is
>another matter.

Oh, yes, indeed.  100% agreement.

However, it is used as an argument to avoid considering (say) the
interrupt-free architecture that I have posted on this newsgroup.  That
would be essentially transparent to 99% of applications, and would need
mainly a reorganisation of the kernel and device drivers (not even a
complete rewrite).

That is a classic example of an idea that was thought of 40 years ago
(probably 50+), but could not have been implemented then, because the
technology was inappropriate.  But it will not get reconsidered, because
it is heretical to the great god Compatibility.  The fact that it might
well deliver a fairly painless factor of two in performance, RAS and
reduction of design costs is irrelevant.

Similarly, reintroducing a capability design (rather than the half-baked
hacks that have been put into some Unices) would be far less painful
than is often made out, and could easily deliver a massive improvement
in RAS - if done properly, perhaps a factor of 10 in the short term, 100
in the medium and thousands in the long.


Regards,
Nick Maclaren.
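For readers who have only seen the Unix hacks: the essence of a
capability design is that every reference a program holds carries its
own base, bound and permission bits, and the hardware checks them on
every access, so whole classes of memory-corruption failures are caught
at the access.  A toy sketch, assuming a simple base/length/permissions
representation; the struct and function names are made up for
illustration and do not describe any particular proposal.

/* Toy sketch of a capability and the per-access check; illustrative
 * names only. */

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum { CAP_READ = 1, CAP_WRITE = 2, CAP_EXEC = 4 };

typedef struct {
    uintptr_t base;     /* lowest address the capability grants */
    size_t    length;   /* extent of the region */
    uint32_t  perms;    /* CAP_READ | CAP_WRITE | ... */
    bool      valid;    /* cleared by revocation; unforgeable in HW */
} capability;

/* The check the hardware would perform on every load/store. */
static bool cap_allows(const capability *c, uintptr_t addr,
                       size_t size, uint32_t need)
{
    return c->valid
        && (c->perms & need) == need
        && addr >= c->base
        && size <= c->length
        && addr - c->base <= c->length - size;
}

int main(void)
{
    char buf[64];
    capability c = { (uintptr_t)buf, sizeof buf,
                     CAP_READ | CAP_WRITE, true };

    printf("in-bounds write:  %d\n",
           cap_allows(&c, (uintptr_t)buf + 16, 4, CAP_WRITE));  /* 1 */
    printf("out-of-bounds:    %d\n",
           cap_allows(&c, (uintptr_t)buf + 62, 4, CAP_WRITE));  /* 0 */
    return 0;
}

What the sketch leaves out - unforgeability (tag bits), how capabilities
are derived and revoked, and how the OS hands them out - is where the
real work, and the RAS payoff, lives.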
From: Terje Mathisen "terje.mathisen at tmsw.no" on 26 Apr 2010 08:12
Quadibloc wrote:
> On Apr 25, 9:08 am, n...(a)cam.ac.uk wrote:
>> though I meet a lot who claim that great god
>> Compatibility rules, and must not be challenged.
>
> Upwards compatibility is my shepherd...
>
> Even though I walk through the valley of upgrades,
> I shall not have to buy all my software over again,
> for You are with me.

Closed-source vendors don't want this, and they have lots of ways to
force you to "upgrade", i.e. to keep paying for new licenses.

One way is exemplified by OCAD, the very small, specialized CAD program
for orienteering and other maps: they used to have a very limited but
free version which was sufficient to do course planning for smaller
training events.  Even though it was free, you still had to register it
online, and now that they have taken away the registration robot, I
could no longer get it to work after reinstalling the OS on my main PC.

Instead, my club had to pay about $700 for the cheapest version of the
full drawing program.

I.e. by making an internet connection compulsory, vendors can do
whatever the marketplace will let them get away with.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"