From: Niels Jørgen Kruse on 18 Apr 2010 07:57

Andy "Krazy" Glew <ag-news(a)patten-glew.net> wrote:

> The ARM Cortex A9 CPU is out-of-order, and is becoming more and more
> widely used in things like cell phones and iPads.

Cortex A9 is not shipping in any product yet (I believe). Lots of preannouncements though. The Apple A4 CPU is currently believed to be a tweaked Cortex A8, perhaps related to the tweaked A8 that Intrinsity did for Samsung before being acquired by Apple.

Someone with a jailbroken iPad (or having paid the $99 fee) could run benchmarks to probe the properties of the CPU.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark
From: Robert Myers on 18 Apr 2010 09:12

On Apr 17, 11:07 pm, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net> wrote:

> I suspect that we will end up in a bifurcated market: out-of-order for the high performance general purpose computation
> in cell phones and other important portable computers, in-order in the SIMD/SIMT/CoherentThreading GPU style
> microarchitectures.
>
> The annoying thing about such bifurcation is that it leads to hybrid heterogeneous architectures - and you never know how
> much to invest in either half. Whatever resource allocation you make to in-order SIMD vs. ooo scalar will be wrong for
> some workloads.

A significant part of the resources of the Cray 1 was nearly useless to almost any customer who bought and/or used such machines. Machines were bought by customers who had no use for vector registers and by customers for whom there was a whole class of scalar registers that was nearly beside the point. However difficult those choices may have been (and I'm not sure they weren't less important than the cost and cooling requirements of the memory), the machines were built and people bought and used them.

I don't think the choices are nearly as hard now. Transistors are nearly free, but active transistors consume watts, which aren't free. There are design costs to absorb, but you'd rather spread those costs over as many chips as possible, even if it means that most customers have chips with capabilities they never use. So long as the unused capabilities are idle and consume no watts, everyone is happy.

> I think that the most interesting thing going forward will be microarchitectures that are hybrids, but which are
> homogeneous: where ooo code can run reasonably efficiently on a microarchitecture that can run GPU-style threaded SIMD /
> Coherent threading as well. Or vice versa. Minimizing the amount of hardware that can only be used for one class of
> computation.

I thought that was one of the goals of pushing scheduling out to the compiler. I still don't know whether the goal was never possible or Itanium was just a hopelessly clumsy design.

Robert.
From: jgd on 18 Apr 2010 09:45

In article <7ca97a3f-11a0-47d4-b9fa-181024a9e9c8(a)z3g2000yqz.googlegroups.com>, rbmyersusa(a)gmail.com (Robert Myers) wrote:

> I thought that was one of the goals of pushing scheduling out to the
> compiler. I still don't know whether the goal was never possible or
> Itanium was just a hopelessly clumsy design.

It seems to have been possible for a limited class of problems: ones where you could use profile-guided optimisation on a relatively small amount of critical code that consumed almost all the CPU time, with example data that truly represented (almost) all of the data that was likely to be put through that critical code, and which used a fairly small selection of the possible code paths through the critical code.

This comes down, in practice, to "code that's much like the benchmarks that were studied before designing the architecture", which is rather different from all the code that people want to run on high-performance computers. So while pushing scheduling out to the compiler was *possible*, it doesn't seem to have been *practical*.

I still bear many scars from the Itanium, and it has had the effect of making many of the ideas used in it unattractive for years to come.

--
John Dallman, jgd(a)cix.co.uk, HTML mail is treated as probable spam.
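As an aside, the workflow described above looks roughly like the following in practice. This is a minimal, self-contained sketch using GCC's profile-guided optimisation flags (-fprofile-generate / -fprofile-use); the kernel, file name, and training data are invented for illustration, and the quality of the final binary depends entirely on how representative the training run is, which is exactly the limitation being pointed out.

/* pgo_sketch.c: illustrative only; the kernel and its data are invented
 * for this example, not taken from anyone's real code.
 *
 * Typical GCC profile-guided optimisation workflow:
 *   gcc -O2 -fprofile-generate pgo_sketch.c -o train
 *   ./train                  (run on data representative of production)
 *   gcc -O2 -fprofile-use pgo_sketch.c -o fast
 * The second compile schedules and lays out code using the recorded
 * branch and call frequencies, so it is only as good as the training run.
 */
#include <stddef.h>
#include <stdio.h>

/* A tiny "critical kernel" whose branch behaviour depends on its input. */
static double sum_positives(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        if (x[i] > 0.0)          /* PGO records how often this is taken */
            s += x[i];
    return s;
}

int main(void)
{
    double data[1000];
    for (size_t i = 0; i < 1000; i++)
        data[i] = (i % 3 == 0) ? -1.0 : 1.0;   /* stand-in for real input */
    printf("%f\n", sum_positives(data, 1000));
    return 0;
}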
From: nmm1 on 18 Apr 2010 10:40

In article <2dOdnQbD_v6TkFbWnZ2dnUVZ8lCdnZ2d(a)giganews.com>, <jgd(a)cix.compulink.co.uk> wrote:

>In article <7ca97a3f-11a0-47d4-b9fa-181024a9e9c8(a)z3g2000yqz.googlegroups.com>,
>rbmyersusa(a)gmail.com (Robert Myers) wrote:
>
>> I thought that was one of the goals of pushing scheduling out to the
>> compiler. I still don't know whether the goal was never possible or
>> Itanium was just a hopelessly clumsy design.
>
>It seems to have been possible for a limited class of problems: ones
>where you could use profile-guided optimisation on a relatively small
>amount of critical code that consumed almost all the CPU time, with
>example data that truly represented (almost) all of the data that was
>likely to be put through that critical code, and which used a fairly
>small selection of the possible code paths through the critical code.

Precisely. Exactly as every expert expected.

What seems to have happened is that a few commercial compscis[*] demonstrated the technique working on some carefully selected programs, and persuaded the decision makers that they could deliver it on most of the important, performance-critical codes. The fact that it was known to be infeasible, and had been for 25 years, was ignored. I have no idea which people were responsible for that, though I have heard that anyone who queried the party line was howled down and moved to other work. But that's hearsay.

I said that the project would fail, and why, in detail, in 1995/6. One of the two aspects (the interrupt one) they partially fixed up, at great difficulty and by dropping one of the most important performance features. The other was a spectacular failure, for precisely the reasons I gave. And I have never claimed to be a world expert - all I was using was knowledge common to people who had worked in or with those areas.

[*] NOT a flattering term.

Regards,
Nick Maclaren.
From: Robert Myers on 18 Apr 2010 11:03
On Apr 18, 10:40 am, n...(a)cam.ac.uk wrote:

> In article <2dOdnQbD_v6TkFbWnZ2dnUVZ8lCdn...(a)giganews.com>,
> <j...(a)cix.compulink.co.uk> wrote:
>
> >In article <7ca97a3f-11a0-47d4-b9fa-181024a9e...(a)z3g2000yqz.googlegroups.com>,
> >rbmyers...(a)gmail.com (Robert Myers) wrote:
>
> >> I thought that was one of the goals of pushing scheduling out to the
> >> compiler. I still don't know whether the goal was never possible or
> >> Itanium was just a hopelessly clumsy design.
>
> >It seems to have been possible for a limited class of problems: ones
> >where you could use profile-guided optimisation on a relatively small
> >amount of critical code that consumed almost all the CPU time, with
> >example data that truly represented (almost) all of the data that was
> >likely to be put through that critical code, and which used a fairly
> >small selection of the possible code paths through the critical code.
>
> Precisely. Exactly as every expert expected.
>
> What seems to have happened is that a few commercial compscis[*]
> demonstrated the technique working on some carefully selected programs,
> and persuaded the decision makers that they could deliver it on most of
> the important, performance-critical codes. The fact that it was
> known to be infeasible, and had been for 25 years, was ignored.
> I have no idea which people were responsible for that, though I
> have heard that anyone who queried the party line was howled down
> and moved to other work. But that's hearsay.
>
> I said that the project would fail, and why, in detail, in 1995/6.
> One of the two aspects (the interrupt one) they partially fixed up,
> at great difficulty and by dropping one of the most important
> performance features. The other was a spectacular failure, for
> precisely the reasons I gave. And I have never claimed to be a
> world expert - all I was using was knowledge common to people who
> had worked in or with those areas.
>
> [*] NOT a flattering term.

One of the more interesting posts you made on this subject was about the amount of state that IA-64 carried and the complexity of the rules required to operate on that state. That all that cruft would lead to all kinds of problems seems hardly surprising, but it also seems hardly intrinsic to VLIW and/or putting more of the burden of scheduling on the compiler. My assumption, backed by no evidence, is that HP/Intel kept adding "features" to get the architecture to perform as they had hoped, until the architecture was sunk by its own features.

You think the problem is fundamental. I think the problem is fundamental only because of the way that code is written: in a language that leaves the compiler to do too much guessing for the idea to have even a hope of working at all.

The early work from IBM *didn't* just look at computation-heavy, very repetitive HPC-like codes. It examined implausible things like word processors and found a tremendous amount of predictability in behavior such as computation paths. Maybe most of that predictability has now been successfully absorbed by run-time branch predictors, making the possible gains from trying to exploit it at the compile stage moot.

Since the world *does* write in languages that defy optimization, and most of the work on languages does not seem interested in how optimizable a language is, the net conclusion is the same: the idea will never work, but not for the almost-mathematical reasons you claim.

Robert.
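The point about languages that leave the compiler guessing can be illustrated with a small C sketch (the function names are invented for this example): when the compiler cannot prove that two pointers don't alias, it must assume a store through one may change what the other reads, which blocks exactly the kind of aggressive compile-time scheduling an in-order or VLIW design depends on. C99's restrict qualifier is one way of handing that knowledge to the compiler.

#include <stddef.h>

/* Without aliasing information the compiler must assume that writing
 * dst[i] could modify *scale, so it reloads *scale on every iteration
 * and cannot freely reorder or software-pipeline the loop. */
void scale_maybe_aliased(double *dst, const double *src,
                         const double *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * (*scale);
}

/* With restrict (C99) the programmer promises the pointers do not
 * overlap, so the compiler may hoist the load of *scale out of the
 * loop and schedule loads well ahead of stores at compile time. */
void scale_restrict(double *restrict dst, const double *restrict src,
                    const double *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * (*scale);
}

IA-64's advanced-load and check instructions were, in part, an attempt to let the compiler speculate past exactly this kind of uncertainty rather than having to prove it away.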