From: nedbrek on 13 May 2010 20:44

Hello all,

"Andy 'Krazy' Glew" <ag-news(a)patten-glew.net> wrote in message
news:4BE72955.9000809(a)patten-glew.net...
> On 1/25/2010 3:43 AM, nedbrek wrote:
>> Hello all,
>>
>> "Andy "Krazy" Glew"<ag-news(a)patten-glew.net> wrote in message
>> news:4B5C999C.9060301(a)patten-glew.net...
>>> nedbrek wrote:
>>>> That's where my mind starts to boggle.  I would need to see branch
>>>> predictor and serialization data showing a window this big would
>>>> deliver significant performance gains.  We were looking at Itanium
>>>> runs of Spec2k built by Electron (i.e. super optimized).  We were
>>>> assuming a very heavy implementation (few serializing conditions).
>>>> We were unable to scale this far.
>
> By the way, Ned's comment about needing to see branch prediction data
> indicates a fundamental misunderstanding of speculative multithreading.

IIRC, my comments were in reference to a generic, big OOO.  I am heavily
influenced by my history (aren't we all)...

When we were looking at Itanium, we were attempting to sell _any_ OOO
machine to a group solidly opposed to OOO.  We had to find a minimal
feature set whose performance would justify the design complexity.
Layering on unknown/unproven techniques would have been suicide (or, at
least, more suicidal than the idea already was).  We hadn't seen SpMT,
per se, but there was a similar idea published that we had read
(~2000-2003, I forget the name: a big OOO running heavily speculative,
possibly wrong code, checked by a wide in-order core).

When we switched to x86, we had similar problems for different reasons.
The x86 guys were (and probably still are) risk averse.  They have a set
beat pattern to hit, and cannot afford to miss.  Minor, incremental
changes were possible - but even those are hard to sell.  An idea had to
sell itself, by itself - in this case, the large window.  Layering in
more complexity was impossible.

This raises an interesting question:
What killed uarch research/development?
1) Pentium 4
2) Itanium
3) Collapse of the ~2000 Internet bubble
4) No killer apps to use up perf
5) Other? (crazy conspiracy theories can go here)

Ned
From: Robert Myers on 13 May 2010 21:13

On May 13, 8:44 pm, "nedbrek" <nedb...(a)yahoo.com> wrote:
> This raises an interesting question:
> What killed uarch research/development?
> 1) Pentium 4
> 2) Itanium
> 3) Collapse of the ~2000 Internet bubble
> 4) No killer apps to use up perf
> 5) Other? (crazy conspiracy theories can go here)

0) Power constraints.

Both Pentium 4 and Itanium must have contributed mightily to Intel's
risk aversion in that department.  Having smart phones and ARM nipping
at Intel's one-trick pony can't be helping, either.  There are no more
transistors and/or watts to throw at anything.

Robert.
From: MitchAlsup on 14 May 2010 00:11

On May 13, 7:44 pm, "nedbrek" <nedb...(a)yahoo.com> wrote:
> When we switched to x86, we had similar problems for different reasons.
> The x86 guys were (and probably still are) risk averse.  They have a
> set beat pattern to hit, and cannot afford to miss.  Minor, incremental
> changes were possible - but even those are hard to sell.

{A pause is necessary here, just to catch my breath.}

Excepting the architectural misstep down the P4 direction and the
subsequent retreat back to the Pentium Pro microarchitecture, has there
been anything other than architectural refinement?  More cache, new
bus/interconnect, more prediction, better decoding, tweaks to the memory
and I/O; and yet the basic infrastructure of the Pentium Pro survives to
this day.  This evolution was "hard to sell"?  Even considering the
50M/100M per year rates of selling them?

> This raises an interesting question:
> What killed uarch research/development?
> 1) Pentium 4
> 2) Itanium
> 3) Collapse of the ~2000 Internet bubble
> 4) No killer apps to use up perf
> 5) Other? (crazy conspiracy theories can go here)

Other: We have exploited all the real architecture invented in 1959
(Stretch), 1962 (6600), 1965 (360/91), and 1967 (360/85) to their
natural evolutionary optimal implementations (i.e. dead ends).  To this
we added branch prediction (although vestiges existed as early as 1967-8
(7600)), and a myriad of bells and whistles to nickel-and-dime ourselves
to where we are today.

In my opinion, the way forward in the big-computer realm is threads, yet
one cannot exploit threads with current languages (memory models in
particular), with our current synchronization means (and the memory
traffic they entail), and perhaps not without some departure from the
von Neumann model itself (only one thing happening at a time on a
per-thread basis).

In my opinion, the way forward in the low-power realm is also threads.
Here the great big OoO microarchitectures burn more power than they
deliver in performance.  Yet evolving back down from the big OoO
machines is not possible while the benchmarks remain single-threaded,
even though smaller, simpler CPUs deliver more performance per watt and
more performance per unit die area.  Yet one does not have to evolve
back "all that far" to achieve a much better balance between performance
and performance/watt.  However, I have found this a hard sell.

None of the problems mentioned above get any easier; in fact, they
become more acute as you end up threading more things.

Thus, I conclude that:
6) Running out of space to evolve killed off microarchitectural
innovation.

{And with the general caveat that no company actually does architectural
or microarchitectural research; each does development based on
short/medium-term goals.  Research happens in the large as various
companies show their wares and various competitors attempt to
incorporate or advance their adversary's developments.  Much like
biological evolution.}

Mitch
From: Andy 'Krazy' Glew on 14 May 2010 02:49

On 5/13/2010 5:44 PM, nedbrek wrote:
> "Andy 'Krazy' Glew"<ag-news(a)patten-glew.net> wrote in message
> news:4BE72955.9000809(a)patten-glew.net...
>> On 1/25/2010 3:43 AM, nedbrek wrote:
>> By the way, Ned's comment about needing to see branch prediction data
>> indicates a fundamental misunderstanding of speculative multithreading.
>
> IIRC, my comments were in reference to a generic, big OOO.  I am heavily
> influenced by my history (aren't we all)...

Ah.  My reasoning in the 1990s had run something like:

* To get more performance from single-threaded programs we need to
  increase the number of instructions in flight, and hence the
  instruction window.

* Branch mispredictions and other serializations limit the number of
  instructions that a single sequencer, a single stream of instructions,
  can supply (toy numbers at the end of this post).

* Therefore, to take advantage of a large instruction window for a
  logically single-threaded program, one must supply instructions from
  multiple points in that program - multiple sequencers => multiple
  threads within the logically single-threaded program.

Either SpMT, or some other way of exploiting control independence.  SpMT
is rather coarse grained; I suspect that the next step after SpMT would
be something like static dataflow.

QED

This argument doesn't say when diminishing returns hits the OOO window.
I like the kilo-instruction window research.  But, eventually, it will
hit.

> When we were looking at Itanium, we were attempting to sell _any_ OOO
> machine to a group solidly opposed to OOO.

I'm getting historical context.

I attempted to sell OOO - small window, big window, and then SpMT - to
Itanium circa 1998.  The original Tejas.  But the OOO got sidetracked,
and while they were interested in SpMT, they wanted SpMT as an
alternative to OOO.  And, much as I like SpMT, OOO is a more proven
technology.  They also liked run-ahead.

You tried again circa 2003?

> When we switched to x86, we had similar problems for different reasons.
> The x86 guys were (and probably still are) risk averse.  They have a
> set beat pattern to hit, and cannot afford to miss.  Minor, incremental
> changes were possible - but even those are hard to sell.  An idea had
> to sell itself, by itself - in this case, the large window.  Layering
> in more complexity was impossible.
>
> This raises an interesting question:
> What killed uarch research/development?
> 1) Pentium 4
> 2) Itanium
> 3) Collapse of the ~2000 Internet bubble
> 4) No killer apps to use up perf
> 5) Other? (crazy conspiracy theories can go here)

6) Collapse of all effective competition to Intel and x86.  Without
other companies doing different things, Intel has little incentive to
innovate.

7) Cost of fabs.  High cost => risk aversion.

Although overall I see two major wrong turns (Pentium 4 and Itanium),
coupled to a lack of demand (no killer apps), leading to a situation
where the VLSI got dense enough for multicore, and multicore will absorb
all mindshare for a decade or so.

Plus the power issues.  Which were exacerbated by Pentium 4's high
frequency approach.
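To put toy numbers on the second bullet (the figures below are
illustrative assumptions, not data from any design): with a branch every
~5 instructions and per-branch prediction accuracy p, the run of
correctly speculated instructions from a single fetch stream is roughly
geometric, so even a very good predictor cannot keep a kilo-instruction
window full from one sequencer.

# Toy model (illustrative assumptions only): expected useful
# instructions in flight from a single sequencer, given a branch every
# `insns_per_branch` instructions and per-branch prediction accuracy `p`.
# The run length until the first mispredicted branch is geometric, so
# the mean number of branches survived is 1 / (1 - p).
def expected_useful_window(p, insns_per_branch=5):
    return insns_per_branch / (1.0 - p)

for p in (0.90, 0.95, 0.99, 0.999):
    print("accuracy %.3f: ~%6.0f useful instructions" %
          (p, expected_useful_window(p)))
# Even at 99% per-branch accuracy this is only ~500 useful instructions,
# short of a kilo-instruction window - hence fetching from multiple,
# control-independent points (SpMT) rather than from one sequencer.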
From: nedbrek on 14 May 2010 08:05
Hello all,

"Andy 'Krazy' Glew" <ag-news(a)patten-glew.net> wrote in message
news:4BECF264.8060401(a)patten-glew.net...
> On 5/13/2010 5:44 PM, nedbrek wrote:
>> "Andy 'Krazy' Glew"<ag-news(a)patten-glew.net> wrote in message
>> news:4BE72955.9000809(a)patten-glew.net...
>>> On 1/25/2010 3:43 AM, nedbrek wrote:
>
>>> By the way, Ned's comment about needing to see branch prediction data
>>> indicates a fundamental misunderstanding of speculative
>>> multithreading.
>>
>> IIRC, my comments were in reference to a generic, big OOO.  I am
>> heavily influenced by my history (aren't we all)...
>
> Ah.  My reasoning in the 1990s had run something like:
>
> * Therefore, to take advantage of a large instruction window for a
>   logically single-threaded program, one must supply instructions from
>   multiple points in that program - multiple sequencers => multiple
>   threads within the logically single-threaded program.
>
> Either SpMT, or some other way of exploiting control independence.
> SpMT is rather coarse grained; I suspect that the next step after SpMT
> would be something like static dataflow.

I grew up under Yale Patt, with "10 IPC on gcc".  A lot of people
thought it was possible, without multiple IPs.

>> When we were looking at Itanium, we were attempting to sell _any_ OOO
>> machine to a group solidly opposed to OOO.
>
> I'm getting historical context.
>
> I attempted to sell OOO - small window, big window, and then SpMT - to
> Itanium circa 1998.  The original Tejas.  But the OOO got sidetracked,
> and while they were interested in SpMT, they wanted SpMT as an
> alternative to OOO.  And, much as I like SpMT, OOO is a more proven
> technology.  They also liked run-ahead.
>
> You tried again circa 2003?

Yes, I will need to draw up the exact timeline sometime.  I started in
MRL in Jan 01, as part of a group (of two, counting me!) to develop a
new Itanium strawman.  We had a blank check: whatever it takes to make
Itanium the performance leader.  IIRC, by 2003 things were actually
starting to wind down, as we were coming to realize that nothing would
ever be done.  But yeah, 2002-2003.

>> This raises an interesting question:
>> What killed uarch research/development?
>> 1) Pentium 4
>> 2) Itanium
>> 3) Collapse of the ~2000 Internet bubble
>> 4) No killer apps to use up perf
>> 5) Other? (crazy conspiracy theories can go here)
>
> 6) Collapse of all effective competition to Intel and x86.  Without
> other companies doing different things, Intel has little incentive to
> innovate.

Intel's biggest competitor is itself (competition among teams is
probably too aggressive).  I would phrase this rather as, "Intel could
allow the more innovative (risky) ideas to get tabled, in favor of less
risky alternatives."  Performance is going up, just in an evolutionary
rather than revolutionary manner.

> 7) Cost of fabs.  High cost => risk aversion.

Definitely.  That doesn't mean a small team can't be set aside to do
something revolutionary.  The 80-core thing was this sort of idea, only
done terribly wrong.

> Although overall I see two major wrong turns (Pentium 4 and Itanium),
> coupled to a lack of demand (no killer apps), leading to a situation
> where the VLSI got dense enough for multicore, and multicore will
> absorb all mindshare for a decade or so.
>
> Plus the power issues.  Which were exacerbated by Pentium 4's high
> frequency approach.

Yes, multicore is the new bandwagon.  P4 pushed the frequency pendulum
too far, and now we've overreacted.
The ironic thing (which we demonstrated, and which made us hugely
unpopular) is that massive many-core burns just as much power as (or
more than) a smart OOO on anything but grossly parallel applications.

Ned
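P.S. One rough way to see the shape of that argument, with purely
illustrative numbers rather than our actual data (assuming
Pollack's-rule-style scaling, single-core performance ~ sqrt(power),
plus Amdahl's law):

from math import sqrt

# Compare two ways to spend the same power budget W (arbitrary units):
#   - one big OOO core using all of W: perf ~ sqrt(W)  (Pollack-ish)
#   - W simple cores of power 1 each: Amdahl speedup over one simple core
# All numbers here are illustrative assumptions, not measurements.
def big_ooo_perf(W):
    return sqrt(W)

def many_core_perf(W, f):
    n = int(W)                  # one simple core per unit of power
    return 1.0 / ((1.0 - f) + f / n)

W = 16
for f in (0.5, 0.8, 0.95):      # f = parallel fraction of the workload
    print("parallel fraction %.2f: big OOO ~%.1fx, %d small cores ~%.1fx"
          % (f, big_ooo_perf(W), W, many_core_perf(W, f)))
# In this toy setup the small cores only catch the big core around
# f = 0.8 and only pull ahead when the workload is grossly parallel;
# below that, the one big core does more with the same power.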