From: "Andy "Krazy" Glew" on 21 Mar 2010 15:07 Bernd Paysan wrote: > I think the obvious thing was mentioned in the paper: "make the routers > as simple as possible". This means that the routing information > contains a physical route (a sequence of turns - the most simple router > is a butterfly router with two inputs, two outputs, and one bit of > routing information), and the router just passes on packets as it knows > the next hop from the first beat of the message. On collisions, it has > only few options: Our (terje's) touching on fat-trees has me thinking about a different routing primitive: 2 pairs of bidirectional links, 1 pair "up" (coming from below) and 1 pair "down" (coming from above). Where "bidirectional" doesn't mean a single link that can be turned around, but rather a pair of unidirectional links. The fully populated fat tree property means that the "up" (coming from below) traffic can never be blocked. You have two links coming from below, and two going up. The connectedness property of a fat tree means that you never get into a dead end. You only have to choose which link you want to use. You can get into collisions with downward directed traffic. You can get such collisions (a) between traffic coming from both above links that is directed below, and (b) with local traffic coming from below link #1 that wants to go to below link #2. (c) Or both. You can buffer. You can have fixed prioritization rules: E.g. local traffic (b) wins, requiring that the downward traffic coming from the above links be either buffered or resent. Or the opposite, one of the downward traffic wins. If I assume that buffer-less nodes are cheaper than buffer-full nodes, we want to have a fabric that has many of these buffer-less nodes, with either of the collisions rules above And with a smaller-number of buffer-ful nodes, perhaps only at the root layer of the fat-tree. But probably at multiple levels. Note that prioritizing locality is a nice simple policy. Not the only possible policy, but a simple policy. I'm trying to imagine policis that temporarily prioritize donward traffic coming from above, so that blocked local traffic propagates upwards along the guaranteed to be present fat-tree links until it comes to a buffer layer. I think this has unit gain (<= 1) positive feedback, so that it is stable - you might get into a mode where all of your traffic gets sent to the buffering layer, but you wouldn't get worse and worse, the sort of replay tornadoes that some of us are familiar with. You would require some sort of global scheduling so that downward traffic collisions did not create an unstable system. Because of the possibility of getting stuck in a limit cycle of poor performance (everything going to the buffering layers, e.g. memory with no benefit of locality), you might want to "batch" actions - give local traffic priority while establishing a link for downward traffic. I am sure that this has already been thought of. --- Q: is a 2 input 2 output 2x2 router really the best primitive? Why not something that has 3 sets of links From/To Down Link #1 1 in 1 out From/To Down Link #2 1 in 1 out From/To Up Link 1 in 3 out This has sufficiently uplinks that we can always route in terms of collisions, never buffer or drop/retry. Above I described From/To Down Link #1 1 in 1 out From/To Down Link #2 1 in 1 out From/To Up Link 2 in 2 out although this doesn't have the always routable on collision policy. 
But

    From/To Down Link #1: 1 in, 1 out
    From/To Down Link #2: 1 in, 1 out
    From/To Up Link:      2 in, 4 out

would have the always-routable-on-collision property.
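[A minimal simulation sketch of that last variant - my own
illustration, not from the post. The model is deliberately
pessimistic: every one of the 4 inputs, including the up inputs, is
allowed to contend for any output. A packet that loses arbitration
for a down output is deflected to an up output, and an exhaustive
check shows that with 4 up outputs no input pattern ever needs a
buffer.]

# Sketch of the buffer-less fat-tree node described above:
# 2 down inputs, 2 up inputs, 2 down outputs, 4 up outputs.
# Packets that lose arbitration for a down output deflect upward;
# with 4 up outputs there is always room, so the node never has
# to buffer or drop.
from itertools import product

DOWN_OUTS = 2
UP_OUTS = 4

def route(packets):
    """packets: desired output per input, each 'down0', 'down1' or 'up'.
    Returns the assigned port per packet, deflecting losers upward."""
    down_taken = [False] * DOWN_OUTS
    up_used = 0
    assignment = []
    for want in packets:
        if want.startswith('down'):
            port = int(want[4:])
            if not down_taken[port]:
                down_taken[port] = True
                assignment.append(want)
                continue
        # down port busy, or the packet wanted up: route upward
        assert up_used < UP_OUTS, "would need to buffer or drop"
        assignment.append('up%d' % up_used)
        up_used += 1
    return assignment

# Exhaustively check all 3^4 patterns of desired outputs.
for pattern in product(['down0', 'down1', 'up'], repeat=4):
    route(list(pattern))
print("all", 3**4, "input patterns routed without buffering")

[Setting UP_OUTS to 2 makes the assert fire - e.g. when three or four
inputs contend for the same down port - matching the claim that the
2-in/2-out up-link variant is not always routable.]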
From: Del Cecchi on 21 Mar 2010 16:42

"Anne & Lynn Wheeler" <lynn(a)garlic.com> wrote in message
news:m3mxy1h9wc.fsf(a)garlic.com...
>
> Del Cecchi <delcecchi(a)gmail.com> writes:
>> The original motivation was to do molecular simulations in the
>> bio-tech field, hence the name. Sure, IBM seized on the desire of the
>> National Labs for prestige and bomb simulation and used it to make a
>> profit.
>
> or seized on national labs (& numerical intensive) only as a possible
> walling-off move into commercial at the same time.
>
> old email
> http://www.garlic.com/~lynn/lhwemail.html#medusa
>
> old post about jan92 moving into commercial also
> http://www.garlic.com/~lynn/95.html#13
>
> a few weeks before being told it was transferred and couldn't work on
> anything with more than four processors.
>
> old email, a couple days/hrs ... before the hammer fell
> http://www.garlic.com/~lynn/2006x.html#email920129
>
> discussing the national lab scenario (I had to skip a LLNL meeting
> because of other commitments ... but some of the people at the
> meeting dropped by afterwards to bring me up to date).
>
> then the press item shortly after the hammer fell (17feb92)
> http://www.garlic.com/~lynn/2001n.html#6000clusters1
>
> and another press item later that summer (we were both gone within a
> few weeks):
> http://www.garlic.com/~lynn/2001n.html#6000clusters2
>
> the kingston engineering & scientific had been doing molecular
> simulation with numerous Floating Point Systems boxes tied to 3090
> with vector facility.
>
> In 1980, I had done some HYPERChannel work to allow overflow in the
> Santa Teresa lab (300 people from the IMS group) to be moved to an
> offsite bldg ... but getting local interactive performance using
> HYPERChannel as mainframe channel extension. Then basically did the
> same installation for a large IMS field support group in Boulder.
> recent reference
> http://www.garlic.com/~lynn/2010f.html#17
>
> The person that I worked with for the Boulder installation then moved
> to Kingston to manage the Kingston E&S operation. I worked with him
> there to do a high-speed HYPERChannel satellite link between the
> Kingston E&S lab and the west coast. This was totally unrelated to
> the operation that was supposedly designing their own numerical
> intensive supercomputer and also providing funding for Steve Chen's
> effort. recent post with a little more of the gory details:
> http://www.garlic.com/~lynn/2010b.html#71 Happy DEC-10 Day
>
> The above tended to have some LLNL ties, in part because early
> backing for FCS was standards moving to fiber-optics ... something
> that LLNL had installed in serial-copper form.
>
> The SCI stuff was with Gustafson out of SLAC.
>
> Later one of the sparc-10 engineers was at another chip-shop and
> designing a fast/inexpensive SCI subset ... and tried to interest me
> in taking over the SUN SPRING/DOE operating system effort and
> adapting it to a large distributed SCI infrastructure. This was about
> the time SUN was shutting down the SPRING/DOE effort and transferring
> everybody over to the Java group.

I was on the SCI committee, although I sort of came in in the middle.
And Rochester had an effort to use a SCIL (SCI-Like) interface to
couple AS/400 boxes in something we called "firmly coupled". The
software guys had even signed up. But some guy from POK (Baum?) put
the kibosh on it, since he didn't believe the Rochester guys could
make OS/400 NUMA when the Z folks said it would take hundreds of PY
(person-years). POK always had a NIH complex.
But in the end the SCI knock-off ended up in the xSeries NUMA box. The
topology was dual counter-rotating rings.

del
From: Bernd Paysan on 21 Mar 2010 17:01

MitchAlsup wrote:
> The other option is the virtual router, where the first beat of
> information is the virtual route, and the router takes a beat to look
> up the physical port associated with the request at hand. Adds one
> clock of delay from pin to pin, solves a lot of problems.

Yes, but that works well only when the lookup table is small (IMHO on
the order of 8-12 address bits, as a virtual routing table for an
n-bit path is O(exp(n))). So you can have virtual routes for
sufficiently small subnets, and you can route around a lot of problems
inside these subnets without propagating them to a global routing
table.

Virtual routing is a good way to improve source routing by subdividing
the network into islands which are internally source-routed; on the
boundary, virtual routing is used to hide details you would otherwise
have to communicate to too many other hosts.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
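[A minimal sketch of the hybrid scheme Paysan describes above - all
names and sizes here are illustrative, not from the post. Inside an
island a packet header carries explicit turns (a source route); at an
island boundary a small table, indexed by a virtual route id of a few
bits, expands the id into the source route through the next island.
The table is O(2^bits), which is why the id must stay small.]

# Hypothetical sketch of the source-routing / virtual-routing split.
# Inside an island, a packet carries its route as explicit per-hop
# output ports; at an island boundary, a small table maps a virtual
# route id (here 8 bits, so at most 256 entries per border router)
# to the explicit source route through the next island.

VIRT_BITS = 8  # table size is O(2^VIRT_BITS)

class BorderRouter:
    def __init__(self):
        # virtual route id -> per-hop output ports for the next island
        self.virt_table = {}

    def install(self, virt_id, source_route):
        assert 0 <= virt_id < (1 << VIRT_BITS)
        self.virt_table[virt_id] = source_route

    def forward(self, packet):
        # one extra lookup "beat" at the boundary, then the packet is
        # pure source-routed again inside the next island
        packet['route'] = list(self.virt_table[packet['virt_id']])
        return packet

def hop(packet):
    # inside an island: consume the next turn, no table lookup at all
    return packet['route'].pop(0)

# usage: the border router expands virtual route 5 into a 3-hop route
br = BorderRouter()
br.install(5, [1, 0, 2])
pkt = br.forward({'virt_id': 5})
print([hop(pkt) for _ in range(3)])   # -> [1, 0, 2]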
From: Anne & Lynn Wheeler on 21 Mar 2010 17:12

"Del Cecchi" <delcecchi(a)gmail.com> writes:
> And Rochester had an effort to use a SCIL (SCI-Like) interface to
> couple AS/400 boxes in something we called "firmly coupled". The
> software guys had even signed up. But some guy from POK (Baum?) put
> the kibosh on it, since he didn't believe the Rochester guys could
> make OS/400 NUMA when the Z folks said it would take hundreds of PY.
> POK always had a NIH complex.
>
> But in the end the SCI knock-off ended up in the xSeries NUMA box.
>
> The topology was dual counter-rotating rings.

re:
http://www.garlic.com/~lynn/2010f.html#47 Nonlinear systems and nonlocal supercomputing

by the time of SCIL ... we were gone from IBM ... and I was only
intermittently involved with SCI (couldn't do a whole lot of
self-funding on standards committees).

long ago and far away, baum was hired into pok to be in charge of
(mainframe) tightly-coupled shared-memory multiprocessor architecture
... at the same time my wife was con'ed into moving from the JES group
in G'burg to POK to be in charge of (mainframe) loosely-coupled (aka
cluster) architecture ... and for a time, both reported to the same
manager.

mainframe shared-memory for a long time required much stronger memory
consistency than that provided in NUMA. during her stint in POK, there
was almost exclusive focus on tightly-coupled ... and she didn't stay
very long there. Her loosely-coupled architecture (peer-coupled
shared-data) saw very little (mainframe) uptake, except for IMS
hot-standby ... until sysplex.

much later, Steve Chen was CTO at sequent and they were doing NUMA-Q
(SCI), and we did some consulting for Steve. later IBM buys sequent. a
few recent references:
http://www.garlic.com/~lynn/2010e.html#68 Entry point for a Mainframe?
http://www.garlic.com/~lynn/2010e.html#70 Entry point for a Mainframe?
http://www.garlic.com/~lynn/2010f.html#7 What was the historical price of a P/390?
http://www.garlic.com/~lynn/2010f.html#13 What was the historical price of a P/390?

there is a similar joke about the internal network. there was somebody
from corporate hdqtrs in armonk who had participated in an SNA
investigation of what would be required to implement a world-wide
distributed network ... which came up with an enormous number of PY
... in part because SNA is so fundamentally opposed to a real
distributed network. It turns out the majority of the internal network
was done by a single person ... but it used a totally different
approach that made a world-wide distributed network a relatively
trivial result. In any case, the armonk expert stated that the
internal network could not exist, because the corporation had never
provided funding for the enormous number of PY such networking would
require.

totally unrelated recent reference to dual counter-rotating rings from
long ago and far away:
http://www.garlic.com/~lynn/2010e.html#69 search engine history, was Happy DEC-10 Day

aka a 1mbit/sec LAN being done to replace copper wiring-harness
bundles in autos.

-- 
42yrs virtualization experience (since Jan68), online at home since Mar1970
From: Anne & Lynn Wheeler on 21 Mar 2010 17:41
re:
http://www.garlic.com/~lynn/2010f.html#47 Nonlinear systems and nonlocal supercomputing
http://www.garlic.com/~lynn/2010f.html#48 Nonlinear systems and nonlocal supercomputing

in fact, one of the reasons for doing (rios) cluster scaleup was that,
at the time, there was no cache consistency support to allow doing
anything at all with SCI (the only scaleup was cluster).

the engineering manager that we reported to (when starting cluster
scaleup) ... had only recently moved over to head up the new somerset
organization (motorola, ibm, apple, etc) ... which would do a
single-chip 801/risc and eventually produce something that had some
kind of cache-consistency primitives for shared-memory operations. but
by the time any kind of cache consistency support existed, we were
long gone.

he does later show up as president of mips for a stint ... and we do
some stuff.

-- 
42yrs virtualization experience (since Jan68), online at home since Mar1970