From: Del Cecchi on 23 Dec 2009 21:02

"Robert Myers" <rbmyersusa(a)gmail.com> wrote in message
news:ab08929b-50c1-4f4b-8708-f878caa1c641(a)s31g2000yqs.googlegroups.com...

On Dec 23, 12:21 pm, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> Bernd Paysan wrote:
> > Sending chunks of code around which are automatically executed by the
> > receiver is called "active messages". I not only like the idea, a
> > friend of mine has done that successfully for decades (the messages
> > in question were Forth source - it was a quite high level of active
> > messages). Doing that in the memory controller looks like a good idea
> > to me, too, at least for the kind of code a memory controller can
> > handle. The good thing about this is that you can collect all your
> > "orders" and send them in one go - this removes a lot of latency,
> > especially if your commands can include something like compare&swap
> > or even a complete "insert into list/hash table" (that, unlike
> > compare&swap, won't fail).
>
> Why do I feel that this feels a lot like IBM mainframe channel
> programs?
> :-)

Could I persuade you to take time away from your first love (programming
your own computers, of course) to elaborate/pontificate a bit?

After forty years, I'm still waiting for someone to tell me something
interesting about mainframes. Well, other than that IBM bet big and won
big on them. And CHANNELS. Well. That's clearly like the number 42.

Robert.

---------------------------------------------------

Tell you something interesting about mainframes? If you can't find
something, you aren't trying.

A channel is a specialized processor. The way I/O works on a 360 or a
follow-on is that the main processor puts together a program that tells
this specialized processor what to do, and then issues a Start I/O
instruction that points the channel processor at that program. Perhaps
you need to take a look at one of the freely available Principles of
Operation manuals.

I used to think Beemers were insular, but I have decided that
non-Beemers are just as bad, only inverse. The above is an architecture
description by a circuit designer, so take it with a grain of salt.

del
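Bernd's "collect all your orders and send them in one go" and Del's
channel description are the same shape: a list of commands built on one
side and walked by a separate engine on the other, with one hand-off
instead of one round trip per operation. A minimal sketch of that idea,
with an entirely made-up command set and a plain function standing in
for the memory-side (or channel) processor:

/* Sketch of a batched "active message": the sender builds a command
 * list (including a compare&swap and a list insert) and hands the
 * whole batch over in one go; the receiver walks it much like a
 * channel walks a chain of channel commands.  Hypothetical layout,
 * for illustration only. */
#include <stdint.h>
#include <stdio.h>

enum op { OP_WRITE, OP_CAS, OP_LIST_INSERT, OP_END };

struct node { uint64_t value; struct node *next; };

struct cmd {
    enum op   op;
    uint64_t *addr;      /* target location */
    uint64_t  arg0;      /* value / expected value */
    uint64_t  arg1;      /* new value (for CAS) */
    struct node *node;   /* payload for list insert */
};

/* The "memory controller" side: execute a whole batch with no
 * per-command round trip back to the sender. */
static void run_batch(const struct cmd *batch)
{
    for (; batch->op != OP_END; batch++) {
        switch (batch->op) {
        case OP_WRITE:
            *batch->addr = batch->arg0;
            break;
        case OP_CAS:            /* may fail; the result stays local */
            if (*batch->addr == batch->arg0)
                *batch->addr = batch->arg1;
            break;
        case OP_LIST_INSERT: {  /* push onto the list head; cannot fail */
            struct node **head = (struct node **)batch->addr;
            batch->node->next = *head;
            *head = batch->node;
            break;
        }
        default:
            break;
        }
    }
}

int main(void)
{
    uint64_t counter = 41;
    struct node *list = NULL;
    struct node n = { .value = 7, .next = NULL };

    struct cmd batch[] = {
        { OP_CAS,         &counter, 41, 42, NULL },
        { OP_LIST_INSERT, (uint64_t *)&list, 0, 0, &n },
        { OP_END, NULL, 0, 0, NULL },
    };
    run_batch(batch);   /* one "send", several operations */

    printf("counter=%llu list head=%llu\n",
           (unsigned long long)counter,
           (unsigned long long)list->value);
    return 0;
}

The point of the "insert into list" command in Bernd's example is that,
unlike compare&swap, the receiver can always complete it locally, so the
sender never has to wait on a failure/retry round trip.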
From: Anne & Lynn Wheeler on 23 Dec 2009 21:07

Robert Myers <rbmyersusa(a)gmail.com> writes:
> What bothers me is the "it's already been thought of."
>
> You worked with a different (and harsh) set of constraints.
>
> The constraints are different now. Lots of resources are free that
> once were expensive. I don't want just a walk down memory lane. The
> world is going to change, believe me. Anyone here interested in
> seeing how?
>
> What can we know from the hard lessons you learned? That's a good
> question. What's different now? That's a good question, too.
>
> Everything is the same except the time scale. That answer requires a
> detailed defense, and I think it's wrong. Sorry, Terje.

re:
http://www.garlic.com/~lynn/2009s.html#18 Larrabee delayed: anyone know what's happening?

Concurrent with the fiber-channel work was SCI ... SCI was going after
asynchronous packetized SCSI commands ... akin to fiber channel and
serial-copper ... but it also went after an asynchronous packetized
memory bus. The SCI asynchronous packetized memory bus was used by
Convex for Exemplar and by Sequent for NUMA-Q ... DG near its end did
something akin to NUMA-Q ... SGI also did a flavor.

Part of the current issue is that old-time real-storage & paging latency
to disk (in terms of count of processor cycles) ... is comparable to
current cache sizes and cache-miss latency to main memory. I had started
saying in the mid-70s that the major system bottleneck was shifting from
disk/file I/O to memory.

In the early 90s ... the executives in the disk division took exception
to some of my statements that relative system disk thruput had declined
by an order of magnitude over a period of 15 years (CPU & storage
resources increased by a factor of 50, disk thruput increased by a
factor of 3-5) ... they assigned the division performance group to
refute my statements ... after a couple of weeks they came back and
effectively said that I had understated the situation.

Part of this was from some work I had done as an undergraduate in the
60s on dynamic adaptive resource management ... and "scheduling to the
bottleneck" (it was frequently referred to as "fair share" scheduling
... since the default policy was "fair share") ... dynamically
attempting to adjust resource management to the system thruput
bottleneck ... which required being able to dynamically recognize where
the bottlenecks were. misc. past posts mentioning dynamic adaptive
resource management (and "fair share" scheduling)
http://www.garlic.com/~lynn/subtopic.html#fairshare

When I was doing HSDT ... some of the links were satellite ... and I had
to redo how the satellite communication operated. A couple of years
later there was a presentation at an IETF meeting that mentioned the
cross-country fiber gigabit bandwidth*latency product ... it turned out
to be about the same as the product I had dealt with for the high-speed
(geo-sync) satellite links (the latency was much larger while the
bandwidth was somewhat smaller ... but the resulting product was
similar). There are still not a whole lot of applications that actually
do coast-to-coast full(-duplex) gigabit operation (full concurrent
gigabit in both directions).

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
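For readers who haven't bumped into it, the bandwidth*latency product is
just the amount of data that has to be in flight to keep a link full.
The figures below are purely illustrative, not the actual HSDT or IETF
numbers:

    data in flight = bandwidth x round-trip time
    1 Gbit/s fiber path,      RTT ~ 50 ms  -> 0.05 s x 1 Gbit/s   = 50 Mbit (~6 MB)
    ~100 Mbit/s geo-sync hop, RTT ~ 500 ms -> 0.5 s  x 100 Mbit/s = 50 Mbit (~6 MB)

A link with ten times the latency reaches the same product at a tenth of
the bandwidth, which is the sense in which a coast-to-coast gigabit
fiber path and a slower but much higher-latency satellite link can
present the same windowing/flow-control problem.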
From: Del Cecchi on 23 Dec 2009 21:12

"Robert Myers" <rbmyersusa(a)gmail.com> wrote in message
news:22d9b7d3-1570-4b63-a32b-2addf533ef8a(a)v13g2000yqk.googlegroups.com...

On Dec 23, 7:57 pm, Anne & Lynn Wheeler <l...(a)garlic.com> wrote:
> Terje Mathisen <"terje.mathisen at tmsw.no"> writes:
>
> > Why do I feel that this feels a lot like IBM mainframe channel
> > programs?
> > :-)
>
> The downside was that mainframe channel programs were half-duplex,
> end-to-end serialization. There was all sorts of heat & churn in
> fibre-channel standardization with the efforts to overlay the
> mainframe channel program (half-duplex, end-to-end serialization)
> paradigm on the underlying full-duplex asynchronous operation.
>
> From the days of scarce, very expensive electronic storage ... channel
> programs, especially disk channel programs, used "self-modifying"
> operation ... i.e. a read operation would fetch the argument used by
> the following channel command (both specifying the same real address).
> A couple of round trips of this end-to-end serialization potentially
> happened over a 400' channel cable within a small part of a disk
> rotation.
>
> Trying to get a HYPERChannel "remote device adapter" (simulated
> mainframe channel) working at extended distances with disk controllers
> & drives ... took a lot of sleight of hand. A copy of the completed
> mainframe channel program was created and downloaded into the memory
> of the remote device adapter ... to minimize the command-to-command
> latency. The problem was that some of the disk command arguments had
> very tight latencies ... so those arguments had to be recognized and
> also downloaded into the remote device adapter memory (and the related
> commands redone to fetch/store to the local adapter memory rather than
> the remote mainframe memory). This process was never extended to be
> able to handle the "self-modifying" sequences.
>
> On the other hand ... there was a serial-copper disk project that
> effectively packetized SCSI commands ... sent them down the outgoing
> link ... and allowed asynchronous return on the incoming link ...
> eliminating loads of the SCSI latency. We tried to get this morphed
> into interoperating with the fibre-channel standard ... but it morphed
> into SSA instead.
>
> --
> 40+yrs virtualization experience (since Jan68), online at home since
> Mar1970

What bothers me is the "it's already been thought of."

You worked with a different (and harsh) set of constraints.

The constraints are different now. Lots of resources are free that once
were expensive. I don't want just a walk down memory lane. The world is
going to change, believe me. Anyone here interested in seeing how?

What can we know from the hard lessons you learned? That's a good
question. What's different now? That's a good question, too.

Everything is the same except the time scale. That answer requires a
detailed defense, and I think it's wrong. Sorry, Terje.

Robert.

------------------------------------------------------

OK, how about work queues in InfiniBand? Most everything anyone thinks
of has been thought of before. Not everything, but most things. People
have been making things with I/O processors for years. Data flow,
active messages, etc., etc. What is it you think is so new? I read your
previous post, but other than the mystery data going to mystery
locations I didn't quite understand what you were driving at.
As for "what is interesting about mainframes": basically, many of the
things that the PC and microprocessor folks are coming up with were done
first, or looked at first, by the mainframe folks, since they could
afford it first.

del
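Del's InfiniBand reference fits the same "describe the work and hand it
to another engine" pattern as channel programs: the host posts
work-request descriptors on a queue, a separate processor consumes them
asynchronously, and completions come back later. A toy single-threaded
sketch of that division of labor, with made-up names and nothing like
the real verbs API:

/* Stripped-down analogue of an InfiniBand-style work queue: the host
 * fills descriptor slots in a ring and "rings a doorbell"; an engine
 * drains the ring and reports completions.  Illustration only. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_SLOTS 8

struct work_req {            /* describes the work, doesn't do it */
    uint32_t opcode;         /* 0 = copy, in this toy command set */
    const void *src;
    void *dst;
    uint32_t len;
    uint32_t wr_id;          /* echoed back in the completion */
};

struct work_queue {
    struct work_req ring[RING_SLOTS];
    unsigned head;           /* next slot the engine will consume */
    unsigned tail;           /* next slot the host will fill */
};

static int post_send(struct work_queue *q, const struct work_req *wr)
{
    if (q->tail - q->head == RING_SLOTS)
        return -1;                       /* queue full */
    q->ring[q->tail % RING_SLOTS] = *wr; /* write the descriptor */
    q->tail++;                           /* the "doorbell" */
    return 0;
}

/* Stand-in for the I/O processor: drain everything posted so far and
 * report one completion per work request. */
static void engine_poll(struct work_queue *q)
{
    while (q->head != q->tail) {
        struct work_req *wr = &q->ring[q->head % RING_SLOTS];
        if (wr->opcode == 0)
            memcpy(wr->dst, wr->src, wr->len);
        printf("completion: wr_id=%u\n", wr->wr_id);
        q->head++;
    }
}

int main(void)
{
    struct work_queue q = { .head = 0, .tail = 0 };
    char src[] = "hello", dst[8] = "";

    struct work_req wr = { .opcode = 0, .src = src, .dst = dst,
                           .len = sizeof src, .wr_id = 1 };
    post_send(&q, &wr);      /* host: describe the work */
    engine_poll(&q);         /* engine: do it (asynchronously in real HW) */

    printf("dst=%s\n", dst);
    return 0;
}

In real InfiniBand the descriptor goes on a send queue, the doorbell is
a register write, and completions arrive on a separate completion queue,
but the split Del is pointing at is the same: describe the work, hand it
to another processor, find out later that it finished.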
From: "Andy "Krazy" Glew" on 24 Dec 2009 00:17 Robert Myers wrote: > If you know the future (or the dataflow graph ahead of time), you can > assemble packets of whatever. Could be any piece of the problem: > code, data, meta-data, meta-code,... whatever, and send it off to some > location where it knows that the other pieces that are needed for that > piece of the problem will also arrive, pushed from who-cares-where. > When enough pieces are in hand to act on, the receiving location acts > on whatever pieces it can. When any piece of anything that can be > used elsewhere is finished, it is sent on to wherever. The only > requirement is that there is some agent like a DNS that can tell > pieces with particular characteristics the arbitrarily chosen > processors (or collections of processors) to which they should migrate > for further use, and that receiving agents are not required to do > anything but wait until they have enough information to act on, and > the packets themselves will inform the receiving agent what else is > needed for further action (but not where it can be found). Many > problems seem to disappear as if by magic: the need for instruction > and data prefetch (two separate prediction processes), latency issues, > need for cache, and the need to invent elaborate constraints on what > kinds of packets can be passed around, as the structure (and, in > effect, the programming language) can be completely ad hoc. > Concurrency doesn't even seem to be an issue. It's a bit like an > asynchronous processor, and it seems implementable in any circumstance > where a data-push model can be implemented. I just BSed on something very similar to this. We are both talking about building a large dataflow system. (Indeed, all of my career I have been building dataflow systems: OOO CPUs are just dataflow on a micro scale.) However, the problems don't all magically disappear. In the limit, such dataflow may be better. In practice, the relative overhead of transferring code around versus transferring data, and of determining when data is ready, matters. In practice, the amount of data fetched at an individual node, relative to the amount of computation, matters. In practice, thew routing table management and lookup, matters. This is why I have tried to flesh things out a bit more, although still at a very high level: big fat CPUs, processing elements for simple active messages out of buffered inputs, scatter/gather operations, and translation in the network. And these are just the implementation details. The real problem is that this is semantically exposed. We have created a dataflow system. For memory; albeit PGAS or SHMEM like memory. And dataflow, no matter how you gloss over it, does not really like stateful memory. Either we hide the fact that there really is memory back there (Haskell monads, anyone?), or there is another level of synchronization relating to when it is okay to overwrite a memory location. I vote for the latter.
From: "Andy "Krazy" Glew" on 24 Dec 2009 00:19
Bernd Paysan wrote:
> Robert Myers wrote:
>
> It has been tried and it works - you can find a number of papers about
> active message passing from various universities. However, it seems to
> be that most people try to implement some standard protocols like MPI
> on top of it, so the benefits might be smaller than expected. And as
> Andy already observed: most people seem to be more comfortable with
> sequential programming. Using such an active message system makes the
> parallel programming quite explicit - you model a data flow graph, you
> create packets with code and data, and so on.

Another of the SC09 buzzwords was parallel scripting languages, and
other infrastructure to do exactly this: connect chunks of sequential
code up into dataflow graphs.

Perhaps it is just at a certain level of abstraction that we will do
this explicitly. Let the compiler autoparallelize the small stuff.