From: Anne & Lynn Wheeler on 24 Dec 2009 14:00

re:
http://www.garlic.com/~lynn/2009s.html#18 Larrabee delayed: anyone know what's happening?
http://www.garlic.com/~lynn/2009s.html#20 Larrabee delayed: anyone know what's happening?
http://www.garlic.com/~lynn/2009s.html#22 Larrabee delayed: anyone know what's happening?

One of the first SANs was at NCAR, with a pool of IBM CKD dasd, an IBM 43xx (midrange) mainframe, some number of "supercomputers", and HYPERchannel. All the processors could message each other over HYPERchannel and also access the disks. The IBM mainframe acted as the SAN controller ... getting requests (over HYPERchannel) for data ... potentially having to first stage it from tape to disk ... using real channel connectivity to the IBM disks.

The IBM disk controllers had multiple channel connectivity ... at least one to the "real" IBM channel and one to the HYPERchannel remote device adapter, an emulated channel. The A515 was an upgraded remote device adapter that had the capability of downloading the full channel program into local memory ... as well as support for the dasd seek/search arguments in local memory (it could distinguish between address references for the seek/search arguments in local memory vis-a-vis the read/write transfers that involved "host" memory addresses).

The IBM mainframe would load the channel program (to satisfy the data request from some supercomputer) into the memory of an A515 ... and then respond to the requesting supercomputer with the "handle" of the channel program in one of the A515s. The supercomputer would then make a request to that A515 for the execution of that channel program ... transferring the data directly to the supercomputer ... w/o having to go thru the IBM mainframe memory ... basically "control" went thru the IBM mainframe ... but the actual data transfer was direct.

Later, there was standardization work on HIPPI (and FCS) switches to allow definition of something that would simulate the NCAR HYPERchannel environment and the ability to do "3rd party transfers" ... directly between processors and disks ... w/o having to involve the control machine (which set it all up) in the actual data flow.

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
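A minimal C sketch of the control/data split described above: the mainframe builds the channel program and parks it in the adapter, the supercomputer then drives the transfer by handle. The names (mainframe_prepare_request, a515_execute, channel_program) are invented for illustration, not the actual HYPERchannel or A515 interfaces.

/* Control path goes through the mainframe; the data path does not. */
#include <stdio.h>
#include <string.h>

#define MAX_PROGRAMS 16

struct channel_program {            /* a pre-built CCW-like program */
    char target_disk[16];
    long seek_arg;                  /* kept in the A515's local memory */
    long byte_count;
};

static struct channel_program a515_memory[MAX_PROGRAMS];
static int next_handle;            /* no bounds checking in this sketch */

/* Control path: the mainframe stages the data (tape -> disk if needed),
   builds the channel program, and loads it into the A515's local memory. */
static int mainframe_prepare_request(const char *disk, long seek, long count)
{
    int handle = next_handle++;
    struct channel_program *cp = &a515_memory[handle];
    strncpy(cp->target_disk, disk, sizeof cp->target_disk - 1);
    cp->seek_arg = seek;
    cp->byte_count = count;
    return handle;                  /* returned to the requesting supercomputer */
}

/* Data path: the supercomputer hands the handle straight to the A515;
   the transfer goes disk -> supercomputer without touching mainframe memory. */
static void a515_execute(int handle, char *supercomputer_buffer)
{
    struct channel_program *cp = &a515_memory[handle];
    printf("A515: seek %ld on %s, moving %ld bytes directly to requester\n",
           cp->seek_arg, cp->target_disk, cp->byte_count);
    (void)supercomputer_buffer;     /* a real adapter would DMA into it */
}

int main(void)
{
    char buf[4096];
    int h = mainframe_prepare_request("ckd-dasd-03", 1234, sizeof buf);
    a515_execute(h, buf);           /* control went via the mainframe, data did not */
    return 0;
}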
From: Terje Mathisen "terje.mathisen at tmsw.no" on 24 Dec 2009 15:45

Andy "Krazy" Glew wrote:
> Terje Mathisen wrote:
>>> Since the whole point of this exercise is to try to reduce the overhead
>>> of cache coherency, but people have demonstrated they don't like the
>>> consequences semantically, I am trying a different combination: allow A,
>>> multiple values; allow B, weak ordering; but disallow C, losing writes.
>>>
>>> I think possibly that this may be more acceptable and have fewer bugs.
>>>
>>> I.e. I am suspecting that full cache coherency is overkill, but that
>>> completely eliminating cache coherency is underkill.
>>
>> I agree, and I think most programmers will be happy with word-size
>> tracking, i.e. we assume all char/byte operations happen on private
>> memory ranges.
>
> Doesn't that fly in the face of the Alpha experience, where originally
> they did not have byte memory operations, but were eventually forced to?
>
> Why? What changed? What is different?

What's different is easy: I am not proposing we get rid of byte-sized memory operations, "only" that we don't promise they will be globally consistent, i.e. you only use 8/16-bit operations on private memory blocks.

> Some of the Alpha people have said that the biggest reason was I/O
> devices from PC-land.

Sounds like a special case.

> I suspect that there is at least some user level parallel code that
> assumes byte writes are - what is the proper term? Atomic? Non-lossy?
> Not implemented via a non-atomic RMW?

There might be some such code somewhere, but only by accident; I don't believe anyone is using it intentionally: you want semaphores to be separated by at least a cache line if you care about performance, but I guess it is conceivable some old coder decided to pack all his lock variables into a single byte range.

> Can we get away with having byte and other sub-word writes? Saying that
> they may be atomic/non-lossy in cache memory, but not in uncached remote
> memory. But that word writes are non-lossy in all memory types? Or do we
> need to have explicit control?

I think so.

>> Seems to work better than your wan-based tablet posts!
>
> Did you mean "van"? Are you using handwriting or speech recognition? :-)

I'm using "non-native language" human spell checking.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
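A small C sketch of the failure mode Glew is asking about: if a byte store is emulated with a non-atomic word-sized read-modify-write, a concurrent store to a neighbouring byte in the same word can be lost. Purely illustrative; the emulated_byte_store helper is made up and real hardware details differ.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t word;               /* four packed byte-sized variables */

/* Emulated byte store: read the whole word, splice the byte in, write back.
   The read-modify-write is NOT atomic, which is exactly the problem. */
static void emulated_byte_store(int lane, uint8_t v)
{
    uint32_t old = word;                        /* read   */
    old &= ~(0xFFu << (lane * 8));              /* modify */
    old |= (uint32_t)v << (lane * 8);
    word = old;                                 /* write: may overwrite a
                                                   concurrent store to a
                                                   different lane */
}

static void *writer0(void *arg) { (void)arg; emulated_byte_store(0, 0xAA); return NULL; }
static void *writer1(void *arg) { (void)arg; emulated_byte_store(1, 0xBB); return NULL; }

int main(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, writer0, NULL);
    pthread_create(&t1, NULL, writer1, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    /* Expected 0x0000BBAA; on an unlucky interleaving one byte stays 0x00. */
    printf("word = 0x%08X\n", (unsigned)word);
    return 0;
}

True byte stores (or word-granular tracking with bytes restricted to private memory, as Terje proposes) avoid this lost-write case.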
From: "Andy "Krazy" Glew" on 24 Dec 2009 16:26 Robert Myers wrote: > I can't see anything about channels that you can't do with modern PC I/O. AFAIK there isn't much that IBM mainframe channels could do that modern PC I/O controllers cannot do. Even a decade ago I saw SCSI controllers that were more sophisticated than IBM channels. If anything, the problem is that there are too many different PC I/O controllers with similar, but slightly different, capabilities. Perhaps the biggest thing IBM channels had (actually, still have) going for them is that they are reasonably standard. They are sold by IBM, not a plethora of I/O device vendors. They can interface to many I/O devices. You could write channel programs without too much fear of getting lockind in to a particular device (although, of course, you were largely locked in to IBM). Plus, of course, IBM channels were fairly well implemented. From time to time Intel tried to create its own generical channel controllers. Even back in the 80s. But, unfortunately, sometimes it was a net performance loss to use these devices, particularly for latency sensitive applications.
From: Bernd Paysan on 24 Dec 2009 17:16

Andy "Krazy" Glew wrote:
> However, I think that these problems, although solvable, are the
> reason why specialized processors, such as channel processors or
> active messaging, have not dominated:
>
> a) security issues
>
> b) portability issues.
>
> I suspect the latter are worst. If you write code for a specialised
> SCSI or IPI channel processor (see, these things are not unknown
> outside the world of mainframes), you are locked in.

The latter is obvious. When you want a successful active messaging system, you must solve the portability problem: either by having an industry-standard instruction set (like x86 is for PCs), or (IMHO ways better) by using source code. You may want to send actual source code around (as my friend does), or you may want to compile from source before you start. There are intermediate forms like tokenized source or virtual machines; they have their place, but are of more limited interest (if you care about bandwidth, compress your source; a dictionary-based system effectively tokenizes).

In general, I would tend to send actual source code around when the throughput and latency are relatively high compared to the speed of the nodes, and when the nodes are very heterogeneous. A well-known and widely used example: JavaScript.

Sending source around doesn't mean interpreters: it is better to use incremental compilers (examples: Forth, recent JavaScript engines, OpenCL). Using source also doesn't necessarily mean "send text strings around". In a sufficiently homogeneous environment, you can pre-compile all sources and then send the actual binaries. Example: OpenCL. Your program is distributed as source, and compiled at run-time - but only once.

A compromise for speed could be to send only "stored procedures" around as source, and the actual invocations (with a limited set of instructions) as interpreted virtual machine code. Using an event-driven paradigm (similar to HDLs like Verilog or VHDL) can reduce the actual invocation code considerably. E.g. you store a procedure in your node only once, and then send data to the node - the node will trigger on the data arrival and execute the code bound to that event.

Maybe this could be what's different now: we now know that portability matters, and we understand much better how to achieve it.

And finally, Merry Christmas, especially to the other people here - like Terje - who celebrate according to the Jewish calendar (where the day ends at dusk, and therefore Christmas Eve *is* already Christmas).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
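A minimal C sketch of the "store the procedure once, then only send data" pattern Bernd describes. The names (node_register, node_send, stored_proc) are hypothetical; a real system would ship source and compile it incrementally on the node, but the event-triggered dispatch looks roughly like this.

#include <stdio.h>
#include <string.h>

typedef void (*stored_proc)(const void *payload, size_t len);

struct binding { const char *event; stored_proc proc; };
static struct binding table[8];
static int nbindings;

/* Done once: install the procedure on the node, bound to an event name. */
static void node_register(const char *event, stored_proc proc)
{
    table[nbindings].event = event;
    table[nbindings].proc  = proc;
    nbindings++;
}

/* Done per message: only data travels; arrival triggers the bound code. */
static void node_send(const char *event, const void *payload, size_t len)
{
    for (int i = 0; i < nbindings; i++)
        if (strcmp(table[i].event, event) == 0)
            table[i].proc(payload, len);
}

static void on_sample(const void *payload, size_t len)
{
    printf("sample handler got %zu bytes: %s\n", len, (const char *)payload);
}

int main(void)
{
    node_register("sample", on_sample);          /* stored procedure, sent once */
    node_send("sample", "42", 3);                /* invocation = data only      */
    return 0;
}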
From: Bill Todd on 24 Dec 2009 18:43
Andy "Krazy" Glew wrote: > Robert Myers wrote: >> I can't see anything about channels that you can't do with modern PC I/O. > > AFAIK there isn't much that IBM mainframe channels could do that modern > PC I/O controllers cannot do. Even a decade ago I saw SCSI controllers > that were more sophisticated than IBM channels. I may be missing something glaringly obvious here, but my impression is that the main thing that channels can do that PC I/O controllers can't is accept programs that allow them to operate extensively on the data they access. For example, one logical extension of this could be to implement an entire database management system in the channel controller - something which I'm reasonably sure most PC I/O controllers would have difficulty doing (not that I'm necessarily holding this up as a good idea...). PC I/O controllers have gotten very good at the basic drudge work of data access (even RAID), and ancillary DMA engines have added capabilities like scatter/gather - all tasks which used to be done in the host unless you had something like a channel controller to off-load them. But AFAIK channels they ain't. - bill |