From: "Andy "Krazy" Glew" on 24 Dec 2009 10:00 Terje Mathisen wrote: >> Since the whole point of this exercise is to try to reduce the overhead >> of cache coherency, but people have demonstrated they don't like the >> consequences semantically, I am trying a different combination: allow A, >> multiple values; allow B weak ordering; but disallow C losing writes. >> >> I possibly that this may be more acceptable and fewer bugs. >> >> I.e. I am suspecting that full cache coherency is overkill, but that >> completely eliminating cache coherency is underkill. > > I agree, and I think most programmers will be happy with word-size > tracking, i.e. we assume all char/byte operations happens on private > memory ranges. Doesn't that fly in the face of the Alpha experience, where originally they did not have byte memory operations, but were eventually forced to? Why? What changed? What is different? Some of the Alpha people have said that the biggest reason was I/O devices from PC-land. Sounds like a special case. I suspect that there is at least some user level parallel code that assumes byte writes are - what is the proper term? Atomic? Non-lossy? Not implemented via a non-atomic RMW? Can we get away with having byte and other sub-word writes? Saying that they may be atomic/non-lossy in cache memory, but not in uncached remote memory. But that word writes are non-lossy in all memory types? Or do we need to having explicit control? >> * This * post, by the way, is composed almost exclusively by speech >> recognition, using the pen for certain trivial edits. It's nice to find >> a way that I can actually compose stuff on a plane again. > > Seems to work better than your wan-based tablet posts! Did you mean "van"? Are you using handwriting or speech recognition? :-) The big problem with these "natural user interfaces" is review. Making it easy to make sure that what was input was actually what was meant. Hmmm... === Happy Xmas, Terje!!!!!
From: "Andy "Krazy" Glew" on 24 Dec 2009 10:09 Terje Mathisen wrote: > Bernd Paysan wrote: >> Terje Mathisen<"terje.mathisen at tmsw.no"> wrote: >>> PS. This is my very first post from my personal leafnode installation: >>> I have free news access via my home (fiber) ISP, but not here in >>> Rauland on Christmas/New Year vacation, so today I finally broke down >>> and installed leafnode on my home FreeBSD gps-based ntp server. :-) >> >> I use leafnode locally for a decade or so now; it does a good job on >> message prefetching, and it also can be used to hide details like where >> my actual news feed is coming from. At the moment my newsreader (I switched back to Thunderbird from Seamonkey) is my only non-cloud based app. The only app that ties me to a particular machine, which I cannot use when that machine is not around. (And the laptop that I read news on is often physically turned off and not accessible to the net, so remote access is not a possibility.) I mean to install a newsreader on my cloud machine. Unfortunately, not allowed to use X. Probably will fall back to Emacs gnus, although that loses HTML and graphics, for newsgroups that aren't in the ascii stoneages like comp.arch. Then I will have remote access... IIRC, gnus will also get me offline access. --- I've been wondering about leafnode ever since Bernd told me about it years ago. But, I don't see any advantage for it on my cloud configuration. Do you? === comp.arch relevancy: not much, except that cloud apps are becoming more and more important, and this sort of issue, for our persona hacking, provides some insight.
From: "Andy "Krazy" Glew" on 24 Dec 2009 10:15 Terje Mathisen wrote: > I think active code/message passing/dataflow is the obvious direction of > all big systems, including everything that needs to run over the internet. > > After all, downloading java applets to the client that knows how to > handle the accompanying server data is one working exam Q: at what level should there be support? There have been past proposals to have active messaging supported in hardware. Not just network hardware, but also CPU hardware. It has been used as one of the arguments for Burton Smith style multithreading, with incoming active messages from the network automatically being allocated a thread slot, with the network carrying hardware comprehended privileges around. Or, as you say, active messaging already can be done in pure software. In some ways Java already is. (Or Forth.) Should we even bother providing any hardware support? --- Somebody used to say in his .sig "All good ideas eventually move from software to hardware." The history of active messages illustrates another historical pattern: sometimes ideas start off in hardware, move to software, and oscillate back and forth.
From: Anne & Lynn Wheeler on 24 Dec 2009 11:36

Terje Mathisen <"terje.mathisen at tmsw.no"> writes:
> Del has already answered, but since I know far less than him about
> IBM systems, I'll try anyway:
>
> As Del said, an IBM mainframe has lots of dedicated slave processors,
> think of them as very generalized DMA engines where you can do stuff
> like:
>
> seek to and read block # 48, load the word at offset 56 in that block
> and compare with NULL: If equal return the block, otherwise use the
> word at offset 52 as the new block number and repeat the process.
>
> I.e. you could implement most operations on most forms of disk-based
> tree structures inside the channel cpu, with no need to interrupt the
> host before everything was done.

re:
http://www.garlic.com/~lynn/2009s.html#18 Larrabee delayed: anyone know what's happening?
http://www.garlic.com/~lynn/2009s.html#20 Larrabee delayed: anyone know what's happening?

but it was all in main processor real storage ... so search operations that
compared on something would constantly be fetching the search argument from
main memory. lots of latency and heavy load on the path. frequently the
channel was supposed to have lots of concurrent activity ... but during a
search operation ... the whole infrastructure was dedicated to that
operation ... & locked out all other operations. Issue was that the design
point was from the early 60s, when I/O resources were really abundant and
real storage was very scarce.

In the late 70s, I would periodically get called into customer situations
(when everybody else had come up dry). late 70s, large national retailer ...
several processors in a loosely-coupled, shared-disk environment ... say
half-dozen regional operations with a processor complex per region ... but
all sharing the same disk with the application program library.

the program library was organized in something called PDS ... and the PDS
directory (of programs) was "scanned" with multi-track search for every
program load. this particular environment had a three "cylinder" PDS
directory ... so avg. depth of search was 1.5 cylinders. These were 3330
drives that spun at 60 revs/sec and had 19 tracks per cylinder. The elapsed
time for a multi-track search of a whole cylinder ran 19/60s of a second ...
during which time the device, (shared) device controller, and (shared)
channel were unavailable for any other operations. The drive with the
application library for the whole complex was peaking out at about six disk
I/Os per second (2/3rds multi-track search of the library PDS directory and
one disk I/O to load the actual program, peak maybe two program loads/sec).

before I knew all this ... I'm brought into a classroom with six-foot-long
student tables ... several of them covered with foot-high piles of paper
printouts of performance data from the half-dozen different systems.
Basically a printout for a specific system with stats showing activity for a
10-15 minute period (processor utilization, i/o counts for individual disks,
other stuff) ... for several days ... starting in the morning and continuing
during the day. Nothing stands out from their description ... just that
thruput degrades enormously under peak load ... when the complex is
attempting to do dozens of program loads/second across the whole operation.

I effectively have to integrate the data from the different processor
complex performance printouts in my head ... and then do the correlation
that a specific drive (out of dozens) is peaking at an (aggregate) 6-7 disk
i/os per second (across all the processors) during periods of poor
performance (takes 30-40 mins). I then get out of them that the drive holds
the application program library for the whole complex, with a three-cylinder
PDS directory. I then explain how the PDS directory works with multi-track
search ... and that the whole complex is limited to two program loads/sec.

The design trade-off was based on the environment of the early 60s ... and
was obsolete by the mid-70s ... when real storage was starting to get
abundant enough that the library directory could be cached in real storage
... so you didn't have to rescan the disk for every program load.

lots of past posts mentioning CKD DASD (disk) should have moved away from
multi-track search several decades ago
http://www.garlic.com/~lynn/submain.html#dasd

other posts about getting to play disk engineer in bldgs 14&15
http://www.garlic.com/~lynn/subtopic.html#disk

the most famous was ISAM channel programs ... which could go thru things
like multi-level indexes ... with "self-modifying" channel programs ...
where an operation would read into real storage the seek/search argument(s)
for following channel commands (in the same channel program).

ISAM resulted in heartburn for the real->virtual transition. Channel
programs all involved "real" addresses. For virtual machine operation ... it
required a complete scan of the "virtual" channel program, making a "shadow"
... that then had real addresses (in place of the virtual addresses), and
executing the "shadow" program. Also, seek arguments might need to be
translated in the shadow (so the channel program that was actually being
executed no longer referred to the address where the self-modifying
arguments were being stored).

The old-time batch operating system ... with limited real storage ... also
had the convention that channel programs were built in the application space
... and passed to the kernel for execution. In their transition from the
real to the virtual storage environment ... they found themselves faced with
the same translation requirement faced by the virtual machine operating
systems. In fact, they started out by borrowing the channel program
translation routine from the virtual machine operating system.

-- 
40+yrs virtualization experience (since Jan68), online at home since Mar1970
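As a rough check on the numbers in the post (assuming one disk revolution
per track searched, and ignoring seek time and the transfer time of the
member read itself), the arithmetic can be worked through in a few lines:

/* Back-of-the-envelope check of the numbers in the post (assumptions:
 * 3330 at 60 rev/s, 19 tracks/cylinder, one revolution per track searched,
 * 3-cylinder PDS directory so the average search covers 1.5 cylinders).  */
#include <stdio.h>

int main(void)
{
    double revs_per_sec      = 60.0;
    double tracks_per_cyl    = 19.0;
    double full_cyl_search_s = tracks_per_cyl / revs_per_sec;  /* ~0.317 s */
    double avg_search_s      = 1.5 * full_cyl_search_s;        /* ~0.475 s */

    /* Each program load = searches covering 1.5 cylinders on average,
     * i.e. roughly two multi-track search I/Os, plus one read to fetch
     * the member itself: about three disk I/Os in roughly half a second. */
    double loads_per_sec = 1.0 / avg_search_s;                 /* ~2.1     */
    double ios_per_sec   = 3.0 * loads_per_sec;                /* ~6.3     */

    printf("full-cylinder search: %.3f s\n", full_cyl_search_s);
    printf("average directory search: %.3f s\n", avg_search_s);
    printf("program loads/sec (upper bound): %.1f\n", loads_per_sec);
    printf("disk I/Os/sec at that rate: %.1f\n", ios_per_sec);
    return 0;
}

which is consistent with the observed peak of about six disk I/Os per second
and roughly two program loads per second for the whole complex.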
From: Robert Myers on 24 Dec 2009 12:55
On Dec 24, 8:33 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> Robert Myers wrote:
> > On Dec 23, 12:21 pm, Terje Mathisen<"terje.mathisen at tmsw.no"> wrote:
> >> Why do I feel that this feels a lot like IBM mainframe channel
> >> programs? :-)
> >
> > Could I persuade you to take time away from your first love
> > (programming your own computers, of course) to elaborate/pontificate a
> > bit? After forty years, I'm still waiting for someone to tell me
> > something interesting about mainframes. Well, other than that IBM bet
> > big and won big on them.
> >
> > And CHANNELS. Well. That's clearly like the number 42.
>
> Del has already answered, but since I know far less than him about IBM
> systems, I'll try anyway:
>
> As Del said, an IBM mainframe has lots of dedicated slave processors,
> think of them as very generalized DMA engines where you can do stuff like:
>
> seek to and read block # 48, load the word at offset 56 in that block
> and compare with NULL: If equal return the block, otherwise use the word
> at offset 52 as the new block number and repeat the process.
>
> I.e. you could implement most operations on most forms of disk-based tree
> structures inside the channel cpu, with no need to interrupt the host
> before everything was done.

Thanks for that detailed reply, Terje. I tend to think there is something
important I am still missing. Maybe my lack of appreciation of mainframes
comes from never having been involved in the slightest with the history,
except briefly as a frustrated and annoyed user.

I can't see anything about channels that you can't do with modern PC I/O.
You send "stuff" to a peripheral device and it does something with it. It's
generally up to the peripheral to know what to do with the "stuff," whatever
it is. In the case of a TCP/IP offload engine, the what-to-do could be quite
complicated.

The things you *can't* do mostly seem to have to do with not doing I/O in
userland, so that the programmability of anything is never exposed to you
unless you are writing a driver. Is that the point? With a mainframe
channel, a user *could* program the I/O devices, at least to some extent.

Sorry if I seem obtuse.

Robert.
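To make the quoted example concrete, here is roughly what that traversal
looks like written as ordinary host-side code; the in-memory "disk",
read_block(), and the fixed offsets are either taken from Terje's example or
invented around it, not any real device interface. The mainframe point is
that this whole loop can be expressed as a chained channel program and run
in the channel, so the host takes a single interrupt once the wanted block
is in memory; on a commodity system the loop either costs one interrupt (or
polling pass) per block, or equivalent logic has to be pushed into the
device's own firmware.

/* A sketch of the traversal from the example above, written as ordinary
 * host-side code.  The "disk" here is just an in-memory array so the
 * program runs standalone; block size, offsets 48/56/52, and the starting
 * block number follow the example, not any real device interface.        */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 512
#define NUM_BLOCKS 64

static uint8_t disk[NUM_BLOCKS][BLOCK_SIZE];   /* stand-in for the device  */

/* Hypothetical device read: one I/O (and, on a PC, typically one
 * interrupt) per call.  This per-block round trip is exactly what a
 * channel program keeps away from the host CPU.                           */
static int read_block(uint32_t blkno, uint8_t buf[BLOCK_SIZE])
{
    if (blkno >= NUM_BLOCKS)
        return -1;
    memcpy(buf, disk[blkno], BLOCK_SIZE);
    return 0;
}

/* Follow the chain starting at block 48: if the word at offset 56 is 0
 * (NULL), this is the block we want; otherwise the word at offset 52 is
 * the next block number.                                                  */
static int find_block(uint8_t out[BLOCK_SIZE])
{
    uint32_t blkno = 48;
    for (;;) {
        if (read_block(blkno, out) != 0)
            return -1;                         /* I/O error                */
        uint32_t key, next;
        memcpy(&key,  out + 56, sizeof key);   /* word at offset 56        */
        memcpy(&next, out + 52, sizeof next);  /* word at offset 52        */
        if (key == 0)
            return (int)blkno;                 /* found: block is in out   */
        blkno = next;                          /* chase the chain          */
    }
}

int main(void)
{
    /* Build a three-block chain: 48 -> 10 -> 7, with block 7 terminal.    */
    uint32_t next, key = 1;
    next = 10; memcpy(disk[48] + 52, &next, 4); memcpy(disk[48] + 56, &key, 4);
    next = 7;  memcpy(disk[10] + 52, &next, 4); memcpy(disk[10] + 56, &key, 4);
    /* block 7: word at offset 56 left as 0 => terminal                    */

    uint8_t buf[BLOCK_SIZE];
    printf("chain ends at block %d\n", find_block(buf));   /* prints 7     */
    return 0;
}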