From: Morten Reistad on 2 Apr 2010 22:57

In article <693h87-kn7.ln1(a)ntp.tmsw.no>,
Terje Mathisen <terje.mathisen(a)tmsw.no> wrote:
>Morten Reistad wrote:
>> Now, can we attack this from a simpler perspective; can we make
>> the L2-memory interaction more intelligent? Like actually make
>> a paging system for it? Paging revolutionised the disk-memory
>> systems, remember?
>
>Morten, I've been preaching these equivalences for more than 5 years:
>
>Old mainframe: cpu register -> memory -> disk -> tape
>Modern micro: cpu register -> cache -> ram -> disk

So have I.

>Current cache-ram interfaces work in ~128-byte blocks, just like the
>page size of some of the earliest machines with paging (PDP10/11?).
>
>RAM needs to be accessed in relatively large blocks, since the hardware
>is optimized for sequential access.
>
>Current disks are of course completely equivalent to old tapes: yes, it
>is possible to seek randomly, but nothing but really large sequential
>blocks will give you close to theoretical throughput.
>
>Tape is out of the question now simply because the time to do a disaster
>recovery rollback of a medium-size (or larger) system is measured in
>days or weeks, instead of a couple of hours.

We do have the problem of software bloat, so the cache keeps thrashing.
But we need to test these ideas out. So here is a suggestion:

Have someone build a simplish, historical CPU that is reasonably
amenable to a modern implementation. The PDP11 seems like a nice
target. Then build an 8-way PDP11, including a few tens of megabytes
of "L2 cache" RAM on-chip, and have the memory interface look like a
disk. Then fire up an old Unix, and measure performance, swapping to
memory. It could also be an 80286.

Or, give us a Xeon where the MMU can be reconfigured to handle L2
cache as memory, and memory as disk. We could try some low-footprint
OS, like OpenBSD or QNX, on that, and fire up some applications and
look at the results.

There was a revolution from 1969 onwards when Denning's paper [1] was
implemented for paging, in place of various other paging strategies.
I suspect re-applying it to on-chip static memory vs. off-chip,
behind-the-MMU dynamic memory would be a similar win.

-- mrr

[1] http://cs.gmu.edu/cne/pjd/PUBS/Workingsets.html
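To make the working-set idea in Denning's paper [1] concrete, here is a
minimal C sketch of the policy applied to the proposed fast-tier/slow-tier
split: a page stays resident while it has been referenced within the last
TAU references, and drops back to the slow tier once it falls out of that
window. The page count, window size, and reference trace are illustrative
assumptions, not values from any real system.

/* Minimal sketch of Denning's working-set replacement policy, applied
 * to an on-chip "L2 memory" tier backed by off-chip DRAM.  All sizes
 * are illustrative. */
#include <stdio.h>

#define NPAGES 16   /* virtual pages in this toy example */
#define TAU     4   /* working-set window, in references */

static long last_use[NPAGES];   /* time of last reference, -1 = not resident */
static long now;                /* virtual time: one tick per reference */

static void reference(int page)
{
    now++;
    if (last_use[page] < 0)
        printf("t=%ld: fault on page %d (fetch from slow tier)\n", now, page);
    last_use[page] = now;

    /* Pages whose last use fell out of the window (now-TAU, now] have
     * left the working set and can be written back to the slow tier. */
    for (int p = 0; p < NPAGES; p++)
        if (last_use[p] >= 0 && now - last_use[p] >= TAU) {
            printf("t=%ld: evict page %d (left working set)\n", now, p);
            last_use[p] = -1;
        }
}

int main(void)
{
    for (int p = 0; p < NPAGES; p++)
        last_use[p] = -1;

    /* A trace with locality: the working set stays small even though
     * the address space is larger. */
    int trace[] = {0, 1, 2, 0, 1, 2, 7, 8, 7, 8, 0};
    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        reference(trace[i]);
    return 0;
}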
From: Morten Reistad on 2 Apr 2010 23:11

In article <hp5254$r1m$1(a)news.eternal-september.org>,
Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>On 4/2/2010 5:07 AM, Terje Mathisen wrote:
>> Morten Reistad wrote:
>
>While this is all, at least sort of, true, the question is what do you
>want to do about it. ISTM that the salient characteristics of the
>paging, i.e. memory-to-disk, interface are that it requires OS
>interaction in order to optimize, that the memory-to-disk interfaces
>have been getting narrower (i.e. SATA, Fibre Channel and serial SCSI),
>not wider, and that the CPU doesn't directly address the disk. Do you
>want to narrow the CPU's addressing range to just include the cache?
>Do you want the software to get involved in cache miss processing?

I don't want to narrow the addressing range at all. You still have the
same virtual addresses as before. It is just that we use only the "L2
cache" (on-chip, fast, static-ish memory) as "memory", and we use
paging/swapping to address "main memory" (off-chip, high-latency
dynamic memory). And we throw the established theory at this
bottleneck, and look at what happens.

User programs should run unmodified. We need to write some drivers for
our common operating systems, and have some new hardware to support
this. It should be doable with the correct MMU. We can then build
"memory boxes" of L2 memory attached via hyperchannel or similar
low-latency, fast interfaces, with a hundred megabytes or so per
memory box.

The code to handle all of this is still in the OSes of the Open Source
world; we just have to map the usage onto new hardware. So we may yet
see systems with 100k page faults per second. :-/

>This is all to say that, as usual, the devil is in the details. :-(

We can build the system disks etc. in memory too, but we have to be
careful to do persistent storage right. If we think about this very
carefully, we can have snapshot states of the system committed to
persistent storage at regular intervals, say every few seconds.

-- mrr
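A toy sketch of the fault path such a driver might implement, under the
assumptions above: a handful of fast on-chip frames backed by a large
slow-tier array standing in for off-chip DRAM. All names and sizes
(fault_in, FAST_FRAMES, and so on) are hypothetical, and a real pager
would remap pages through the MMU rather than memcpy data around.

/* Toy two-tier pager: a small pool of fast on-chip frames backed by a
 * large slow tier.  Hypothetical names and sizes throughout. */
#include <string.h>
#include <stdint.h>

#define PAGE_SIZE   4096
#define FAST_FRAMES 8           /* "L2 memory" frames  */
#define SLOW_PAGES  1024        /* "main memory" pages */

static uint8_t fast_tier[FAST_FRAMES][PAGE_SIZE];
static uint8_t slow_tier[SLOW_PAGES][PAGE_SIZE];
static int     resident[FAST_FRAMES];  /* slow page held by each frame, -1 = free */
static int     victim;                 /* rotating victim choice */

/* Return a fast frame holding slow page vpage, faulting it in if needed. */
static uint8_t *fault_in(int vpage)
{
    for (int f = 0; f < FAST_FRAMES; f++)
        if (resident[f] == vpage)
            return fast_tier[f];        /* hit in the fast tier */

    /* Miss: pick a victim frame, write it back to the slow tier,
     * then pull the wanted page in. */
    int f = victim++ % FAST_FRAMES;
    if (resident[f] >= 0)
        memcpy(slow_tier[resident[f]], fast_tier[f], PAGE_SIZE);
    memcpy(fast_tier[f], slow_tier[vpage], PAGE_SIZE);
    resident[f] = vpage;
    return fast_tier[f];
}

int main(void)
{
    for (int f = 0; f < FAST_FRAMES; f++)
        resident[f] = -1;
    uint8_t *p = fault_in(42);  /* first touch: fault */
    p[0] = 0xAA;                /* write through the fast tier */
    fault_in(42);               /* second touch: hit */
    return 0;
}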
From: Morten Reistad on 3 Apr 2010 00:45

In article
<cf02372e-6462-4ef7-80e1-35996dba5bce(a)q15g2000yqj.googlegroups.com>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>On Apr 2, 11:23 am, Stephen Fuld <SF...(a)alumni.cmu.edu.invalid> wrote:
>
>> While this is all, at least sort of, true, the question is what do you
>> want to do about it. ISTM that the salient characteristics of the
>> paging, i.e. memory-to-disk, interface are that it requires OS
>> interaction in order to optimize, that the memory-to-disk interfaces
>> have been getting narrower (i.e. SATA, Fibre Channel and serial SCSI),
>> not wider, and that the CPU doesn't directly address the disk. Do you
>> want to narrow the CPU's addressing range to just include the cache?
>> Do you want the software to get involved in cache miss processing?
>>
>> This is all to say that, as usual, the devil is in the details. :-(
>
>On-die memory isn't yet big enough to be fussing over the details,
>although I assume we will get there.
>
>The better model to look at might be graphics cards, which carry a
>large enough amount of super-fast memory to be interesting as a model
>for general computation.

I don't care what you call it: GPU, CPU, mill, whatever.

We can run a decent system like QNX, OpenBSD etc. in about 6 megabytes
of RAM; except it will start to take page faults when you start to do
something interesting.

If the 6 MB (or 12, or 24) of RAM is super-fast, and has a minimal MMU
at least capable of process isolation and address translation, and
does the "paging" to main memory, then you could run one of these
minimal, general-purpose operating systems inside each
GPU/CPU/whatever, and live with the page faults. It will be several
orders of magnitude faster and lower latency than the swapping and
paging we normally love to hate.

We already have that "swapping"; except we call it "memory access".

The theory is old, stable and well validated. The code is done, and
still in many operating systems. We "just need drivers".

-- mrr
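A back-of-envelope check on the "orders of magnitude" claim, using rough
illustrative latencies rather than measurements (on-chip SRAM ~2 ns,
DRAM ~100 ns, disk ~10 ms): with one fault per thousand references, the
effective access time of DRAM-backed paging stays within a few percent
of the hit time, while disk-backed paging is dominated by the miss
penalty.

/* Effective access time: EAT = hit_time + miss_rate * miss_penalty.
 * All latencies are rough 2010-era illustrative figures. */
#include <stdio.h>

int main(void)
{
    double hit_ns    = 2.0;     /* "L2 as memory" hit       */
    double dram_ns   = 100.0;   /* fault serviced from DRAM */
    double disk_ns   = 10e6;    /* fault serviced from disk */
    double miss_rate = 1e-3;    /* one fault per 1000 refs  */

    double eat_dram = hit_ns + miss_rate * dram_ns;
    double eat_disk = hit_ns + miss_rate * disk_ns;

    printf("paging to DRAM: %.2f ns/ref\n", eat_dram);  /* ~2.1 ns    */
    printf("paging to disk: %.2f ns/ref\n", eat_disk);  /* ~10002 ns  */
    printf("ratio: ~%.0fx\n", eat_disk / eat_dram);
    return 0;
}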
From: Terje Mathisen on 3 Apr 2010 04:53

Stephen Fuld wrote:
> On 4/2/2010 5:07 AM, Terje Mathisen wrote:
>> Old mainframe: cpu register -> memory -> disk -> tape
>> Modern micro: cpu register -> cache -> ram -> disk
>>
>> Current cache-ram interfaces work in ~128-byte blocks, just like the
>> page size of some of the earliest machines with paging (PDP10/11?).
>>
>> RAM needs to be accessed in relatively large blocks, since the hardware
>> is optimized for sequential access.
>>
>> Current disks are of course completely equivalent to old tapes: yes, it
>> is possible to seek randomly, but nothing but really large sequential
>> blocks will give you close to theoretical throughput.
>>
>> Tape is out of the question now simply because the time to do a disaster
>> recovery rollback of a medium-size (or larger) system is measured in
>> days or weeks, instead of a couple of hours.
>
> While this is all, at least sort of, true, the question is what do you
> want to do about it. ISTM that the salient characteristics of the
> paging, i.e. memory-to-disk, interface are that it requires OS
> interaction in order to optimize, that the memory-to-disk interfaces
> have been getting narrower (i.e. SATA, Fibre Channel and serial SCSI),
> not wider, and that the CPU doesn't directly address the disk. Do you
> want to narrow the CPU's addressing range to just include the cache?
> Do you want the software to get involved in cache miss processing?

Not at all!

I use my argument as a lead-in to tell programmers that they had
better study the algorithms developed for the 30-40 year old
mainframes with limited memory, because unless they can make their
algorithms fit this model, performance will really suffer.

I.e. I don't suggest they try to do anything at the OS level; rather,
they should take the performance steps as a given and work
around/within those limitations.

> This is all to say that, as usual, the devil is in the details. :-(

Indeed.

Terje
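One concrete instance of this "fit the algorithm to the hierarchy"
advice is loop blocking, essentially the same trick old mainframe codes
used to fit limited core memory: a cache-blocked matrix multiply,
sketched below. The matrix size N and tile edge B are placeholders to
be tuned so that three tiles stay resident in cache at once; this
assumes N is a multiple of B and that the caller has zeroed c.

/* Cache-blocked (tiled) matrix multiply: c += a * b, row-major NxN.
 * B is chosen so three BxB tiles fit in cache. */
#include <stddef.h>

#define N 512
#define B  64   /* tile edge; tune to the actual cache size */

void matmul_blocked(const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < N; ii += B)
    for (size_t jj = 0; jj < N; jj += B)
    for (size_t kk = 0; kk < N; kk += B)
        /* Inner loops touch only one BxB tile of each operand, so
         * the working set stays cache-resident. */
        for (size_t i = ii; i < ii + B; i++)
        for (size_t k = kk; k < kk + B; k++) {
            double aik = a[i * N + k];
            for (size_t j = jj; j < jj + B; j++)
                c[i * N + j] += aik * b[k * N + j];
        }
}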
From: nmm1 on 3 Apr 2010 06:03
In article <u8cj87-q7e.ln1(a)ntp.tmsw.no>,
Terje Mathisen <terje.mathisen(a)tmsw.no> wrote:
>Stephen Fuld wrote:
>>
>> While this is all, at least sort of, true, the question is what do you
>> want to do about it. ISTM that the salient characteristics of the
>> paging, i.e. memory-to-disk, interface are that it requires OS
>> interaction in order to optimize, that the memory-to-disk interfaces
>> have been getting narrower (i.e. SATA, Fibre Channel and serial SCSI),
>> not wider, and that the CPU doesn't directly address the disk. Do you
>> want to narrow the CPU's addressing range to just include the cache?
>> Do you want the software to get involved in cache miss processing?
>
>Not at all!
>
>I use my argument as a lead-in to tell programmers they had better study
>the algorithms developed for 30-40 year old mainframes with limited
>memory, because unless they can make their algorithms fit this model,
>performance will really suffer.
>
>I.e. I don't suggest they should try to do anything at the OS level,
>rather take the performance steps as a given and work around/within
>those limitations.

The thing that pisses me off is having to explain to them that they
ALSO need to take the design deficiencies of the less clueful
mainframe architectures and operating systems into account, because
that is the level at which modern systems map to them :-(

Stephen's points are a prime example of this. We learnt that that was
NOT how to handle virtual memory back in the 1960s, but the new kid on
the block (the IBM mainframe division) wouldn't be told anything, and
things have gone downhill from there :-(

I keep being told that TLB misses aren't important, because modern
TLBs are so large, and that programmers don't need to know about
memory banking designs. Yeah. Right. Now, back in the real world....

Regards,
Nick Maclaren.
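A sketch of the kind of microbenchmark behind the point about TLB
misses: touch one byte per 4 KiB page across a region much larger than
the TLB reach, versus the same number of loads packed densely. The
sizes are illustrative assumptions and the timing harness is omitted;
on most machines the strided walk pays roughly one TLB miss per load
while the dense walk pays almost none.

/* Strided vs. dense page touching: same load count, very different
 * TLB behaviour.  Wrap each call in your favourite timer to compare. */
#include <stdlib.h>

#define PAGE  4096
#define PAGES (1 << 16)         /* 256 MiB region: well past TLB reach */

long touch_strided(volatile char *buf)
{
    long sum = 0;
    for (size_t p = 0; p < PAGES; p++)
        sum += buf[p * PAGE];   /* one load per page: ~one TLB miss each */
    return sum;
}

long touch_sequential(volatile char *buf)
{
    long sum = 0;
    for (size_t p = 0; p < PAGES; p++)
        sum += buf[p];          /* same load count, few pages touched */
    return sum;
}

int main(void)
{
    char *buf = calloc(PAGES, PAGE);
    if (!buf)
        return 1;
    long s = touch_strided(buf) + touch_sequential(buf);
    free(buf);
    return (int)(s & 1);        /* keep the loads from being optimized out */
}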