From: "Andy "Krazy" Glew" on 3 Apr 2010 13:49 On 4/3/2010 10:29 AM, nmm1(a)cam.ac.uk wrote: > What I don't understand is why everybody is so attached to demand > paging - it was near-essential in the 1970s, because memory was > very limited, but this is 35 years later! As far as I know, NONE > of the facilities that demand paging provides can't be done better, > and more simply, in other ways (given current constraints). I think my last post identified a slightly far-out reason: you can page without knowledge of software. But you can't swap. Paging (and other oblivious caching) has a more graceful fall-off. Note that I can't say truly graceful, because modern OSes don't page well. If you are doing whole program swapping, you can't run a program larger than memory With paging you can - although you better not be accessing a footprint larger than memory If you are doing segment swapping, and if data structures are constrained to be no larger than a segment, then you can't have a data structure larger than main memory. Or even approaching in size, since you will hae other segments. With paging you can. Sure, there have been compilers that automatically split up stuff like "char array[1024*1024*1024*16]" into multiple segments (where segments had a fixed maximum size). But that sure does look like paging. Just potentially with variable size pages. Hmmm... modern techniques like COW zero filled pages and forking work better with pages than with segments. You could COW whole segments. But it really is better to have a level underneath the segments. It could be segments and sub-segments, but it so often is segments and paging. And if you have paging, you don't need segments. Segments are associated far too often with restrictions such as "you can't have data structures bigger than 64K." Or 1G, Or 2^40 Such restrictions are the kiss of death. === Or are you talking not about pages vs. segments, but why bother with either?
From: nmm1 on 3 Apr 2010 14:29

In article <4BB77FAE.5030203(a)patten-glew.net>,
Andy \"Krazy\" Glew  <ag-news(a)patten-glew.net> wrote:
>
>> What I don't understand is why everybody is so attached to demand
>> paging - it was near-essential in the 1970s, because memory was
>> very limited, but this is 35 years later!  As far as I know, NONE
>> of the facilities that demand paging provides can't be done better,
>> and more simply, in other ways (given current constraints).
>
>I think my last post identified a slightly far-out reason: you can page
>without knowledge of software.  But you can't swap.

Hmm.  Neither is entirely true - think of truly time-critical code.
I have code that will work together with swapping, but not paging.

>Paging (and other oblivious caching) has a more graceful fall-off.  Note
>that I can't say truly graceful, because modern OSes don't page well.

Not in my experience, sorry.  Even under MVS, which did page well, IBM
learnt that paging was a disaster area for far too many uses and backed
off to use swapping wherever at all possible.  And "don't page well"
often translates to "hang up and crash in truly horrible ways, often
leaving filesystems corrupted."

>If you are doing whole-program swapping, you can't run a program larger
>than memory.  With paging you can - although you had better not be
>accessing a footprint larger than memory.
>
>If you are doing segment swapping, and if data structures are
>constrained to be no larger than a segment, then you can't have a data
>structure larger than main memory - or even approaching it in size,
>since you will have other segments.  With paging you can.
>
>Sure, there have been compilers that automatically split up stuff like
>"char array[1024*1024*1024*16]" into multiple segments (where segments
>had a fixed maximum size).  But that sure does look like paging - just
>potentially with variable-size pages.

All of the above is true, in theory, but I haven't even heard of a
current example where those really matter, in practice - one that isn't
due to a gross (software) misdesign.  The point is that paging works
(in practice) only when your access is very sparse (either in space or
time) AND when the sparse access is locally dense.  I don't know of any
uses where there are not much better solutions.

>Hmmm... modern techniques like COW zero-filled pages and forking work
>better with pages than with segments.  You could COW whole segments,
>but it really is better to have a level underneath the segments.  It
>could be segments and sub-segments, but it so often is segments and
>paging.  And if you have paging, you don't need segments.

Why?  That's a serious question.  See above.  In any case, the
fork-join model is a thoroughgoing misdesign for almost all purposes,
and most COW uses are to kludge up its deficiencies.

Please note that I was NOT proposing a hardware-only solution!

>Segments are associated far too often with restrictions such as "you
>can't have data structures bigger than 64K."  Or 1G.  Or 2^40.  Such
>restrictions are the kiss of death.

I am not talking about such misdesigns, but something competently
designed.  After all, handling segments properly isn't exactly a new
technology.


Regards,
Nick Maclaren.
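[A sketch of the "sparse but locally dense" condition, assuming a POSIX
mmap() region larger than free RAM; the region size, stride, and both
traversal routines are illustrative only - the point is the access
pattern, not the numbers:]

/* Sketch: the same region, two access patterns.  Assumes a POSIX
 * system; MAP_NORESERVE is Linux-specific.  REGION is meant to be
 * larger than free RAM so that paging actually happens. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define REGION  ((size_t)4 << 30)   /* 4 GiB, assumed > free RAM */
#define PAGE    4096

static void touch_dense(char *p)
{
    /* Locally dense: stream through one contiguous sixteenth, so
     * every page that is faulted in gets written 4096 times. */
    memset(p, 1, REGION / 16);
}

static void touch_sparse(char *p)
{
    /* Sparse and NOT locally dense: one byte every 16 pages, so
     * nearly every access takes its own page fault. */
    for (size_t off = 0; off < REGION; off += (size_t)PAGE * 16)
        p[off] = 1;
}

int main(void)
{
    char *p = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    touch_dense(p);    /* pages tolerably under memory pressure */
    touch_sparse(p);   /* thrashes once the region exceeds RAM  */

    munmap(p, REGION);
    return 0;
}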
From: Anne & Lynn Wheeler on 3 Apr 2010 16:09 i got called in to talk to some of the people regardubg mvt to initial virtual memory vs2/svs ... page replacement algorithms, local vs-a-vis global replacement, some other stuff. for some reason, they decided to make a micro trade-off ... selecting non-changed pages for replacement before changed pages (because they didn't have to write first ... the slot was immediately available) .... didn't matter how much argued with them ... they were determined to do it anyway. it wasn't until well into the vs2/mvs time-frame that somebody realized that they were replacing high-use shared executable library pages before private, lower-use data pages. recent thread mentioning old issue with local vis-a-vis global lru replacement: http://www.garlic.com/~lynn/2010f.html#85 16:32 far pointers in OpenWatcom C/C++ http://www.garlic.com/~lynn/2010f.html#89 16:32 far pointers in OpenWatcom C/C++ http://www.garlic.com/~lynn/2010f.html#90 16:32 far pointers in OpenWatcom C/C++ http://www.garlic.com/~lynn/2010f.html#91 16:32 far pointers in OpenWatcom C/C++ jim getting me into middle of academic dispute based on some work i had done as undergraduate in the 60s ... past reference http://www.garlic.com/~lynn/2006w.html#46 The Future of CPUs: What's After Multi-Core? in the 80s ... there was "big page" change for both mvs and vm ... sort of part-way swapping & demand page. at queue drop (aka swap) .... resident virtual pages would be formed into ten page collections and written as full-track 3380 disk write ... to the nearest free track arm position (moving cursor, slight similarity to log-structured file system). Subsequent page fault, if the page was part of a big-page, the whole big page would be brought in. the whole orientation was arm motion is the primary bottleneck with trading off both transfer capacity and real-storage against arm motion (fetch of ten pages might bring in pages that weren't actually needed the next time, the approach was slightly better than brute force move to 40k pages ... since it was some ten 4k pages that had actually been recently used ... as opposed to 40kbyte contiguous area). recent post mentioning big pages: http://www.garlic.com/~lynn/2010g.html#23 16:32 far pointers in OpenWatcom C/C++ -- 42yrs virtualization experience (since Jan68), online at home since Mar1970
From: Muzaffer Kal on 3 Apr 2010 16:46

On Sat, 3 Apr 2010 04:45:56 +0200, Morten Reistad <first(a)last.name>
wrote:
>In article <tpqar5pfc05ajvn3329an6v9tcodh58i65(a)4ax.com>,
>Muzaffer Kal  <kal(a)dspia.com> wrote:
>>On Thu, 01 Apr 2010 22:48:47 -0500, Del Cecchi` <delcecchi(a)gmail.com>
>>wrote:
>>
>>>MitchAlsup wrote:
>>>> On Apr 1, 5:40 pm, timcaff...(a)aol.com (Tim McCaffrey) wrote:
>>>>
>>>>>The PCIe 2.0 links on the Clarkdale chips run at 5G.
>>>>
>>>> And how many dozen meters can these wires run?
>>>>
>>>> Mitch
>>>
>>>Maybe 1 dozen meters, depending on thickness of wire.  Wire thickness
>>>depends on how many you want to be able to put in a cable, and if you
>>>want to be able to bend those cables.
>>>
>>>10GbaseT or whatever it is called gets 10 gbits/second over 4 twisted
>>>pairs for 100 meters by what I classify as unnatural acts torturing
>>>the bits.
>>
>>You should also add that this is full duplex, i.e. simultaneous
>>transmission of 10G in both directions.  One needs 4 equalizers, 4 echo
>>cancellers, 12 NEXT and 12 FEXT cancellers in addition to a fully
>>parallel LDPC decoder (don't even talk about the insane requirement on
>>the clock recovery block).  Over the last 5 years probably US$ 100M of
>>VC money got spent to develop 10GBT PHYs, with several startups
>>disappearing with not much to show for it.  Torturing the bits indeed
>>(not to mention torture of the engineers trying to make this thing
>>work.)
>>--
>>Muzaffer Kal
>
>Except for the optics, fiber equipment is a lot simpler.

That reminds me of the school principal who said "except for the
students, running schools is very easy."  The nice thing about copper
is that there are no optics.  Once you get over the hurdle of making a
PHY, you can make cables easily, bend them as you want/need (within
reason) and change connectivity as needed.  That said, 10GBT is
probably the last copper PHY for 100m cable.
--
Muzaffer Kal
DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com
From: Morten Reistad on 4 Apr 2010 01:51
In article <4BB7763D.9020103(a)patten-glew.net>,
Andy \"Krazy\" Glew  <ag-news(a)patten-glew.net> wrote:
>On 4/2/2010 9:45 PM, Morten Reistad wrote:
>> If the 6 MB (or 12, or 24) of RAM are superfast, have a minimal mmu,
>> at least capable of process isolation and address translation; and do
>> the "paging" to main memory, then you could run one of these minimal,
>> general purpose operating systems inside each gpu/cpu/whatever, and
>> live with the page faults.  It will be several orders of magnitude
>> faster and lower latency than the swapping and paging we normally love
>> to hate.  We already have that "swapping", except we call it "memory
>> access".
>>
>> The theory is old, stable and well validated.  The code is done, and
>> still in many operating systems.  We "just need drivers".
>
>I know of at least one startup whose founder called me up asking for
>somebody able to write such a driver and/or tune the paging subsystem
>to create a hierarchy of ultra fast and conventional DRAM.
>
>In Linux.
>
>A few months later he was on to hardware state machines to do the
>paging.  Linux paging was too high overhead.

I could have told you that.  Linux has done integrated paging and file
systems more or less straight from Tops-20, or at least Denning's
papers.  I would try OpenBSD, which still has that ultratight code from
the classic BSDs, or QNX, where they are proud to have short code paths.

>Now, I suspect that a highly tuned paging system might be able to be
>10X faster than Linux's.  (In the same way that Linux ages much better
>than Windows.)  (I hearken back to the day when people actually cared
>about paging, e.g. at Gould.)
>
>But the lesson is that modern software, modern OSes, don't seem to page
>worth a damn.  And if you are thinking to brush off, re-tune, and
>completely reinvent virtual memory algorithms from the good old days
>(paging and/or swapping), you must remember that the guys doing the
>hardware (and, more importantly, microcode and firmware, which is just
>another form of software) for such multilevel memory systems have the
>same ideas and read the same papers.

I am not advocating a "multilevel memory system" in the sense of NUMA
etc.  I am advocating utilising standard, 41-year-old virtual memory
techniques on the chip boundary.

>It's just the usual:
>
>If you do it in software in the OS
>- you have to do it in every OS
>- you have to ship the new OS, or at least the new drivers
>+ you can take advantage of more knowledge of software, processes, tasks
>+ you can prototype quicker, since you have Linux source code

Or you can build yourself a hypervisor, like Apple has done, and have
that present the "standard" x86 interface to the OS.  And, as you say,
once the tables are set up you can have hardware perform the vast bulk
of the paging activities.

>If you do it in hardware/firmware/microcode
>+ it works for every OS, including legacy OSes without device drivers
>- you have less visibility to software constructs
>+ if you are the hardware company, you can do it without OS (Microsoft OS) source code
>
>===
>
>Hmmm... HW/FW/UC can only really do paging.  Can't really do swapping.
>Swapping requires SW knowledge.
>
>===
>
>My take: if the "slower" memory is within 1,000 cycles, you almost
>definitely need a hardware state machine.  Block the process (hardware
>thread), switch to another thread.

The classic mainframe paging got underway when a disk access took
somewhat more than 200 instructions to perform.  Now it is main memory
that takes a similar number of instructions to access.
We have a problem with handling interrupts though; back then an
interrupt would cost ~20 instructions; now it costs several hundred.

On a page fault we would normally want to schedule a read of the
missing page into a vacant page slot, mark the faulting thread as in IO
wait, and pick the next process from the scheduler list to run.  On IO
completion we want to mark the page good and accessed, put the thread
back on the scheduler list as runnable, and possibly run it.

These bits can be done in hardware by an MMU.  But for a prototype we
just need to generate a fault whenever the page is not in on-chip
memory.

>If within 10,000 cycles, I think hardware/firmware will still win.
>
>I think software OS level paging only is a clear win at >= 100,000 cycles.

The performance problem is with the interrupt load.  Apart from that we
saw clear wins in paging at the quarter-thousand-cycle level.

-- mrr
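[A sketch of that fault path in C.  Every type and helper below
(grab_vacant_slot, schedule_read, next_runnable, and so on) is
hypothetical - a stand-in for whatever the MMU state machine, paging
firmware, or a prototype driver would actually provide:]

/* Sketch of the fault path described above.  All helpers are
 * hypothetical services of the scheduler and the paging engine. */
#include <stddef.h>

enum thread_state { RUNNABLE, IO_WAIT };

struct page   { int present; int accessed; };
struct thread { enum thread_state state; };

/* Hypothetical services supplied by the scheduler and paging engine. */
extern struct page   *grab_vacant_slot(void);
extern void           schedule_read(struct page *missing, struct page *slot);
extern struct thread *next_runnable(void);
extern void           dispatch(struct thread *t);
extern void           enqueue_runnable(struct thread *t);

void on_page_fault(struct thread *t, struct page *missing)
{
    struct page *slot = grab_vacant_slot();   /* pick a free on-chip frame    */
    schedule_read(missing, slot);             /* start the fill from DRAM     */
    t->state = IO_WAIT;                       /* park the faulting thread     */
    dispatch(next_runnable());                /* run something else meanwhile */
}

void on_io_complete(struct thread *t, struct page *p)
{
    p->present  = 1;                          /* page is good again           */
    p->accessed = 1;                          /* and counts as recently used  */
    t->state = RUNNABLE;
    enqueue_runnable(t);                      /* scheduler may run it next    */
}

[Whether these two routines live in an interrupt handler or in a
hardware state machine is exactly the cost question discussed above.]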