From: James Harris on 4 Aug 2010 17:23

On 3 Aug, 15:49, EricP <ThatWouldBeTell...(a)thevillage.com> wrote:
> James Harris wrote:
>
> > I'm not sure I see the similarity to mfence but this branch of the
> > thread has become x86-based so I'll carry on in that vein.
>
> http://developer.intel.com/products/processor/manuals/index.htm
>
> Intel manual 3A System Programming Guide, Part 1 (#253668)
> Section 4.10 "CACHING TRANSLATION INFORMATION" covers TLB caching
> (over 16 pages of info)
> http://developer.intel.com/Assets/PDF/manual/253668.pdf

Well, it's not important but in the bit you quoted I wasn't asking
what the TLB did but challenging Piotr's comment that TLB invalidation
is "conceptually ... very similar to mfence."

James

From: Rick Jones on 4 Aug 2010 17:48

Stephen Fuld <SFuld(a)alumni.cmu.edu.invalid> wrote:
> How about a compromise where we just increase the page size? I know
> of one system that uses 16KB pages. This should reduce the number of
> page faults, yet still require no application level changes and
> allow for those few programs that really need large sparse address
> spaces.

There are processors which allow the page size to go to GBs in size,
and even the odd OS that takes advantage of it, and the rest of the
possible page sizes below it :)

rick
--
No need to believe in either side, or any side. There is no cause.
There's only yourself. The belief is in your own precision. - Joubert
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

From: Andy Glew "newsgroup at on 4 Aug 2010 18:15

On 8/4/2010 10:49 AM, Nick Maclaren wrote:
> In article <i3c4ni$m0r$1(a)news.eternal-september.org>,
> Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>>
>>>> This is a reason not to have the hardware or microcode rewalk the page
>>>> tables when reporting a fault. Otherwise, you might end up having
>>>> walked the page tables 3 times:
>>>>
>>>> First, the speculative TLB miss page walk by hardware.
>>>>
>>>> Second, the non-speculative TLB miss page walk by hardware (or
>>>> microcode) when reporting the fault.
>>>>
>>>> Third, the page walk inside the OS page fault handler.
>>>
>>> It's an even better reason to abolish page faulting altogether!
>>> As posted before, it would be trivial to do at the hardware level,
>>> fairly easy to do at the software level, and seriously compromise
>>> only a very few, very perverse usages.
>>>
>>> But it still rocks the boat too much to be considered nowadays :-(
>>
>> How about a compromise where we just increase the page size? I know of
>> one system that uses 16KB pages. This should reduce the number of page
>> faults, yet still require no application level changes and allow for
>> those few programs that really need large sparse address spaces.
>
> Not really, unfortunately, for two reasons. Firstly, most of the
> benefit comes from abolishing the need for transparent fixup of
> page faults. Secondly, increasing the page size often just increases
> the memory requirements for sparse address spaces.
>
> It's trivial to do the calculation for random address distributions,
> for many common ones, and the numbers are ugly - especially for the
> simple case of UUID values.
>
> Perhaps the best argument against it is that it has been tried, many
> times, and has failed every time (as a solution to this problem).
> The systems that use large pages to tackle it usually use very large
> ones (e.g. 4 MB) or variable ones, and use THEM to make certain
> segments effectively immune from page faults.
>
>> But are page faults really a performance issue with today's larger memories?
>
> Yes. They often make it worse. The sole issue for many applications
> is the proportion of their memory that can be mapped at any one time.
> Consider any matrix method that has no known blocking form, and
> necessarily uses accesses 'both ways round' closely together. As
> soon as the matrix exceeds the size mapped by the TLB, there is a
> BIG performance problem.

I'm sure that somebody has beaten me to this, but let me point out
that this is NOT a performance problem caused by page faults. It is a
performance problem caused by TLB misses.

Page faults should be a much smaller performance problem. To a first
order, paging from disk almost never happens, except as part of
program startup or cold misses to a DLL.

Probably the more common form of page fault occurs with OS mechanisms
such as COW, Copy-On-Write.

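To put rough numbers on the TLB-miss point above, here is a minimal sketch of "the size mapped by the TLB" and of how many distinct pages a row walk versus a column walk of a row-major matrix touches. The 64-entry TLB, 4 KB pages, 8-byte elements and the function names are assumptions picked for illustration, not figures from any particular processor.

```python
# Hypothetical parameters, chosen only for illustration.
PAGE_SIZE = 4096      # bytes per page
TLB_ENTRIES = 64      # number of TLB entries
ELEM_SIZE = 8         # bytes per matrix element (e.g. a double)


def tlb_reach(entries=TLB_ENTRIES, page_size=PAGE_SIZE):
    """Bytes of address space the TLB can map at any one time."""
    return entries * page_size


def pages_touched(n, by_column, page_size=PAGE_SIZE, elem_size=ELEM_SIZE):
    """Distinct pages touched when walking one row or one column of an
    n x n row-major matrix that starts on a page boundary."""
    if by_column:
        # Consecutive column elements are one full row apart.
        offsets = (i * n * elem_size for i in range(n))
    else:
        # Consecutive row elements are adjacent in memory.
        offsets = (j * elem_size for j in range(n))
    return len({off // page_size for off in offsets})


if __name__ == "__main__":
    n = 1024
    print("TLB reach (bytes):", tlb_reach())
    print("pages per row walk:", pages_touched(n, by_column=False))
    print("pages per column walk:", pages_touched(n, by_column=True))
```

With these assumed numbers the TLB reach is 256 KB, a row walk of a 1024 x 1024 matrix touches 2 pages, and a column walk touches 1024 pages, far more than 64 entries can hold. No page fault is involved anywhere: every page can be resident and the column walk still misses the TLB on essentially every access.
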
From: Rick Jones on 4 Aug 2010 17:50

Nick Maclaren <nmm(a)gosset.csi.cam.ac.uk> wrote:
> Not really, unfortunately, for two reasons. Firstly, most of the
> benefit comes from abolishing the need for transparent fixup of page
> faults. Secondly, increasing the page size often just increases the
> memory requirements for sparse address spaces.

Variable page size support perhaps? A platform near and dear to my
paycheck can go from 4KB up through 4X multiples all the way to GB.

rick jones
--
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

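The quoted claim that a larger page size raises the memory requirement of a sparse address space can be checked with a short calculation. The sketch below assumes touches are uniformly random over the address space (one of the "random address distributions" mentioned in the quote; real workloads will differ), and the 1 TB address space, the touch count and the function names are hypothetical. It also lists the page sizes from 4KB in 4X steps up to 1GB.

```python
def page_sizes_4x(smallest=4096, largest=1 << 30):
    """Page sizes from `smallest` up to `largest` in 4X steps."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= 4
    return sizes


def expected_resident_bytes(n_touches, space_bytes, page_size):
    """Expected memory backing n_touches uniformly random byte addresses.

    With P pages in the space, a given page is missed by every touch with
    probability (1 - 1/P) ** n_touches, so the expected number of pages
    touched at least once is P * (1 - (1 - 1/P) ** n_touches).
    """
    pages = space_bytes // page_size
    expected_pages = pages * (1.0 - (1.0 - 1.0 / pages) ** n_touches)
    return expected_pages * page_size


if __name__ == "__main__":
    space = 1 << 40       # hypothetical 1 TB sparse address space
    touches = 100_000     # hypothetical number of random touches
    for size in page_sizes_4x():
        resident = expected_resident_bytes(touches, space, size)
        print(f"{size:>12} byte pages -> ~{resident / 2**20:,.0f} MB resident")
```

While touches are sparse relative to the number of pages, almost every touch lands on its own page, so resident memory grows nearly in proportion to the page size until it saturates at the size of the address space.
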
From: MitchAlsup on 4 Aug 2010 19:25

On Aug 2, 11:30 am, "Paul A. Clayton" <paaronclay...(a)embarqmail.com> wrote:
> On Aug 2, 11:27 am, MitchAlsup <MitchAl...(a)aol.com> wrote:
> [snip]
>
> > Given the aforementioned organization, I know of no way to avoid
> > inserting the invalid one in the PTE store while inserting the valid
> > one (as a single word in the TLB). It is perfectly reasonable to avoid
> > inserting the PTEs if ALL of them are invalid, but not if the one that
> > took the miss is valid.
>
> Huh? Two methods are obvious to me. 1) Use the PTE valid bit and
> have a policy that invalid PTEs are never cached in the TLB.
> 2) Add a full/empty bit for each PTE slot in the TLB.

But you have 2 (or 4) PTEs for one CAM entry. Thus 1, 2, or 3 of them
can be INVALID while the one you want is VALID. You cannot avoid
storing some invalid PTEs.

And the PTEs do have individual valid bits, but the problem was how
not to store invalid PTEs in the TLB. In the multi-PTE/store
microarchitectures, you cannot avoid storing these. C-A-N-N-O-T

Mitch
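
The organization described above can be modelled in a few lines. This is only a sketch under stated assumptions: one tag (the CAM entry) covers an aligned group of 4 PTEs stored as a single word, and the names PTE, TLBEntry, fill and lookup are hypothetical, not taken from any real design. The fill is skipped only when the whole group is invalid; what happens when the missing PTE itself is invalid but a neighbour is valid is not specified in the post, and this sketch simply stores the group.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PTES_PER_ENTRY = 4  # assumption; the post mentions 2 or 4 PTEs per CAM entry


@dataclass
class PTE:
    valid: bool
    frame: int = 0


@dataclass
class TLBEntry:
    tag: int          # virtual page number // PTES_PER_ENTRY
    ptes: List[PTE]   # the whole aligned group, stored as one word


def fill(page_table: Dict[int, PTE], vpn: int) -> Optional[TLBEntry]:
    """Build the TLB entry for a miss on vpn.

    The group is stored whole, so invalid neighbours come along with the
    valid PTE. Nothing is inserted only when ALL PTEs in the group are
    invalid.
    """
    base = vpn - vpn % PTES_PER_ENTRY
    group = [page_table.get(base + i, PTE(False)) for i in range(PTES_PER_ENTRY)]
    if not any(p.valid for p in group):
        return None
    return TLBEntry(tag=base // PTES_PER_ENTRY, ptes=group)


def lookup(entry: TLBEntry, vpn: int) -> Optional[int]:
    """Return the frame for vpn, or None on a tag mismatch or invalid PTE."""
    if entry.tag != vpn // PTES_PER_ENTRY:
        return None
    pte = entry.ptes[vpn % PTES_PER_ENTRY]
    return pte.frame if pte.valid else None


if __name__ == "__main__":
    table = {8: PTE(True, 0x100)}      # only page 8 of group 8..11 is mapped
    entry = fill(table, 8)
    print(sum(not p.valid for p in entry.ptes), "invalid PTEs stored")
    print(lookup(entry, 8), lookup(entry, 9))
```

In this model a miss on page 8 stores three invalid PTEs alongside the valid one. The per-PTE valid bits keep lookups on pages 9 to 11 from hitting, but they do not keep those invalid PTEs out of the entry.
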