From: James Harris on 2 Aug 2010 16:37

On 2 Aug, 16:27, MitchAlsup <MitchAl...(a)aol.com> wrote:
...
> Andy covered most of the cases: I will cover another (not pertinent to
> x86s that I know of):

Thanks to you and Andy for all the info, but I was just looking for some *examples* of architectures where the TLB caches not-present PTEs.

The relevance to a software engineer is whether a potentially expensive TLB invalidation is needed when dealing with a page-not-present fault. An unneeded invalidation should be avoided due to its local and ongoing costs.

It's clearly not needed on later Intel and AMD CPUs but what of earlier ones? What of Vax, Sparc, Mips, Alpha, Arm etc? (I should say I'm not asking about all of these. A couple of examples which cache not-present PTEs would be great.) I suspect Mips but was struggling to understand the Mips docs I have.

Any ideas?

James
From: Paul A. Clayton on 2 Aug 2010 22:04

On Aug 2, 4:37 pm, James Harris <james.harri...(a)googlemail.com> wrote:
[snip]
> The relevance to a software engineer is whether a potentially
> expensive TLB invalidation is needed when dealing with a
> page-not-present fault. An unneeded invalidation should be avoided
> due to its local and ongoing costs.
>
> It's clearly not needed on later Intel and AMD CPUs but what of
> earlier ones? What of Vax, Sparc, Mips, Alpha, Arm etc? (I should say
> I'm not asking about all of these. A couple of examples which cache
> not-present PTEs would be great.) I suspect Mips but was struggling to
> understand the Mips docs I have.
>
> Any ideas?

MIPS and Alpha both have software-controlled TLBs, though in the case of Alpha, Privileged Architecture Library code (PALcode) is executed and not a supervisor-level exception handler.

From page B3-21 of ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition (section B3.3.4, page 1295 of the pdf):

"Translation table entries that create Translation faults are not held in the TLB, see Translation fault on page B3-43. Therefore TLB and branch predictor invalidation is not required for the synchronization of a change from a translation table entry that causes a Translation fault to one that does not."

(I.e., ARM does not load invalid PTEs into the TLB.)

For the UltraSPARC IIIi it seems that software TLB fill is used (UltraSPARC IIIi Processor Users Manual, page 192 [pdf page 238]):

"When a non-faulting load encounters a TLB miss, the operating system should attempt to translate the page. If the translation results in an error, then zero is returned and the load completes silently."

comp.arch.embedded might have some answers for this question.

Paul A. Clayton
just a technophile
From: Andy Glew "newsgroup at on 3 Aug 2010 03:42

On 8/2/2010 1:37 PM, James Harris wrote:
> On 2 Aug, 16:27, MitchAlsup<MitchAl...(a)aol.com> wrote:
>
> ...
>
>> Andy covered most of the cases: I will cover another (not pertinent to
>> x86s that I know of):
>
> Thanks to you and Andy for all the info, but I was just looking for
> some *examples* of architectures where the TLB caches not-present
> PTEs.
>
> The relevance to a software engineer is whether a potentially
> expensive TLB invalidation is needed when dealing with a
> page-not-present fault. An unneeded invalidation should be avoided
> due to its local and ongoing costs.
>
> It's clearly not needed on later Intel and AMD CPUs but what of
> earlier ones? What of Vax, Sparc, Mips, Alpha, Arm etc? (I should say
> I'm not asking about all of these. A couple of examples which cache
> not-present PTEs would be great.) I suspect Mips but was struggling to
> understand the Mips docs I have.
>
> Any ideas?
>
> James

If ever you see flaky results, on x86 or elsewhere, I would strongly suggest that you have your invalid page exception handler rewalk the page tables to see if the page is, indeed, invalid.

-- Or maybe if you are just paranoid.

===

Heck: if you yourself can rewalk the page tables, on all machines you can avoid the "expensive TLB invalidation".
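
To make the rewalk suggestion concrete, here is a minimal sketch in C. It assumes a toy two-level table with a 10/10/12 address split and a present bit in bit 0, which resembles 32-bit x86 without PAE but is not taken from any post in this thread. All names (`page_dir_t`, `pte_is_present`, `on_not_present_fault`, the `FAULT_*` actions) are hypothetical. What "retry" has to do on a given CPU (nothing, a TLB refill, or dropping the one stale entry) is architecture-specific and is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u   /* assumed: bit 0 marks a present mapping */
#define ENTRIES     1024u  /* assumed: 10-bit index at each level */

typedef struct { uint32_t pte[ENTRIES]; } page_table_t;
typedef struct { page_table_t *pde[ENTRIES]; } page_dir_t;

/* Software walk of the toy table: directory index is bits 31..22,
 * table index is bits 21..12. Returns true if the PTE is present. */
static bool pte_is_present(const page_dir_t *dir, uint32_t vaddr)
{
    assert(dir != NULL);
    const page_table_t *pt = dir->pde[vaddr >> 22];
    if (pt == NULL)
        return false;               /* no second-level table at all */
    return (pt->pte[(vaddr >> 12) & 0x3FFu] & PTE_PRESENT) != 0;
}

typedef enum {
    FAULT_STALE_RETRY,   /* tables say present: the fault came from stale state */
    FAULT_REAL_PAGE_IN   /* tables agree the page is absent: handle normally */
} fault_action_t;

/* Handler policy: trust the in-memory tables over the fault itself. */
static fault_action_t on_not_present_fault(const page_dir_t *dir, uint32_t vaddr)
{
    return pte_is_present(dir, vaddr) ? FAULT_STALE_RETRY : FAULT_REAL_PAGE_IN;
}
```

With this shape, the code that makes a page present never issues an invalidation. The cost moves to the (hopefully rare) spurious fault, where the handler finds the page is already mapped and just restarts the access.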
From: Piotr Wyderski on 3 Aug 2010 07:55

Andy Glew wrote:
> Heck: if you yourself can rewalk the page tables, on all machines you
> can avoid the "expensive TLB invalidation".

On the other hand, why is the TLB invalidation expensive? There are two ways to do it: the first is via invlpg and the other is to write to cr3. But both of them should be relatively cheap, i.e. wait until the LSU pipe is empty and then pulse a global edge/level reset line of the TLB subsystem. Why isn't the reality as simple as that?

Best regards
Piotr Wyderski
From: Terje Mathisen "terje.mathisen at on 3 Aug 2010 08:37

Piotr Wyderski wrote:
> Andy Glew wrote:
>
>> Heck: if you yourself can rewalk the page tables, on all machines you
>> can avoid the "expensive TLB invalidation".
>
> On the other hand, why is the TLB invalidation expensive?
> There are two ways to do it: the first is via invlpg and the
> other is to write to cr3. But both of them should be relatively cheap,
> i.e. wait until the LSU pipe is empty and then pulse
> a global edge/level reset line of the TLB subsystem. Why
> isn't the reality as simple as that?

Ouch.

Writing to CR3 to invalidate the entire TLB subsystem is _very_ expensive: not because the operation itself takes so long, but because you have to reload the 90+% of data which is still needed.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
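
The refill cost can be illustrated with a toy model in C. This is only a sketch: the 64-entry direct-mapped TLB and every name in it (`toy_tlb_t`, `tlb_access`, `tlb_invalidate_page`, `tlb_flush_all`, `touch_working_set`) are assumptions made for illustration and do not describe any real CPU. `tlb_invalidate_page` stands in for a single-page invalidation such as invlpg and `tlb_flush_all` for a CR3 write. The model counts only how many translations have to be refilled afterwards, not cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLB_ENTRIES 64u  /* assumed size, purely illustrative */

typedef struct {
    bool     valid[TLB_ENTRIES];
    uint32_t vpn[TLB_ENTRIES];   /* virtual page number cached in each slot */
} toy_tlb_t;

/* Look up a page; on a miss, model the refill (table walk) by installing
 * the entry. Returns true on a hit, false on a miss. */
static bool tlb_access(toy_tlb_t *t, uint32_t vpn)
{
    assert(t != NULL);
    size_t i = vpn % TLB_ENTRIES;
    if (t->valid[i] && t->vpn[i] == vpn)
        return true;
    t->valid[i] = true;
    t->vpn[i] = vpn;
    return false;
}

/* Single-page invalidation: drops at most one entry. */
static void tlb_invalidate_page(toy_tlb_t *t, uint32_t vpn)
{
    size_t i = vpn % TLB_ENTRIES;
    if (t->valid[i] && t->vpn[i] == vpn)
        t->valid[i] = false;
}

/* Full flush: drops every entry, including the ones still in use. */
static void tlb_flush_all(toy_tlb_t *t)
{
    memset(t->valid, 0, sizeof t->valid);
}

/* Touch `pages` consecutive pages and return how many of them missed. */
static unsigned touch_working_set(toy_tlb_t *t, uint32_t first_vpn, unsigned pages)
{
    unsigned misses = 0;
    for (unsigned p = 0; p < pages; p++)
        if (!tlb_access(t, first_vpn + p))
            misses++;
    return misses;
}
```

In this model, a warm 64-page working set takes one refill after `tlb_invalidate_page` and 64 refills after `tlb_flush_all`. The flush itself is a single `memset`; the expense is in the refills that follow.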