From: KAMEZAWA Hiroyuki on 13 Jul 2010 04:10

On Tue, 13 Jul 2010 17:06:56 +0900 Minchan Kim <minchan.kim(a)gmail.com> wrote:
> On Tue, Jul 13, 2010 at 3:40 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> > On Tue, 13 Jul 2010 15:04:00 +0900
> > Minchan Kim <minchan.kim(a)gmail.com> wrote:
> >
> >> >> > 2. This can't help for a case where a section has multiple small holes.
> >> >>
> >> >> I agree. But this (a not-fully-filled section, not a punched hole)
> >> >> isn't such a case. Still, it would be better to handle both
> >> >> together. :)
> >> >>
> >> >> >
> >> >> > Then, my proposal for HOLES_IN_MEMMAP sparsemem is below.
> >> >> > ==
> >> >> > Some architectures unmap memmap[] for memory holes even with
> >> >> > SPARSEMEM. To handle that, pfn_valid() should check whether the
> >> >> > memmap is really there or not. For that purpose, __get_user()
> >> >> > can be used.
> >> >>
> >> >> Look at free_unused_memmap. We don't unmap the pte of the hole
> >> >> memmap. Is __get_user() still effective?
> >> >>
> >> > __get_user() goes through the TLB and page table, so it tells us
> >> > whether the vaddr is really mapped or not. If it faults,
> >> > __get_user() returns -EFAULT. It works at page granularity.
> >>
> >> I mean the following. For example, there is a struct page at
> >> 0x20000000.
> >>
> >> int pfn_valid_mapped(unsigned long pfn)
> >> {
> >> 	struct page *page = pfn_to_page(pfn); /* hole page is 0x20000000 */
> >> 	char *lastbyte = (char *)(page+1)-1;  /* lastbyte is 0x2000001f */
> >> 	char byte;
> >>
> >> 	/* We pass this test since free_unused_memmap doesn't unmap the pte */
> >> 	if (__get_user(byte, page) != 0)
> >> 		return 0;
> >
> > Why? When the page size is 4096 bytes,
> >
> > 0x1ffff000 - 0x1fffffff and
> > 0x20000000 - 0x20000fff are each one page, and memory is mapped per page.
>
> sizeof(struct page) is 32 bytes.
> So lastbyte is the address of the struct page + 32 bytes - 1.
>
> > What we access by the above __get_user() is the byte at [0x20000000, 0x20000001)
>
> Right.
>
> > and it's unmapped if 0x20000000 is unmapped.
> >
> free_unused_memmap doesn't unmap the pte, although it returns the page
> to the free list of the buddy allocator.
>
ok, I understood. please see my latest mail and ignore all others.

-kame
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mel Gorman on 13 Jul 2010 05:40

On Tue, Jul 13, 2010 at 12:53:48AM +0900, Minchan Kim wrote:
> Kukjin, could you test the patch below?
> I don't have any sparsemem system. Sorry.
>
> -- CUT DOWN HERE --
>
> Kukjin reported an oops that happens while he changes min_free_kbytes
> http://www.spinics.net/lists/arm-kernel/msg92894.html
> It happens because of the memory map on sparsemem.
>
> The system has a memory map as follows:
>     section 0              section 1              section 2
> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> SECTION_SIZE_BITS 28 (256M)
>
> It means section 0 is an incompletely filled section.
> Nonetheless, the current pfn_valid of sparsemem checks the pfn loosely.
>
> It checks only the mem_section's validation.

Note that this is meant to be fine as far as sparsemem is concerned. This
is how it functions by design - if a section is valid, all pages within
that section are valid. Hence, I consider the comment "sparsemem checks
pfn loosely" unfair as it's as strong as required until someone breaks
the model. It's ARM that is punching holes here, and as the hole is at
the beginning of a section, the zone boundaries could have been adjusted
after the hole was punched. Anyway...

> So in the above case, a pfn on 0x25000000 can pass pfn_valid's
> validation check. It's not what we want.
>
> The following patch adds a valid pfn range check to pfn_valid of sparsemem.
>
> Signed-off-by: Minchan Kim <minchan.kim(a)gmail.com>
> Reported-by: Kukjin Kim <kgene.kim(a)samsung.com>
>
> P.S.)
> It is just an RFC. If we agree on this, I will make the patch against mmotm.
>
> --
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b4d109e..6c2147a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -979,6 +979,8 @@ struct mem_section {
>  	struct page_cgroup *page_cgroup;
>  	unsigned long pad;
>  #endif
> +	unsigned long start_pfn;
> +	unsigned long end_pfn;
>  };

ouch, an 8 to 16 byte increase on a commonly-used structure.
Minimally, I would only expect this to be defined for
CONFIG_ARCH_HAS_HOLES_MEMORYMODEL. Similarly for the rest of the patch,
I'd expect the "strong" pfn_valid checks to only exist on ARM because
nothing else needs it.

>
>  #ifdef CONFIG_SPARSEMEM_EXTREME
> @@ -1039,6 +1041,12 @@ static inline int valid_section(struct mem_section *section)
>  	return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
>  }
>
> +static inline int valid_section_pfn(struct mem_section *section, unsigned long pfn)
> +{
> +	return ((section && (section->section_mem_map & SECTION_HAS_MEM_MAP)) &&
> +		(section->start_pfn <= pfn && pfn < section->end_pfn));
> +}
> +
>  static inline int valid_section_nr(unsigned long nr)
>  {
>  	return valid_section(__nr_to_section(nr));
>  }
> @@ -1053,7 +1061,7 @@ static inline int pfn_valid(unsigned long pfn)
>  {
>  	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>  		return 0;
> -	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> +	return valid_section_pfn(__nr_to_section(pfn_to_section_nr(pfn)), pfn);
>  }
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 95ac219..bde9090 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -195,6 +195,8 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
>  		if (!ms->section_mem_map)
>  			ms->section_mem_map = sparse_encode_early_nid(nid) |
>  							SECTION_MARKED_PRESENT;
> +		ms->start_pfn = start;
> +		ms->end_pfn = end;
>  	}
>  }

I'd also expect the patch to then delete memmap_valid_within and replace
it with a pfn_valid_within() check (which in turn should call the strong
version of pfn_valid()).

I prefer Kamezawa's suggestion of mapping a ZERO_PAGE-like page full of
PageReserved struct pages because it would have better performance and
be more in line with maintaining the assumptions of the memory model. If
we go in this direction, I would strongly prefer it was an ARM-only thing.
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
From: Johannes Weiner on 13 Jul 2010 05:40

On Tue, Jul 13, 2010 at 12:53:48AM +0900, Minchan Kim wrote:
> Kukjin, could you test the patch below?
> I don't have any sparsemem system. Sorry.
>
> -- CUT DOWN HERE --
>
> Kukjin reported an oops that happens while he changes min_free_kbytes
> http://www.spinics.net/lists/arm-kernel/msg92894.html
> It happens because of the memory map on sparsemem.
>
> The system has a memory map as follows:
>     section 0              section 1              section 2
> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> SECTION_SIZE_BITS 28 (256M)
>
> It means section 0 is an incompletely filled section.
> Nonetheless, the current pfn_valid of sparsemem checks the pfn loosely.
>
> It checks only the mem_section's validation.
> So in the above case, a pfn on 0x25000000 can pass pfn_valid's
> validation check. It's not what we want.
>
> The following patch adds a valid pfn range check to pfn_valid of sparsemem.

Look at the declaration of struct mem_section for a second. It is meant
to partition address space uniformly into backed and unbacked areas. It
comes with implicit size and offset information by means of
SECTION_SIZE_BITS and the section's index in the section array.

Now you are not okay with the _granularity_, but you propose to change
_the model_ by introducing a subsection within each section, and at the
same time you make the concept of a section completely meaningless: its
size becomes arbitrary and its associated mem_map and flags will apply
to the subsection only.

My question is: if the sections are not fine-grained enough, why not
just make them finer? The biggest possible section size that describes
the memory population on this machine accurately is 16M. Why not set
SECTION_SIZE_BITS to 24?
From: Russell King - ARM Linux on 13 Jul 2010 05:50

On Tue, Jul 13, 2010 at 10:37:00AM +0100, Mel Gorman wrote:
> I prefer Kamezawa's suggestion of mapping a ZERO_PAGE-like page full
> of PageReserved struct pages because it would have better performance
> and be more in line with maintaining the assumptions of the memory
> model. If we go in this direction, I would strongly prefer it was an
> ARM-only thing.

As I've said, this is not possible without doing some serious page
manipulation. Plus the pages that were there become unusable, as they
don't correspond with a PFN or obey phys_to_virt(). So there's
absolutely no point to this.

Now, why do we free the holes in the mem_map? Because these holes can
be extremely large. Every 512K of hole equates to one page of mem_map
array. Balance that against memory placed at 0xc0000000 physical on
some platforms, and with PHYSMEM_BITS at 32 and SECTION_SIZE_BITS at
19 - well, you do the maths. The result is certainly not pretty.
From: Mel Gorman on 13 Jul 2010 06:10
On Tue, Jul 13, 2010 at 10:46:12AM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 10:37:00AM +0100, Mel Gorman wrote:
> > I prefer Kamezawa's suggestion of mapping a ZERO_PAGE-like page full
> > of PageReserved struct pages because it would have better performance
> > and be more in line with maintaining the assumptions of the memory
> > model. If we go in this direction, I would strongly prefer it was an
> > ARM-only thing.
>
> As I've said, this is not possible without doing some serious page
> manipulation.

Yep, x86 used to do something like this for discontig. It wasn't pretty.

> Plus the pages that were there become unusable, as they don't
> correspond with a PFN or obey phys_to_virt(). So there's absolutely no
> point to this.
>
> Now, why do we free the holes in the mem_map? Because these holes can
> be extremely large. Every 512K of hole equates to one page of mem_map
> array.

Sure, the holes might be large, but at least they are contiguous. Is
there ever a case where you have

	512K_Valid 512K_Hole 512K_Valid 512K_Hole

or is it typically

	512K_Hole 512K_Hole ... 512K_Valid 512K_Valid

etc.? If holes are typically contiguous, the memmap is not allocated in
the first place, and the savings from punching holes in the memmap in
the latter configuration are minimal.

I recognise that if you have a 2M section with a hole in it, you are
potentially wasting 3 pages on unused memmap. If this is really a
problem, Minchan's modification to sparsemem to increase the size of
mem_section on CONFIG_ARCH_HAS_HOLES_MEMORYMODEL is a messy option. I
say messy because it only works if the hole is on either end of the
section and it adds quirks to the memory model.

> Balance that against memory placed at 0xc0000000 physical on
> some platforms, and with PHYSMEM_BITS at 32 and SECTION_SIZE_BITS at
> 19 - well, you do the maths. The result is certainly not pretty.
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab