From: Minchan Kim on 30 Jul 2010 05:50

On Fri, Jul 30, 2010 at 9:38 AM, Dave Hansen <dave(a)linux.vnet.ibm.com> wrote:
> On Thu, 2010-07-29 at 23:14 +0100, Russell King - ARM Linux wrote:
>> What we need is something which allows us to handle memory scattered
>> in several regions of the physical memory map, each bank being a
>> variable size.
>
> Russell, it does sound like you have a pretty pathological case here. :)
> It's not one that we've really attempted to address on any other
> architectures.
>
> Just to spell it out, if you have 4GB of physical address space, with
> 512k sections, you need 8192 sections, which means 8192*8 bytes, so it'd
> eat 64k of memory.  That's the normal SPARSEMEM case.
>
> SPARSEMEM_EXTREME would be a bit different.  It's a 2-level lookup.
> You'd have 16 "section roots", each representing 256MB of address space.
> Each time we put memory under one of those roots, we'd fill in a
> 512-section second-level table, which is designed to always fit into one
> page.  If you start at 256MB, you won't waste all those entries.
>
> The disadvantage of SPARSEMEM_EXTREME is that it costs you the extra
> level in the lookup.  The space loss in arm's case would only be 16
> pointers, which would more than be made up for by the other gains.
>
> The other case where it really makes no sense is when you're populating
> a single (or small number) of sections, evenly across the address space.
> For instance, let's say you have 16 512k banks, evenly spaced at 256MB
> intervals:
>
>        512k(a)0x00000000
>        512k(a)0x10000000
>        512k(a)0x20000000
>        ...
>        512k(a)0xF0000000
>
> If you use SPARSEMEM_EXTREME on that, it will degenerate to having the
> same memory consumption as classic SPARSEMEM, along with the extra
> lookup of EXTREME.  But I haven't heard you say that you have this kind
> of configuration yet. :)
>
> SPARSEMEM_EXTREME is really easy to test.  You just have to set it in
> your .config.  To get much use out of it, you'd also need to make the
> SECTION_SIZE something like the 512k we were talking about.

Thanks for the good explanation.

When this problem happened, I suggested using a 16M section size.  The
space isn't a big cost, but that failed since Russell doesn't like it.
So I tried to enhance sparsemem to support holes, but you didn't like
that either.  Frankly speaking, I don't like this approach myself, but
someone has to take care of the problem.

Hmm, is it better to give up Samsung's good embedded board?  It depends
on Russell's opinion.  I will hold this patch until this controversial
discussion reaches a conclusion.

Thanks, Dave.

> -- Dave

--
Kind regards,
Minchan Kim
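To make Dave's arithmetic concrete: classic SPARSEMEM keeps one flat,
always-allocated table with a pointer-sized entry per section, while
SPARSEMEM_EXTREME splits it into roots that are only allocated once a
section under them is populated.  A rough sketch of the two lookups
(names and layout are illustrative only; the real code lives in
mm/sparse.c and include/linux/mmzone.h):

	#define PAGE_SHIFT		12	/* 4K pages */
	#define SECTION_SHIFT		19	/* 512K sections */

	/* 4GB of address space / 512K per section = 8192 sections */
	#define NR_SECTIONS		8192
	#define SECTIONS_PER_ROOT	512	/* one page of entries */
	#define NR_ROOTS		(NR_SECTIONS / SECTIONS_PER_ROOT) /* 16 */

	struct mem_section;

	/* Classic SPARSEMEM: flat table, 8192 * 8 bytes = 64K up front. */
	struct mem_section *flat_table[NR_SECTIONS];

	/* SPARSEMEM_EXTREME: 16 root pointers; each second-level table
	 * is one page, allocated only when memory appears under it. */
	struct mem_section **roots[NR_ROOTS];

	static struct mem_section *lookup_extreme(unsigned long pfn)
	{
		unsigned long nr = pfn >> (SECTION_SHIFT - PAGE_SHIFT);

		if (!roots[nr / SECTIONS_PER_ROOT])	/* hole: nothing here */
			return NULL;
		return roots[nr / SECTIONS_PER_ROOT][nr % SECTIONS_PER_ROOT];
	}

This is also why the evenly spaced 16-bank layout Dave describes is the
worst case: every one of the 16 roots gets touched, so all 16
second-level pages are allocated anyway.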
From: Christoph Lameter on 30 Jul 2010 08:50

On Thu, 29 Jul 2010, Dave Hansen wrote:
> SPARSEMEM_EXTREME would be a bit different.  It's a 2-level lookup.
> You'd have 16 "section roots", each representing 256MB of address space.
> Each time we put memory under one of those roots, we'd fill in a
> 512-section second-level table, which is designed to always fit into one
> page.  If you start at 256MB, you won't waste all those entries.

That is certainly a solution for the !MMU case, and it would work very
much like a page table.  If you have an MMU, then the vmemmap sparsemem
configuration can take advantage of it to avoid the 2-level lookup.
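Christoph's point is that with SPARSEMEM_VMEMMAP the MMU does the
sparse lookup for free: the whole mem_map appears virtually contiguous,
holes simply have no page tables behind them, and pfn_to_page()
collapses to plain pointer arithmetic, much as in
include/asm-generic/memory_model.h.  A sketch (the vmemmap base address
below is made up for illustration):

	/* Fixed virtual base of the virtually-contiguous mem_map. */
	#define VMEMMAP_START	0xf0000000UL	/* illustrative address */
	#define vmemmap		((struct page *)VMEMMAP_START)

	/* No table walk at all -- one add, one subtract. */
	#define __pfn_to_page(pfn)	(vmemmap + (pfn))
	#define __page_to_pfn(page)	((unsigned long)((page) - vmemmap))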
From: Dave Hansen on 30 Jul 2010 11:50

On Fri, 2010-07-30 at 07:48 -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Dave Hansen wrote:
>
> > SPARSEMEM_EXTREME would be a bit different.  It's a 2-level lookup.
> > You'd have 16 "section roots", each representing 256MB of address space.
> > Each time we put memory under one of those roots, we'd fill in a
> > 512-section second-level table, which is designed to always fit into one
> > page.  If you start at 256MB, you won't waste all those entries.
>
> That is certainly a solution for the !MMU case, and it would work very
> much like a page table.  If you have an MMU, then the vmemmap sparsemem
> configuration can take advantage of it to avoid the 2-level lookup.

Yup, couldn't agree more, Christoph.  It wouldn't hurt to have several
of them available on ARM, since the architecture is so diverse.

-- Dave
From: Russell King - ARM Linux on 31 Jul 2010 06:50

On Fri, Jul 30, 2010 at 06:32:04PM +0900, Minchan Kim wrote:
> On Fri, Jul 30, 2010 at 5:55 AM, Dave Hansen <dave(a)linux.vnet.ibm.com> wrote:
> > If you free up parts of the mem_map[] array, how does the buddy
> > allocator still work?  I thought we required 'struct page's to be
> > contiguous and present for at least 2^(MAX_ORDER-1) pages in one go.

(Dave, I don't seem to have your mail to reply to.)

What you say is correct, and memory banks as a rule of thumb tend to be
powers of two.  We do have the ability to change MAX_ORDER (which we
need to do for some platforms where there's only 1MB of DMA-able
memory.)

However, in the case of two 512KB banks, the buddy allocator won't try
to satisfy a 1MB request, as it'll only have two separate 512K free
'pages' to deal with, and zero 1MB free 'pages'.
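The arithmetic behind Russell's point: with 4K pages, a 1MB allocation
is an order-8 block (256 pages), but a 512KB bank only covers 128
pages, so order 7 is the largest buddy block that can ever form inside
it, and two such blocks in different banks are not physically
contiguous, so they can never merge.  A small userspace check of that
bound:

	#include <stdio.h>

	int main(void)
	{
		unsigned long page_size = 4096, bank = 512 * 1024;
		unsigned long pages = bank / page_size;	/* 128 pages */
		int order = 0;

		/* Find the largest order with 2^order <= 128 pages. */
		while ((2UL << order) <= pages)
			order++;
		printf("max buddy order in a 512K bank: %d\n", order); /* 7 */
		return 0;
	}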
From: Russell King - ARM Linux on 31 Jul 2010 11:40
On Fri, Jul 30, 2010 at 07:48:00AM -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Dave Hansen wrote:
>
> > SPARSEMEM_EXTREME would be a bit different.  It's a 2-level lookup.
> > You'd have 16 "section roots", each representing 256MB of address space.
> > Each time we put memory under one of those roots, we'd fill in a
> > 512-section second-level table, which is designed to always fit into one
> > page.  If you start at 256MB, you won't waste all those entries.
>
> That is certainly a solution for the !MMU case, and it would work very
> much like a page table.  If you have an MMU, then the vmemmap sparsemem
> configuration can take advantage of it to avoid the 2-level lookup.

Looking at vmemmap sparsemem, we need to fix it, as the page table
allocation in there bypasses the arch-defined page table setup.  This
causes a problem if you have 256-entry L2 page tables with no room for
the additional Linux VM PTE support bits (such as young, dirty, etc),
and so need to glue together two 256-entry L2 hardware page tables plus
a Linux version to store its accounting, in each page.  See
arch/arm/include/asm/pgalloc.h.

So this causes a problem with vmemmap:

	pte_t entry;
	void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
	if (!p)
		return NULL;
	entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);

Are you willing for this stuff to be replaced by architectures as
necessary?
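One possible shape for the override Russell is asking about: keep the
generic PTE population as a __weak function, so an architecture like
ARM can supply a strong version that routes through its own pgalloc
helpers (which know about the glued hardware/Linux L2 tables).  This is
a hypothetical sketch of such a hook, not an existing kernel API; the
function name and split are assumptions:

	/* Generic, overridable PTE setup for the vmemmap (sketch). */
	pte_t * __weak vmemmap_pte_populate(pmd_t *pmd, unsigned long addr,
					    int node)
	{
		pte_t *pte = pte_offset_kernel(pmd, addr);

		if (pte_none(*pte)) {
			void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);

			if (!p)
				return NULL;
			/* Generic path: build the entry directly, which is
			 * exactly the step that bypasses arch pgalloc. */
			set_pte_at(&init_mm, addr, pte,
				   pfn_pte(__pa(p) >> PAGE_SHIFT,
					   PAGE_KERNEL));
		}
		return pte;
	}

	/* An arch would provide a strong version of the above that goes
	 * through its own set_pte_ext()/pgalloc machinery instead. */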