From: Minchan Kim on
On Tue, Jul 13, 2010 at 11:30:06AM +0200, Johannes Weiner wrote:
> On Tue, Jul 13, 2010 at 12:53:48AM +0900, Minchan Kim wrote:
> > Kukjin, Could you test below patch?
> > I don't have any sparsemem system. Sorry.
> >
> > -- CUT DOWN HERE --
> >
> > Kukjin reported that an oops happens when he changes min_free_kbytes:
> > http://www.spinics.net/lists/arm-kernel/msg92894.html
> > It is caused by the memory map layout on sparsemem.
> >
> > The system has the following memory map:
> >       section 0              section 1              section 2
> > 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> > SECTION_SIZE_BITS 28(256M)
> >
> > It means section 0 is an incompletely filled section.
> > Nonetheless, the current pfn_valid of sparsemem checks the pfn only loosely.
> >
> > It checks only whether the mem_section is valid.
> > So in the above case, a pfn at 0x25000000 can pass pfn_valid's validation check.
> > It's not what we want.
> >
> > The following patch adds a valid pfn range check to pfn_valid of sparsemem.
>
> Look at the declaration of struct mem_section for a second. It is
> meant to partition address space uniformly into backed and unbacked
> areas.
>
> It comes with implicit size and offset information by means of
> SECTION_SIZE_BITS and the section's index in the section array.
>
> Now you are not okay with the _granularity_ but propose to change _the
> model_ by introducing a subsection within each section and at the same
> time make the concept of a section completely meaningless: its size
> becomes arbitrary and its associated mem_map and flags will apply to
> the subsection only.
>
> My question is: if the sections are not fine-grained enough, why not
> just make them?
>
> The biggest possible section size to describe the memory population on
> this machine accurately is 16M. Why not set SECTION_SIZE_BITS to 24?
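
The 16M figure can be checked from the bank boundaries alone. The following is a minimal userspace sketch, not kernel code: `struct bank`, `banks` and `max_section_size_bits()` are hypothetical names introduced for illustration. It assumes a section size is acceptable only if every bank start and end is aligned to it, so that no section is partially populated.

```c
#include <stdint.h>

/* Bank boundaries from the memory map quoted in this thread. */
struct bank {
	uint32_t start;
	uint32_t end;	/* exclusive */
};

static const struct bank banks[] = {
	{ 0x20000000u, 0x25000000u },
	{ 0x40000000u, 0x50000000u },
	{ 0x50000000u, 0x58000000u },
};

/*
 * Return the largest section size, in bits, such that every bank
 * start and end is aligned to (1 << bits). Hypothetical helper.
 */
static unsigned int max_section_size_bits(const struct bank *b, unsigned int n)
{
	uint32_t all = 0;
	unsigned int i, bits = 0;

	/* OR all boundaries together; the lowest set bit limits alignment. */
	for (i = 0; i < n; i++)
		all |= b[i].start | b[i].end;
	if (all == 0)
		return 0;

	while (!(all & 1u)) {
		all >>= 1;
		bits++;
	}
	return bits;
}
```

For the three banks above, the limiting boundary is 0x25000000, whose lowest set bit is bit 24, which matches the suggested SECTION_SIZE_BITS of 24.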

You're right. AFAIK, Kukjin tried it but Russell and others rejected it.
Let's wrap it up.

First of all, thanks for joining this good discussion, Kame, Hannes, Mel and
Russell.

The system has the following memory map.
0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
         80M   (hole: 432M)       256M                   128M

1) FLATMEM
With FLATMEM, 864 pages (432M / 512K) are wasted on memmap for the hole.
That's horrible.

2) SPARSEMEM(16M)
It creates 56 mem_sections, which cost 448 (56 * 8) bytes.
It creates no unused memmap, which is good.

3) SPARSEMEM(256M)
It creates 3 mem_sections, which cost 24 (3 * 8) bytes.
And if we free the unused memmap for 176M (256M - 80M), we can save 352 pages.

3 is the best in terms of memory usage, but for 3 we have to check pfn_valid
more tightly. My patch can do that, but the mm guys didn't like it since it
makes the memory model messy because of some funny architectures (i.e.,
sparsemem was designed not to include holes), and it still has a problem if
there is a hole in the middle of a section.
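
To make the difference between the loose and the tight check concrete, here is a small userspace model. It is only a sketch under assumptions, not the patch and not the kernel's pfn_valid: `struct model_section`, `loose_pfn_valid()` and `tight_pfn_valid()` are hypothetical names, pages are assumed to be 4K, and the per-section start/end pfn fields stand in for the range check discussed in this thread. Section indices are the physical address shifted by SECTION_SIZE_BITS, so the banks labelled section 0, 1, 2 in the mail land at indices 2, 4 and 5.

```c
#include <stdbool.h>

#define PAGE_SHIFT		12	/* assumed 4K pages */
#define SECTION_SIZE_BITS	28	/* 256M sections, as in the report */
#define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)
#define NR_SECTIONS		8	/* model covers 0..2G (assumption) */

/* Hypothetical per-section record: present flag plus populated pfn range. */
struct model_section {
	bool present;
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
};

static const struct model_section sections[NR_SECTIONS] = {
	[2] = { true, 0x20000, 0x25000 },	/* 0x20000000-0x25000000 */
	[4] = { true, 0x40000, 0x50000 },	/* 0x40000000-0x50000000 */
	[5] = { true, 0x50000, 0x58000 },	/* 0x50000000-0x58000000 */
};

/* Loose check: only asks whether the containing section exists. */
static bool loose_pfn_valid(unsigned long pfn)
{
	unsigned long nr = pfn >> PFN_SECTION_SHIFT;

	return nr < NR_SECTIONS && sections[nr].present;
}

/* Tight check: additionally requires pfn to lie in the populated range. */
static bool tight_pfn_valid(unsigned long pfn)
{
	unsigned long nr = pfn >> PFN_SECTION_SHIFT;

	if (nr >= NR_SECTIONS || !sections[nr].present)
		return false;
	return pfn >= sections[nr].start_pfn && pfn < sections[nr].end_pfn;
}
```

In this model the pfn for address 0x25000000 (0x25000) passes the loose check because its section is present, but fails the tight one because it lies past the end of the populated range.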

The memory usage of 2 is not a big deal compared to 3.
If the system uses the full physical address space (MAX_PHYSMEM_BITS 31), it
consumes just 1024 (128 * 8) bytes. So now I think the best solution is 2.
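
The figures in the three cases above can be reproduced with a few lines of arithmetic. This is a sketch under assumptions: 4K pages and a 32-byte struct page (which is what the 512K-per-memmap-page figure implies), and 8 bytes per mem_section as used in the mail. The macro and function names are hypothetical.

```c
#define PAGE_SIZE		4096u	/* assumed 4K pages */
#define STRUCT_PAGE_SIZE	32u	/* assumed; implied by the 512K figure */
#define MEM_SECTION_SIZE	8u	/* per-section cost used in the mail */
#define MB			(1024u * 1024u)

/* Pages of memmap needed to describe `bytes` of physical address space. */
static unsigned int memmap_pages(unsigned int bytes)
{
	unsigned int nr_pages = bytes / PAGE_SIZE;

	return nr_pages * STRUCT_PAGE_SIZE / PAGE_SIZE;
}

/* Bytes of mem_section[] needed to cover `span` bytes of address space. */
static unsigned int mem_section_bytes(unsigned int span, unsigned int section)
{
	return span / section * MEM_SECTION_SIZE;
}
```

Under these assumptions, memmap_pages(432 * MB) gives the 864 pages of case 1, memmap_pages(176 * MB) gives the 352 pages of case 3, mem_section_bytes(896 * MB, 16 * MB) gives the 448 bytes of case 2 (896M being the span from 0x20000000 to 0x58000000), and mem_section_bytes(2048u * MB, 16 * MB) gives the 1024 bytes for a fully used 31-bit physical address space.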

Russell, what do you think about it?

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dave Hansen on
On Wed, 2010-07-14 at 00:43 +0900, Minchan Kim wrote:
> The memory usage of 2 is not a big deal compared to 3.
> If the system uses the full physical address space (MAX_PHYSMEM_BITS 31), it
> consumes just 1024 (128 * 8) bytes. So now I think the best solution is 2.
>
> Russell, what do you think about it?

I'm not Russell, but I'll tell you what I think. :)

Make the sections 16MB. Your suggestion to add the start/end pfns
_doubles_ the size of the structure, and its size overhead. We have
systems with a pretty tremendous amount of memory with 16MB sections.

If you _really_ can't make the section size smaller, and the vast
majority of the sections are fully populated, you could hack something
in. We could, for instance, have a global list that's mostly readonly
which tells you which sections need to have their sizes closely
inspected. That would work OK if, for instance, you only needed to
check a couple of memory sections in the system. It'll start to suck if
you made the lists very long.
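
A rough userspace sketch of that idea, under assumptions: `struct partial_section`, `partial_sections` and `pfn_in_populated_part()` are hypothetical names, pages are 4K, sections are 256M, and the list holds only the one section this thread identifies as incompletely filled. It is not existing kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT		12	/* assumed 4K pages */
#define SECTION_SIZE_BITS	28	/* 256M sections */
#define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)

/* Hypothetical entry: a partially populated section and its valid pfns. */
struct partial_section {
	unsigned long section_nr;
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
};

/* Filled in once at boot and read-mostly afterwards (assumption). */
static const struct partial_section partial_sections[] = {
	{ 2, 0x20000, 0x25000 },	/* 0x20000000-0x25000000 */
};

/*
 * Meant to run only after the ordinary section-present check passed.
 * Returns false if pfn is in the unpopulated part of a listed section.
 */
static bool pfn_in_populated_part(unsigned long pfn)
{
	unsigned long nr = pfn >> PFN_SECTION_SHIFT;
	size_t i;

	for (i = 0; i < sizeof(partial_sections) / sizeof(partial_sections[0]); i++) {
		const struct partial_section *ps = &partial_sections[i];

		if (ps->section_nr != nr)
			continue;
		return pfn >= ps->start_pfn && pfn < ps->end_pfn;
	}
	return true;	/* not listed: treated as fully populated */
}
```

The linear scan is the reason the approach only works with a short list: every lookup that reaches this function walks all entries, so the cost grows with the number of partially populated sections.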

-- Dave

From: Minchan Kim on
Hi, Dave.

On Tue, Jul 13, 2010 at 09:35:33AM -0700, Dave Hansen wrote:
> On Wed, 2010-07-14 at 00:43 +0900, Minchan Kim wrote:
> > The memory usage of 2 is not a big deal compared to 3.
> > If the system uses the full physical address space (MAX_PHYSMEM_BITS 31), it
> > consumes just 1024 (128 * 8) bytes. So now I think the best solution is 2.
> >
> > Russell, what do you think about it?
>
> I'm not Russell, but I'll tell you what I think. :)
>

No problem. :)

> Make the sections 16MB. Your suggestion to add the start/end pfns

I hope so.

> _doubles_ the size of the structure, and its size overhead. We have
> systems with a pretty tremendous amount of memory with 16MB sections.

Yes. It does on server systems with several GB of memory.

>
> If you _really_ can't make the section size smaller, and the vast
> majority of the sections are fully populated, you could hack something
> in. We could, for instance, have a global list that's mostly readonly
> which tells you which sections need to have their sizes closely
> inspected. That would work OK if, for instance, you only needed to
> check a couple of memory sections in the system. It'll start to suck if
> you made the lists very long.

Thanks for the advice. As I said, I hope Russell accepts 16M sections.

>
> -- Dave
>

--
Kind regards,
Minchan Kim
From: Russell King - ARM Linux on
On Tue, Jul 13, 2010 at 05:02:22PM +0900, KAMEZAWA Hiroyuki wrote:
> How about stop using SPARSEMEM ? What's the benefit ? It just eats up
> memory for mem_section[].

The problem with that approach is that sometimes the mem_map array
doesn't fit into any memory banks.

We've gone around the loop of using flatmem with holes punched in it,
to using discontigmem, and now to using sparsemem. It seems none of
these solutions does what we need for ARM. I guess that's the price
we pay for not having memory architected to be at any particular place
in the physical memory map.

We're even seeing setups lately where system memory is split into
two areas, where the second (higher physical address) is populated
first before the lower bank... These kinds of games are getting rather
stupid and idiotic, but we're not the hardware designers and so we have
to live with it - or just tell the folk who are porting the kernel to
these platforms that we'll never take their patches.
From: Dave Hansen on
On Tue, 2010-07-13 at 19:39 +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 05:02:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > How about stop using SPARSEMEM ? What's the benefit ? It just eats up
> > memory for mem_section[].
>
> The problem with that approach is that sometimes the mem_map array
> doesn't fit into any memory banks.
>
> We've gone around the loop of using flatmem with holes punched in it,
> to using discontigmem, and now to using sparsemem. It seems none of
> these solutions does what we need for ARM. I guess that's the price
> we pay for not having memory architected to be at any particular place
> in the physical memory map.

What's the ARM hardware's maximum addressable memory these days? 4GB?

A 4GB system would have 256 sections, which means 256*2*sizeof(unsigned
long) for the mem_section[]. That's a pretty small amount of RAM.

What sizes are the holes that are being punched these days? Smaller
than 16MB?

-- Dave
