From: Wu Fengguang on 8 Jan 2010 08:10

On Fri, Jan 08, 2010 at 11:32:07AM +0800, Zheng, Shaohui wrote:
> Resend the patch to the mailing-list, the original patch URL is
> http://patchwork.kernel.org/patch/69075/, it is not accepted without comments,
> sent it again to review.
>
> Memory-Hotplug: Fix the bug on interface /dev/mem for 64-bit kernel
>
> The new added memory can not be access by interface /dev/mem, because we do not
> update the variable high_memory. This patch add a new e820 entry in e820 table,
> and update max_pfn, max_low_pfn and high_memory.
>
> We add a function update_pfn in file arch/x86/mm/init.c to update these
> variables. Memory hotplug does not make sense on 32-bit kernel, so we did not
> concern it in this function.
>
> Signed-off-by: Shaohui Zheng <shaohui.zheng(a)intel.com>
> CC: Andi Kleen <ak(a)linux.intel.com>
> CC: Wu Fengguang <fengguang.wu(a)intel.com>
> CC: Li Haicheng <Haicheng.li(a)intel.com>
>
> ---
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index f50447d..b986246 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -110,8 +110,8 @@ int __init e820_all_mapped(u64 start, u64 end, unsigned type)
>  /*
>   * Add a memory region to the kernel e820 map.
>   */
> -static void __init __e820_add_region(struct e820map *e820x, u64 start, u64 size,
> -					 int type)
> +static void __meminit __e820_add_region(struct e820map *e820x, u64 start,
> +					 u64 size, int type)
>  {
>  	int x = e820x->nr_map;
>
> @@ -126,7 +126,7 @@ static void __init __e820_add_region(struct e820map *e820x, u64 start, u64 size,
>  	e820x->nr_map++;
>  }
>
> -void __init e820_add_region(u64 start, u64 size, int type)
> +void __meminit e820_add_region(u64 start, u64 size, int type)
>  {
>  	__e820_add_region(&e820, start, size, type);
>  }
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index d406c52..0474459 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -1,6 +1,7 @@
>  #include <linux/initrd.h>
>  #include <linux/ioport.h>
>  #include <linux/swap.h>
> +#include <linux/bootmem.h>
>
>  #include <asm/cacheflush.h>
>  #include <asm/e820.h>
> @@ -386,3 +387,30 @@ void free_initrd_mem(unsigned long start, unsigned long end)
>  	free_init_pages("initrd memory", start, end);
>  }
>  #endif
> +
> +/**
> + * After memory hotplug, the variable max_pfn, max_low_pfn and high_memory will
> + * be affected, it will be updated in this function. Memory hotplug does not
> + * make sense on 32-bit kernel, so we do did not concern it in this function.
> + */
> +void __meminit __attribute__((weak)) update_pfn(u64 start, u64 size)
> +{
> +#ifdef CONFIG_X86_64
> +	unsigned long limit_low_pfn = 1UL<<(32 - PAGE_SHIFT);
> +	unsigned long start_pfn = start >> PAGE_SHIFT;
> +	unsigned long end_pfn = (start + size) >> PAGE_SHIFT;

Strictly speaking, should use "end_pfn = PFN_UP(start + size);".

> +	if (end_pfn > max_pfn) {
> +		max_pfn = end_pfn;
> +		high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
> +	}
> +
> +	/* if add to low memory, update max_low_pfn */
> +	if (unlikely(start_pfn < limit_low_pfn)) {
> +		if (end_pfn <= limit_low_pfn)
> +			max_low_pfn = end_pfn;
> +		else
> +			max_low_pfn = limit_low_pfn;

X86_64 actually always set max_low_pfn=max_pfn, in setup_arch():

	#ifdef CONFIG_X86_64
		if (max_pfn > max_low_pfn) {
			max_pfn_mapped = init_memory_mapping(1UL<<32,
							     max_pfn<<PAGE_SHIFT);
			/* can we preseve max_low_pfn ?*/
			max_low_pfn = max_pfn;
		}
	#endif

max_low_pfn is used in

- e820_mark_nosave_regions(max_low_pfn);
- dump_pagetable()
- blk_queue_bounce_limit()
- increase_reservation()

and _seems_ to mean "end of direct addressable pfn".

> +	}
> +#endif /* CONFIG_X86_64 */
> +}
> diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
> index b10ec49..6693414 100644
> --- a/include/linux/bootmem.h
> +++ b/include/linux/bootmem.h
> @@ -13,6 +13,7 @@
>
>  extern unsigned long max_low_pfn;
>  extern unsigned long min_low_pfn;
> +extern void update_pfn(u64 start, u64 size);
>
>  /*
>   * highest page
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 030ce8a..ee7b2d6 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -523,6 +523,14 @@ int __ref add_memory(int nid, u64 start, u64 size)
>  		BUG_ON(ret);
>  	}
>
> +	/* update e820 table */

This comment can be eliminated - you already have the very readable printk :)

> +	printk(KERN_INFO "Adding memory region to e820 table (start:%016Lx, size:%016Lx).\n",
> +		(unsigned long long)start, (unsigned long long)size);
> +	e820_add_region(start, size, E820_RAM);
> +	/* update max_pfn, max_low_pfn and high_memory */
> +	update_pfn(start, size);

How about renaming function to update_end_of_memory_vars()?

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
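
The PFN_UP() remark in the review above matters only when start + size is not page aligned. The following is a minimal user-space sketch of the difference, not kernel code: it assumes 4 KiB pages and redefines PFN_UP/PFN_DOWN locally in the same form as the kernel's include/linux/pfn.h. The two function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Local user-space copies of the kernel's pfn helpers. */
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* End pfn as computed in the posted patch: a partial last page is dropped. */
static uint64_t end_pfn_truncating(uint64_t start, uint64_t size)
{
	return PFN_DOWN(start + size);
}

/* End pfn as suggested in the review: a partial last page is rounded up. */
static uint64_t end_pfn_rounding_up(uint64_t start, uint64_t size)
{
	return PFN_UP(start + size);
}
```

For a page-aligned region both helpers agree. For a region that ends in the middle of a page, the truncating form stops one pfn short, so the last partly covered page would fall outside max_pfn and high_memory.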
From: Zheng, Shaohui on 10 Jan 2010 21:30

Thanks Fengguang, see my comments in the email. Only a few different
understanding on variable max_low_pfn.

Thanks & Regards,
Shaohui

-----Original Message-----
From: Wu, Fengguang
Sent: Friday, January 08, 2010 8:49 PM
To: Zheng, Shaohui
Cc: linux-mm(a)kvack.org; akpm(a)linux-foundation.org; linux-kernel(a)vger.kernel.org; ak(a)linux.intel.com; y-goto(a)jp.fujitsu.com; Dave Hansen; x86(a)kernel.org; KAMEZAWA Hiroyuki
Subject: Re: [PATCH - resend] Memory-Hotplug: Fix the bug on interface /dev/mem for 64-bit kernel(v1)

On Fri, Jan 08, 2010 at 11:32:07AM +0800, Zheng, Shaohui wrote:
> Resend the patch to the mailing-list, the original patch URL is
> http://patchwork.kernel.org/patch/69075/, it is not accepted without comments,
> sent it again to review.
>
> Memory-Hotplug: Fix the bug on interface /dev/mem for 64-bit kernel
>
> The new added memory can not be access by interface /dev/mem, because we do not
> update the variable high_memory. This patch add a new e820 entry in e820 table,
> and update max_pfn, max_low_pfn and high_memory.
>
> We add a function update_pfn in file arch/x86/mm/init.c to update these
> variables. Memory hotplug does not make sense on 32-bit kernel, so we did not
> concern it in this function.
>
> Signed-off-by: Shaohui Zheng <shaohui.zheng(a)intel.com>
> CC: Andi Kleen <ak(a)linux.intel.com>
> CC: Wu Fengguang <fengguang.wu(a)intel.com>
> CC: Li Haicheng <Haicheng.li(a)intel.com>
>
> ---
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index f50447d..b986246 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -110,8 +110,8 @@ int __init e820_all_mapped(u64 start, u64 end, unsigned type)
>  /*
>   * Add a memory region to the kernel e820 map.
>   */
> -static void __init __e820_add_region(struct e820map *e820x, u64 start, u64 size,
> -					 int type)
> +static void __meminit __e820_add_region(struct e820map *e820x, u64 start,
> +					 u64 size, int type)
>  {
>  	int x = e820x->nr_map;
>
> @@ -126,7 +126,7 @@ static void __init __e820_add_region(struct e820map *e820x, u64 start, u64 size,
>  	e820x->nr_map++;
>  }
>
> -void __init e820_add_region(u64 start, u64 size, int type)
> +void __meminit e820_add_region(u64 start, u64 size, int type)
>  {
>  	__e820_add_region(&e820, start, size, type);
>  }
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index d406c52..0474459 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -1,6 +1,7 @@
>  #include <linux/initrd.h>
>  #include <linux/ioport.h>
>  #include <linux/swap.h>
> +#include <linux/bootmem.h>
>
>  #include <asm/cacheflush.h>
>  #include <asm/e820.h>
> @@ -386,3 +387,30 @@ void free_initrd_mem(unsigned long start, unsigned long end)
>  	free_init_pages("initrd memory", start, end);
>  }
>  #endif
> +
> +/**
> + * After memory hotplug, the variable max_pfn, max_low_pfn and high_memory will
> + * be affected, it will be updated in this function. Memory hotplug does not
> + * make sense on 32-bit kernel, so we do did not concern it in this function.
> + */
> +void __meminit __attribute__((weak)) update_pfn(u64 start, u64 size)
> +{
> +#ifdef CONFIG_X86_64
> +	unsigned long limit_low_pfn = 1UL<<(32 - PAGE_SHIFT);
> +	unsigned long start_pfn = start >> PAGE_SHIFT;
> +	unsigned long end_pfn = (start + size) >> PAGE_SHIFT;

Strictly speaking, should use "end_pfn = PFN_UP(start + size);".
[Zheng, Shaohui] I will use PFN_UP to replace it in new version.

> +	if (end_pfn > max_pfn) {
> +		max_pfn = end_pfn;
> +		high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
> +	}
> +
> +	/* if add to low memory, update max_low_pfn */
> +	if (unlikely(start_pfn < limit_low_pfn)) {
> +		if (end_pfn <= limit_low_pfn)
> +			max_low_pfn = end_pfn;
> +		else
> +			max_low_pfn = limit_low_pfn;

X86_64 actually always set max_low_pfn=max_pfn, in setup_arch():
[Zheng, Shaohui] there should be some misunderstanding, I read the code
carefully, if the total memory is under 4G, it always max_low_pfn=max_pfn.
If the total memory is larger than 4G, max_low_pfn means the end of low ram.
It set max_low_pfn = e820_end_of_low_ram_pfn();.

	#ifdef CONFIG_X86_64
		if (max_pfn > max_low_pfn) {
			max_pfn_mapped = init_memory_mapping(1UL<<32,
							     max_pfn<<PAGE_SHIFT);
			/* can we preseve max_low_pfn ?*/
			max_low_pfn = max_pfn;
		}
	#endif

max_low_pfn is used in

- e820_mark_nosave_regions(max_low_pfn);
- dump_pagetable()
- blk_queue_bounce_limit()
- increase_reservation()

and _seems_ to mean "end of direct addressable pfn".

> +	}
> +#endif /* CONFIG_X86_64 */
> +}
> diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
> index b10ec49..6693414 100644
> --- a/include/linux/bootmem.h
> +++ b/include/linux/bootmem.h
> @@ -13,6 +13,7 @@
>
>  extern unsigned long max_low_pfn;
>  extern unsigned long min_low_pfn;
> +extern void update_pfn(u64 start, u64 size);
>
>  /*
>   * highest page
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 030ce8a..ee7b2d6 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -523,6 +523,14 @@ int __ref add_memory(int nid, u64 start, u64 size)
>  		BUG_ON(ret);
>  	}
>
> +	/* update e820 table */

This comment can be eliminated - you already have the very readable printk :)
[Zheng, Shaohui] I will remove this comment

> +	printk(KERN_INFO "Adding memory region to e820 table (start:%016Lx, size:%016Lx).\n",
> +		(unsigned long long)start, (unsigned long long)size);
> +	e820_add_region(start, size, E820_RAM);
> +	/* update max_pfn, max_low_pfn and high_memory */
> +	update_pfn(start, size);

How about renaming function to update_end_of_memory_vars()?
[Zheng, Shaohui] Agree.

Thanks,
Fengguang
From: Wu Fengguang on 11 Jan 2010 07:50

> > +	/* if add to low memory, update max_low_pfn */
> > +	if (unlikely(start_pfn < limit_low_pfn)) {
> > +		if (end_pfn <= limit_low_pfn)
> > +			max_low_pfn = end_pfn;
> > +		else
> > +			max_low_pfn = limit_low_pfn;
>
> X86_64 actually always set max_low_pfn=max_pfn, in setup_arch():
> [Zheng, Shaohui] there should be some misunderstanding, I read the
> code carefully, if the total memory is under 4G, it always
> max_low_pfn=max_pfn. If the total memory is larger than 4G,
> max_low_pfn means the end of low ram. It set
> max_low_pfn = e820_end_of_low_ram_pfn();.

The above line is very misleading. In setup_arch(), it will be
overridden by the following block.

> #ifdef CONFIG_X86_64
> 	if (max_pfn > max_low_pfn) {
> 		max_pfn_mapped = init_memory_mapping(1UL<<32,
> 						     max_pfn<<PAGE_SHIFT);
> 		/* can we preseve max_low_pfn ?*/
> 		max_low_pfn = max_pfn;
> 	}
> #endif

Thanks,
Fengguang
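
The disagreement above is about ordering inside setup_arch(): max_low_pfn is first set from e820_end_of_low_ram_pfn() and then overwritten by the quoted CONFIG_X86_64 block. The following user-space model is only a sketch of that ordering. The struct and function names are hypothetical, 4 KiB pages are assumed, and the first step is a simplification that treats RAM as contiguous, so the low end is just the RAM end clamped to 4 GiB.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT 12
/* First pfn at or above 4 GiB. */
#define LOW_LIMIT_PFN (1ULL << (32 - PAGE_SHIFT))

/* Hypothetical container for the two globals under discussion. */
struct pfn_limits {
	uint64_t max_pfn;
	uint64_t max_low_pfn;
};

/*
 * Hypothetical model of the 64-bit boot path.
 * Step 1 stands in for e820_end_of_low_ram_pfn() (simplified: contiguous RAM).
 * Step 2 mirrors the quoted #ifdef CONFIG_X86_64 block in setup_arch().
 */
static struct pfn_limits model_setup_arch(uint64_t end_of_ram_pfn)
{
	struct pfn_limits l;

	l.max_pfn = end_of_ram_pfn;

	/* Step 1: end of RAM below 4 GiB. */
	l.max_low_pfn = end_of_ram_pfn < LOW_LIMIT_PFN ?
			end_of_ram_pfn : LOW_LIMIT_PFN;

	/* Step 2: the later block overwrites the value from step 1. */
	if (l.max_pfn > l.max_low_pfn)
		l.max_low_pfn = l.max_pfn;

	return l;
}
```

Under these assumptions the model returns max_low_pfn equal to max_pfn both for a machine with less than 4 GiB and for one with more, which is the point made in the reply.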
From: Wu Fengguang on 11 Jan 2010 21:40

On Tue, Jan 12, 2010 at 08:30:31AM +0800, KAMEZAWA Hiroyuki wrote:
> On Mon, 11 Jan 2010 20:43:03 +0800
> Wu Fengguang <fengguang.wu(a)intel.com> wrote:
>
> > > > +	/* if add to low memory, update max_low_pfn */
> > > > +	if (unlikely(start_pfn < limit_low_pfn)) {
> > > > +		if (end_pfn <= limit_low_pfn)
> > > > +			max_low_pfn = end_pfn;
> > > > +		else
> > > > +			max_low_pfn = limit_low_pfn;
> > >
> > > X86_64 actually always set max_low_pfn=max_pfn, in setup_arch():
> > > [Zheng, Shaohui] there should be some misunderstanding, I read the
> > > code carefully, if the total memory is under 4G, it always
> > > max_low_pfn=max_pfn. If the total memory is larger than 4G,
> > > max_low_pfn means the end of low ram. It set
> > > max_low_pfn = e820_end_of_low_ram_pfn();.
> >
> > The above line is very misleading. In setup_arch(), it will be
> > overridden by the following block.
> >
>
> Hmmm....could you rewrite /dev/mem to use kernel/resource.c other than
> modifying e820 maps. ?
> Two reasons.
> - e820map is considered to be stable, read-only after boot.
> - We don't need to add more x86 special codes.

Sure, here it is :)
---
x86: use the generic page_is_ram()

The generic resource based page_is_ram() works better with memory
hotplug/hotremove. So switch the x86 e820map based code to it.

CC: Andi Kleen <andi(a)firstfloor.org>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
---
 arch/x86/include/asm/page_types.h |    1
 arch/x86/mm/ioremap.c             |   37 ----------------------------
 kernel/resource.c                 |   17 ++++++++++++
 3 files changed, 17 insertions(+), 38 deletions(-)

--- linux-mm.orig/arch/x86/include/asm/page_types.h	2010-01-12 10:31:01.000000000 +0800
+++ linux-mm/arch/x86/include/asm/page_types.h	2010-01-12 10:31:44.000000000 +0800
@@ -34,19 +34,18 @@

 #ifdef CONFIG_X86_64
 #include <asm/page_64_types.h>
 #else
 #include <asm/page_32_types.h>
 #endif	/* CONFIG_X86_64 */

 #ifndef __ASSEMBLY__

-extern int page_is_ram(unsigned long pagenr);
 extern int devmem_is_allowed(unsigned long pagenr);

 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;

 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);

 extern void initmem_init(unsigned long start_pfn, unsigned long end_pfn,
--- linux-mm.orig/arch/x86/mm/ioremap.c	2010-01-12 10:31:01.000000000 +0800
+++ linux-mm/arch/x86/mm/ioremap.c	2010-01-12 10:31:44.000000000 +0800
@@ -18,55 +18,18 @@
 #include <asm/e820.h>
 #include <asm/fixmap.h>
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 #include <asm/pgalloc.h>
 #include <asm/pat.h>

 #include "physaddr.h"

-int page_is_ram(unsigned long pagenr)
-{
-	resource_size_t addr, end;
-	int i;
-
-	/*
-	 * A special case is the first 4Kb of memory;
-	 * This is a BIOS owned area, not kernel ram, but generally
-	 * not listed as such in the E820 table.
-	 */
-	if (pagenr == 0)
-		return 0;
-
-	/*
-	 * Second special case: Some BIOSen report the PC BIOS
-	 * area (640->1Mb) as ram even though it is not.
-	 */
-	if (pagenr >= (BIOS_BEGIN >> PAGE_SHIFT) &&
-		    pagenr < (BIOS_END >> PAGE_SHIFT))
-		return 0;
-
-	for (i = 0; i < e820.nr_map; i++) {
-		/*
-		 * Not usable memory:
-		 */
-		if (e820.map[i].type != E820_RAM)
-			continue;
-		addr = (e820.map[i].addr + PAGE_SIZE-1) >> PAGE_SHIFT;
-		end = (e820.map[i].addr + e820.map[i].size) >> PAGE_SHIFT;
-
-
-		if ((pagenr >= addr) && (pagenr < end))
-			return 1;
-	}
-	return 0;
-}
-
 /*
  * Fix up the linear direct mapping of the kernel to avoid cache attribute
  * conflicts.
  */
 int ioremap_change_attr(unsigned long vaddr, unsigned long size,
 			       unsigned long prot_val)
 {
 	unsigned long nrpages = size >> PAGE_SHIFT;
 	int err;
--- linux-mm.orig/kernel/resource.c	2010-01-12 10:31:01.000000000 +0800
+++ linux-mm/kernel/resource.c	2010-01-12 10:31:44.000000000 +0800
@@ -298,18 +298,35 @@ int walk_system_ram_range(unsigned long
 #endif

 static int __is_ram(unsigned long pfn, unsigned long nr_pages, void *arg)
 {
 	return 24;
 }

 int __attribute__((weak)) page_is_ram(unsigned long pfn)
 {
+#ifdef CONFIG_X86
+	/*
+	 * A special case is the first 4Kb of memory;
+	 * This is a BIOS owned area, not kernel ram, but generally
+	 * not listed as such in the E820 table.
+	 */
+	if (pfn == 0)
+		return 0;
+
+	/*
+	 * Second special case: Some BIOSen report the PC BIOS
+	 * area (640->1Mb) as ram even though it is not.
+	 */
+	if (pfn >= (BIOS_BEGIN >> PAGE_SHIFT) &&
+	    pfn <  (BIOS_END   >> PAGE_SHIFT))
+		return 0;
+#endif
 	return 24 == walk_system_ram_range(pfn, 1, NULL, __is_ram);
 }

 /*
  * Find empty slot in the resource tree given range and alignment.
  */
 static int find_resource(struct resource *root, struct resource *new,
 			 resource_size_t size, resource_size_t min,
 			 resource_size_t max, resource_size_t align,
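
To illustrate why a resource-based page_is_ram() copes with hot-added memory while the e820-based one does not, here is a small user-space model. It is a sketch under stated assumptions, not the kernel implementation: struct ram_range and model_page_is_ram() are hypothetical names, the array stands in for the "System RAM" entries that walk_system_ram_range() visits, 4 KiB pages are assumed, and the BIOS_BEGIN/BIOS_END values are copied as local constants for the 640K-1M hole.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT 12
/* Local copies of the x86 BIOS hole bounds (640K to 1M). */
#define BIOS_BEGIN 0x000a0000ULL
#define BIOS_END   0x00100000ULL

/* Hypothetical stand-in for one "System RAM" entry in the resource tree. */
struct ram_range {
	uint64_t start;	/* first byte */
	uint64_t end;	/* last byte, inclusive, like struct resource */
};

/* Returns 1 if the page lies fully inside a registered RAM range. */
static int model_page_is_ram(const struct ram_range *ranges, size_t n,
			     uint64_t pfn)
{
	uint64_t first = pfn << PAGE_SHIFT;
	uint64_t last = first + (1ULL << PAGE_SHIFT) - 1;
	size_t i;

	/* Same two x86 special cases as in the patch above. */
	if (pfn == 0)
		return 0;
	if (pfn >= (BIOS_BEGIN >> PAGE_SHIFT) &&
	    pfn <  (BIOS_END   >> PAGE_SHIFT))
		return 0;

	/* Stand-in for the resource walk: the table can grow after boot. */
	for (i = 0; i < n; i++)
		if (first >= ranges[i].start && last <= ranges[i].end)
			return 1;
	return 0;
}
```

In this model, hot-adding memory amounts to appending one more entry to the array, after which lookups for the new pfns succeed without touching any boot-time table. That mirrors the design argument in the thread: the resource tree is already updated on hotplug, whereas the e820 map is meant to stay read-only after boot.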
From: Wu Fengguang on 11 Jan 2010 21:50

On Tue, Jan 12, 2010 at 09:50:12AM +0800, KAMEZAWA Hiroyuki wrote:
> Just an information.
>
> We already check kernel/resource.c's resource information, here.
>
> read_mem()
> 	-> range_is_allowed()
> 		-> devmem_is_allowed()
> 			-> iomem_is_exclusive()
>
> extra calls of page_is_ram() to ask architecture's map seems redundant.
>
> But, I know PPC guys doesn't use ioresource.c, hehe.

Another exception is !CONFIG_STRICT_DEVMEM, which makes
range_is_allowed()==1. So we still need the page_is_ram() :)

Thanks,
Fengguang