From: Ankita Garg on 21 May 2010 05:40

Hi,

On Thu, May 13, 2010 at 07:36:30PM +0800, Shaohui Zheng wrote:
> Hi, All
> 	This patchset introduces a NUMA hotplug emulator for x86. It touches
> many files and might introduce new bugs, so we send an RFC to the community
> first and expect comments and suggestions, thanks.
>
<snip>
> * Principles & Usages
>
> The NUMA hotplug emulator includes 3 different parts. We add a menu item
> to the menuconfig to enable/disable them
> (Refer to http://shaohui.org/images/hpe-krnl-cfg.jpg)
>
> 1) Node hotplug emulation:
>
>    The emulator first hides RAM via the E820 table, and then it can
>    fake offlined nodes with the hidden RAM.
>
>    After system bootup, the user is able to hotplug-add these offlined
>    nodes, which is similar to real hotplug hardware behavior.
>
>    Use boot option "numa=hide=N*size" to fake offlined nodes:
>    - N is the number of hidden nodes
>    - size is the memory size (in MB) per hidden node.
>
>    There is a sysfs entry "probe" under /sys/devices/system/node/ for the
>    user to hotplug the fake offlined nodes:
>
>    - to show all fake offlined nodes:
>      $ cat /sys/devices/system/node/probe
>
>    - to hot-add a fake offlined node, e.g. node id N:
>      $ echo N > /sys/devices/system/node/probe

I tried the patchset on a non-NUMA machine. So, in order to create fake
NUMA nodes and be able to emulate the hotplug behavior, I used the
following command line:

	"numa=fake=4 numa=hide=2*2048"

on a machine with 8G of memory. I expected to see 4 nodes, out of which 2
would be hidden. However, the system comes up with 4 online nodes and 2
offline nodes (thus a total of 6 nodes). While we could decide this to be
the semantics, I feel that numa=fake should define the total number of
nodes. So in the above case, the system should have come up with 2 online
nodes and 2 offline nodes. Also, "numa=hide=N" could be supported as well,
with the size of each hidden node being equal to the entire size of a
node, with or without the numa=fake parameter.

On onlining one of the offline nodes, I see another issue: the memory
under it is not automatically brought online. For example:

# ls /sys/devices/system/node
..... node0 node1 node2 ...

# cat /sys/devices/system/node/probe
3

# echo 3 > /sys/devices/system/node/probe
# ls /sys/devices/system/node
..... node0 node1 node2 node3

# cat /sys/devices/system/node/node3/meminfo
Node 3 MemTotal:  0 kB
Node 3 MemFree:   0 kB
Node 3 MemUsed:   0 kB
Node 3 Active:    0 kB
.......

That is, the node comes up as a memory-less node, even though it was
designated to have memory. So, on onlining the nodes, maybe we could have
all their memory brought into the online state as well?

--
Regards,
Ankita Garg (ankita(a)in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs,
Bangalore, India
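A minimal sketch of the full flow described above, including the missing
user-space step of onlining the new node's memory. This assumes the emulated
node's memory blocks appear as the usual memoryX entries under
/sys/devices/system/node/nodeN/ once the node is probed; the node id 3 is
purely illustrative:

    # hot-add the fake offlined node reported by the emulator
    cat /sys/devices/system/node/probe            # e.g. prints "3"
    echo 3 > /sys/devices/system/node/probe

    # hot-added memory is registered but left offline; online each
    # memory block that now belongs to node3 from user space
    for mem in /sys/devices/system/node/node3/memory*; do
        [ -d "$mem" ] || continue
        echo online > "$mem/state"
    done

    # verify the node now reports its memory
    cat /sys/devices/system/node/node3/meminfo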
From: Ankita Garg on 21 May 2010 06:20

Hi,

On Thu, May 13, 2010 at 07:48:35PM +0800, Shaohui Zheng wrote:
> Userland interface to hotplug-add fake offlined nodes.
>
> Add a sysfs entry "probe" under /sys/devices/system/node/:
>
> - to show all fake offlined nodes:
>   $ cat /sys/devices/system/node/probe
>
> - to hotadd a fake offlined node, e.g. nodeid is N:
>   $ echo N > /sys/devices/system/node/probe
>
> Signed-off-by: Haicheng Li <haicheng.li(a)linux.intel.com>
> Signed-off-by: Shaohui Zheng <shaohui.zheng(a)intel.com>
> ---
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 9458685..2c078c8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1214,6 +1214,20 @@ config NUMA_EMU
>  	  into virtual nodes when booted with "numa=fake=N", where N is the
>  	  number of nodes. This is only useful for debugging.
>
> +config NUMA_HOTPLUG_EMU
> +	bool "NUMA hotplug emulator"
> +	depends on X86_64 && NUMA && HOTPLUG
> +	---help---
> +
> +config NODE_HOTPLUG_EMU
> +	bool "Node hotplug emulation"
> +	depends on NUMA_HOTPLUG_EMU && MEMORY_HOTPLUG
> +	---help---
> +	  Enable Node hotplug emulation. The machine will be setup with
> +	  hidden virtual nodes when booted with "numa=hide=N*size", where
> +	  N is the number of hidden nodes, size is the memory size per
> +	  hidden node. This is only useful for debugging.
> +

The above dependencies do not work as expected. I could configure
NUMA_HOTPLUG_EMU and NODE_HOTPLUG_EMU without having MEMORY_HOTPLUG
turned on. By pushing the above definitions below SPARSEMEM and the
memory hot-add/remove options, the dependencies could be sorted out.

--
Regards,
Ankita Garg (ankita(a)in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs,
Bangalore, India
From: Shaohui Zheng on 23 May 2010 22:00

On Fri, May 21, 2010 at 03:41:04PM +0530, Ankita Garg wrote:
> Hi,
>
> On Thu, May 13, 2010 at 08:00:16PM +0800, Shaohui Zheng wrote:
> > hotplug emulator: extend memory probe interface to support NUMA
> >
> > Signed-off-by: Shaohui Zheng <shaohui.zheng(a)intel.com>
> > Signed-off-by: Haicheng Li <haicheng.li(a)intel.com>
> > Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
> > ---
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 54ccb0d..787024f 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1239,6 +1239,17 @@ config ARCH_CPU_PROBE_RELEASE
> >  	  is for cpu hot-add/hot-remove to specified node in software method.
> >  	  This is for debuging and testing purpose
> >
> > +config ARCH_MEMORY_PROBE
>
> The above symbol exists already...

Yes. We created the CONFIG_NUMA_HOTPLUG_EMU, CONFIG_NODE_HOTPLUG_EMU and
CONFIG_ARCH_CPU_PROBE_RELEASE options, and will move CONFIG_ARCH_MEMORY_PROBE
to sit together with the above three options.

> > +	def_bool y
> > +	bool "Memory hotplug emulation"
> > +	depends on NUMA_HOTPLUG_EMU
> > +	---help---
> > +	  Enable memory hotplug emulation. Reserve memory with grub parameter
> > +	  "mem=N"(such as mem=1024M), where N is the initial memory size, the
> > +	  rest physical memory will be removed from e820 table; the memory probe
> > +	  interface is for memory hot-add to specified node in software method.
> > +	  This is for debuging and testing purpose
> > +
> >  config NODES_SHIFT
> >  	int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP
> >  	range 1 10
>
> --
> Regards,
> Ankita Garg (ankita(a)in.ibm.com)
> Linux Technology Center
> IBM India Systems & Technology Labs,
> Bangalore, India

--
Thanks & Regards,
Shaohui
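For reference, a memory "probe" interface already exists upstream when
ARCH_MEMORY_PROBE is enabled: writing the physical start address of a memory
section to the probe file hot-adds that section, which then appears as an
offline memory block. The patch discussed here extends the interface to
target a specific node, and that extended syntax is not reproduced below;
the sketch only shows the stock interface, with a purely illustrative
address:

    # hot-add the memory section starting at the given physical address
    # (illustrative; it must be a valid, section-aligned, unused range)
    echo 0x140000000 > /sys/devices/system/memory/probe

    # the new block is registered offline and can then be onlined
    cat /sys/devices/system/memory/memory*/state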
From: Shaohui Zheng on 23 May 2010 22:00

On Fri, May 21, 2010 at 03:38:16PM +0530, Ankita Garg wrote:
> Hi,
>
> On Thu, May 13, 2010 at 07:48:35PM +0800, Shaohui Zheng wrote:
> > Userland interface to hotplug-add fake offlined nodes.
> >
> > Add a sysfs entry "probe" under /sys/devices/system/node/:
> >
> > - to show all fake offlined nodes:
> >   $ cat /sys/devices/system/node/probe
> >
> > - to hotadd a fake offlined node, e.g. nodeid is N:
> >   $ echo N > /sys/devices/system/node/probe
> >
> > Signed-off-by: Haicheng Li <haicheng.li(a)linux.intel.com>
> > Signed-off-by: Shaohui Zheng <shaohui.zheng(a)intel.com>
> > ---
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 9458685..2c078c8 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1214,6 +1214,20 @@ config NUMA_EMU
> >  	  into virtual nodes when booted with "numa=fake=N", where N is the
> >  	  number of nodes. This is only useful for debugging.
> >
> > +config NUMA_HOTPLUG_EMU
> > +	bool "NUMA hotplug emulator"
> > +	depends on X86_64 && NUMA && HOTPLUG
> > +	---help---
> > +
> > +config NODE_HOTPLUG_EMU
> > +	bool "Node hotplug emulation"
> > +	depends on NUMA_HOTPLUG_EMU && MEMORY_HOTPLUG
> > +	---help---
> > +	  Enable Node hotplug emulation. The machine will be setup with
> > +	  hidden virtual nodes when booted with "numa=hide=N*size", where
> > +	  N is the number of hidden nodes, size is the memory size per
> > +	  hidden node. This is only useful for debugging.
> > +
>
> The above dependencies do not work as expected. I could configure
> NUMA_HOTPLUG_EMU and NODE_HOTPLUG_EMU without having MEMORY_HOTPLUG
> turned on. By pushing the above definitions below SPARSEMEM and the
> memory hot-add/remove options, the dependencies could be sorted out.

Ankita,
	The emulation code was tested many times, but we did not try every
combination of the new options; good catch. We will include your suggestion
in the formal patch. Thanks so much.

> --
> Regards,
> Ankita Garg (ankita(a)in.ibm.com)
> Linux Technology Center
> IBM India Systems & Technology Labs,
> Bangalore, India

--
Thanks & Regards,
Shaohui
From: Shaohui Zheng on 23 May 2010 22:20
On Fri, May 21, 2010 at 03:03:40PM +0530, Ankita Garg wrote:
>
> I tried the patchset on a non-NUMA machine. So, in order to create fake
> NUMA nodes and be able to emulate the hotplug behavior, I used the
> following command line:
>
> 	"numa=fake=4 numa=hide=2*2048"
>
> on a machine with 8G of memory. I expected to see 4 nodes, out of which 2
> would be hidden. However, the system comes up with 4 online nodes and 2
> offline nodes (thus a total of 6 nodes). While we could decide this to be
> the semantics, I feel that numa=fake should define the total number of
> nodes. So in the above case, the system should have come up with 2 online
> nodes and 2 offline nodes.

Ankita, this is the expected result. NUMA_EMU and NUMA_HOTPLUG_EMU are two
different features, and there is no dependency between them. Even if you
disable NUMA_EMU, the hotplug emulation still works. This implementation
reduces the dependency and keeps things simple and easy to understand. Your
concern makes sense semantically, but we prefer not to combine two
independent modules.

> Also, "numa=hide=N" could be supported as well, with the size of each
> hidden node being equal to the entire size of a node, with or without the
> numa=fake parameter.
>
> On onlining one of the offline nodes, I see another issue: the memory
> under it is not automatically brought online. For example:
>
> # ls /sys/devices/system/node
> .... node0 node1 node2 ...
>
> # cat /sys/devices/system/node/probe
> 3
>
> # echo 3 > /sys/devices/system/node/probe
> # ls /sys/devices/system/node
> .... node0 node1 node2 node3
>
> # cat /sys/devices/system/node/node3/meminfo
> Node 3 MemTotal:  0 kB
> Node 3 MemFree:   0 kB
> Node 3 MemUsed:   0 kB
> Node 3 Active:    0 kB
> ......
>
> i.e., as memory-less nodes. However, these nodes were designated to have
> memory. So, on onlining the nodes, maybe we could have all their memory
> brought into the online state as well?

This is the same behavior as the real memory hotplug implementation in the
Linux kernel: when physical memory is hot-added to a machine, the kernel
creates the memory entries and the related data structures, but it never
onlines the memory itself; that step is done from user space. The node
hotplug emulation and memory hotplug emulation features follow the same
rules as the kernel.

As we know, allocating memory from a memory-less node can cause an OOM
issue, and some engineers are already working on that bug. Because the OOM
issue can be reproduced with the hotplug emulator, the emulator helps those
engineers a lot.

This behavior is also flexible. As far as I know, some OSVs already online
hotplugged memory automatically; if the mainline kernel decides to do the
same thing, we will change the related code, too.

> --
> Regards,
> Ankita Garg (ankita(a)in.ibm.com)
> Linux Technology Center
> IBM India Systems & Technology Labs,
> Bangalore, India

--
Thanks & Regards,
Shaohui
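To make the "done from user space" point concrete: one common way
distributions online hot-added memory automatically is a udev rule that
flips each newly registered memory block's state attribute. A rough sketch
is below; the rule file name is illustrative, and whether a given
distribution ships such a rule varies:

    # install a udev rule that onlines any newly registered, offline memory block
    # (file name is illustrative; the rule itself is standard udev syntax)
    printf '%s\n' \
        'SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"' \
        > /etc/udev/rules.d/80-online-hotplug-memory.rules
    udevadm control --reload-rules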