From: Chris Webb on 2 Aug 2010 09:20

We run a number of relatively large x86-64 hosts with twenty or so qemu-kvm
virtual machines on each of them, and I'm having some trouble with over-eager
swapping on some (but not all) of the machines. This is resulting in customer
reports of very poor response latency from the virtual machines which have
been swapped out, despite the hosts apparently having large amounts of free
memory, and running fine if swap is turned off.

All of the hosts are running a 2.6.32.7 kernel and have ksm enabled, with
32GB of RAM and 2x quad-core processors. There is a cluster of Xeon E5420
machines which apparently doesn't exhibit the problem, and a cluster of
2352/2378 Opteron (NUMA) machines, some of which do. The kernel config of
the affected machines is at

  http://cdw.me.uk/tmp/config-2.6.32.7

This differs very little from the config on the unaffected Xeon machines,
essentially just

  -CONFIG_MCORE2=y
  +CONFIG_MK8=y
  -CONFIG_X86_P6_NOP=y

On a typical affected machine, the virtual machines and other processes
would apparently leave around 5.5GB of RAM available for buffers, but the
system seems to want to swap out 3GB of anonymous pages to give itself more
like 9GB of buffers:

  # cat /proc/meminfo
  MemTotal:       33083420 kB
  MemFree:          693164 kB
  Buffers:         8834380 kB
  Cached:            11212 kB
  SwapCached:      1443524 kB
  Active:         21656844 kB
  Inactive:        8119352 kB
  Active(anon):   17203092 kB
  Inactive(anon):  3729032 kB
  Active(file):    4453752 kB
  Inactive(file):  4390320 kB
  Unevictable:        5472 kB
  Mlocked:            5472 kB
  SwapTotal:      25165816 kB
  SwapFree:       21854572 kB
  Dirty:              4300 kB
  Writeback:             4 kB
  AnonPages:      20780368 kB
  Mapped:             6056 kB
  Shmem:                56 kB
  Slab:             961512 kB
  SReclaimable:     438276 kB
  SUnreclaim:       523236 kB
  KernelStack:       10152 kB
  PageTables:        67176 kB
  NFS_Unstable:          0 kB
  Bounce:                0 kB
  WritebackTmp:          0 kB
  CommitLimit:    41707524 kB
  Committed_AS:   39870868 kB
  VmallocTotal:   34359738367 kB
  VmallocUsed:      150880 kB
  VmallocChunk:   34342404996 kB
  HardwareCorrupted:     0 kB
  HugePages_Total:       0
  HugePages_Free:        0
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:       2048 kB
  DirectMap4k:        5824 kB
  DirectMap2M:     3205120 kB
  DirectMap1G:    30408704 kB

We see this despite the machine having vm.swappiness set to 0 in an attempt
to skew the reclaim as far as possible in favour of releasing page cache
instead of swapping anonymous pages.

After running swapoff -a, the machine is immediately much healthier. Even
while the swap is still being reduced, load goes down and response times in
virtual machines are much improved. Once the swap is completely gone, there
are still several gigabytes of RAM left free which are used for buffers, and
the virtual machines are no longer laggy because they are no longer swapped
out. Running swapon -a again, the affected machine waits for about a minute
with zero swap in use, before the amount of swap in use very rapidly
increases to around 2GB and then continues to increase more steadily to 3GB.

We could run these machines without swap (in the worst cases we're already
doing so), but I'd prefer to have a reserve of swap available in case of
genuine emergency. If it's a choice between swapping out a guest or
oom-killing it, I'd prefer to swap... but I really don't want to swap out
running virtual machines in order to have eight gigabytes of page cache
instead of five!

Is this a problem with the page reclaim priorities, or am I just tuning
these hosts incorrectly? Is there more detailed info than /proc/meminfo
available which might shed more light on what's going wrong here?

Best wishes,

Chris.
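P.S. In case it's useful to anyone reproducing this: a quick way to tell
live swap traffic from stale SwapCached pages is to watch the cumulative
swap counters, along these lines (a rough sketch, assuming the standard
pswpin/pswpout counters in /proc/vmstat, which count pages swapped in and
out since boot):

  # print the swap counters once a second; growing numbers mean live swap I/O
  while sleep 1; do
    awk '/^pswp(in|out) / { printf "%s=%s ", $1, $2 } END { print "" }' /proc/vmstat
  done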
From: Minchan Kim on 2 Aug 2010 20:00

On Mon, Aug 2, 2010 at 9:47 PM, Chris Webb <chris(a)arachsys.com> wrote:
> We run a number of relatively large x86-64 hosts with twenty or so qemu-kvm
> virtual machines on each of them, and I'm having some trouble with
> over-eager swapping on some (but not all) of the machines.
[...meminfo and kernel config snipped...]
> We see this despite the machine having vm.swappiness set to 0 in an attempt
> to skew the reclaim as far as possible in favour of releasing page cache
> instead of swapping anonymous pages.

Hmm, strange. We reclaim only anon pages when the system has very little
page cache (ie, file + free <= high watermark). But your meminfo shows the
system has lots of page cache pages, so that isn't likely here.

Another possibility is _zone_reclaim_ in NUMA. Your working set has many
anonymous pages. zone_reclaim sets the priority to ZONE_RECLAIM_PRIORITY,
which can switch reclaim into lumpy mode, and lumpy reclaim can page out
anon pages.

Could you show me /proc/sys/vm/[zone_reclaim_mode/min_unmapped_ratio]?
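As a rough userspace sanity check of that first condition (my sketch,
assuming 4kB pages and that the un-colonned "high" lines in /proc/zoneinfo
are the per-zone watermarks, in pages):

  free=$(awk '$1 == "MemFree:" { print $2 }' /proc/meminfo)
  file=$(awk '$1 ~ /^(Buffers|Cached):$/ { sum += $2 } END { print sum }' /proc/meminfo)
  high=$(awk '$1 == "high" { sum += $2 } END { print sum * 4 }' /proc/zoneinfo)
  echo "file+free = $((file + free)) kB vs high watermark total = $high kB"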
-- 
Kind regards,
Minchan Kim
From: Chris Webb on 2 Aug 2010 23:40

Minchan Kim <minchan.kim(a)gmail.com> writes:
> Another possibility is _zone_reclaim_ in NUMA. Your working set has many
> anonymous pages. zone_reclaim sets the priority to ZONE_RECLAIM_PRIORITY,
> which can switch reclaim into lumpy mode, and lumpy reclaim can page out
> anon pages.
>
> Could you show me /proc/sys/vm/[zone_reclaim_mode/min_unmapped_ratio]?

Sure, no problem. On the machine with the /proc/meminfo I showed earlier,
these are

  # cat /proc/sys/vm/zone_reclaim_mode
  0
  # cat /proc/sys/vm/min_unmapped_ratio
  1

I haven't changed either of these from the kernel default.

Many thanks,

Chris.
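P.S. Since the affected machines are the NUMA ones, per-node free memory
can be watched for one node running dry while the other still has plenty,
with something like this (a quick sketch over /proc/zoneinfo, assuming
4kB pages):

  # free pages per node/zone, in kB
  awk '/^Node/ { zone = $0 }
       $1 == "nr_free_pages" { printf "%-28s free %8d kB\n", zone, $2 * 4 }' /proc/zoneinfo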
From: Minchan Kim on 3 Aug 2010 00:10

On Tue, Aug 3, 2010 at 12:31 PM, Chris Webb <chris(a)arachsys.com> wrote:
> Sure, no problem. On the machine with the /proc/meminfo I showed earlier,
> these are
>
>   # cat /proc/sys/vm/zone_reclaim_mode
>   0
>   # cat /proc/sys/vm/min_unmapped_ratio
>   1

If zone_reclaim_mode is zero, it doesn't swap out anon pages. So two things
puzzle me:

1) How does the VM reclaim anonymous pages even though vm.swappiness == 0
   and there is a big page cache?

2) Your file pages seem to be almost all Buffers, while Cached is only
   about 10MB. Why do those buffer pages survive while anon pages are
   swapped out and cached pages are reclaimed?

Hmm. I have no idea. :(
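One thing that might explain (2), assuming your guests access their disks
as raw block devices: reads of a block device go through the block-device
page cache, which meminfo reports as Buffers rather than Cached. If lsof
is available, something like this should show who holds block devices open:

  # processes with block devices open; their reads land in Buffers
  lsof | awk '$NF ~ "^/dev/(sd|dm-|md)" { print $1, $2, $NF }' | sort -u

-- 
Kind regards,
Minchan Kim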
From: Wu Fengguang on 3 Aug 2010 00:30

On Tue, Aug 03, 2010 at 12:09:18PM +0800, Minchan Kim wrote:
> On Tue, Aug 3, 2010 at 12:31 PM, Chris Webb <chris(a)arachsys.com> wrote:
> >   # cat /proc/sys/vm/zone_reclaim_mode
> >   0
> >   # cat /proc/sys/vm/min_unmapped_ratio
> >   1
>
> If zone_reclaim_mode is zero, it doesn't swap out anon pages.

If there are lots of order-1 or higher allocations, anonymous pages will be
randomly evicted, regardless of their LRU ages. This is probably another
factor behind what the users are seeing.

Are there easy ways to confirm this other than patching the kernel?
Chris, what's in your /proc/slabinfo?
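A quick way to pick out the caches that need order-1 or larger page
allocations (assuming the usual slabinfo layout, where the sixth column
is pagesperslab), together with the current free-page fragmentation:

  # caches whose slabs span more than one page (need order >= 1 allocations)
  awk 'NR > 2 && $6 > 1 { printf "%-28s %2d pages/slab\n", $1, $6 }' /proc/slabinfo
  # free pages available at each order:
  cat /proc/buddyinfo

Thanks,
Fengguang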