Prev: trivial: capitalization and other minor fixes in pnp documentation
Next: nfs: clear_commit_release incorrectly handle truncated page
From: Alan Jenkins on 24 Feb 2010 15:20 On 2/24/10, KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote: > On Tue, 23 Feb 2010 22:13:56 +0100 > "Rafael J. Wysocki" <rjw(a)sisk.pl> wrote: > >> Well, it still looks like we're waiting for create_workqueue_thread() to >> return, which probably is trying to allocate memory for the thread >> structure. >> >> My guess is that the preallocated memory pages freed by >> free_unnecessary_pages() go into a place from where they cannot be taken >> for >> subsequent NOIO allocations. I have no idea why that happens though. >> >> To test that theory you can try to change GFP_IOFS to GFP_KERNEL in the >> calls to clear_gfp_allowed_mask() in kernel/power/hibernate.c (and in >> kernel/power/suspend.c for completness). >> > > If allocation of kernel threads for stop_machine_run() is the problem, > > What happens when > 1. use CONIFG_4KSTACK Interesting question. 4KSTACK doesn't stop it though; it hangs in the same place. > or > 2. make use of stop_machine_create(), stop_machine_destroy(). > A new interface added by this commit. > http://git.kernel.org/?p=linux/kernel/git/torvalds/ > linux-2.6.git;a=commit;h=9ea09af3bd3090e8349ca2899ca2011bd94cda85 > You can do no-fail stop_machine_run(). > > Thanks, > -Kame Since this is a uni-processor machine that would make it a single 4K allocation. AIUI this is supposed to be ok. The hibernation code tries to make sure there is over 1000x that much free RAM (ish), in anticipation of this sort of requirement. There appear to be some deficiencies in the way this allowance works, which have recently been exposed. And unfortunately the allocation hangs instead of failing, so we're in unclean shutdown territory. I have three test scenarios at the moment. I've tested two patches which appear to fix the common cases, but there's still a third test scenario to figure out. (Repeated hibernation attempts with insufficient swap - encountered during real-world use, believe it or not). Alan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on 24 Feb 2010 15:40 On Wednesday 24 February 2010, KAMEZAWA Hiroyuki wrote: > On Tue, 23 Feb 2010 22:13:56 +0100 > "Rafael J. Wysocki" <rjw(a)sisk.pl> wrote: > > > Well, it still looks like we're waiting for create_workqueue_thread() to > > return, which probably is trying to allocate memory for the thread > > structure. > > > > My guess is that the preallocated memory pages freed by > > free_unnecessary_pages() go into a place from where they cannot be taken for > > subsequent NOIO allocations. I have no idea why that happens though. > > > > To test that theory you can try to change GFP_IOFS to GFP_KERNEL in the > > calls to clear_gfp_allowed_mask() in kernel/power/hibernate.c (and in > > kernel/power/suspend.c for completness). > > > > If allocation of kernel threads for stop_machine_run() is the problem, > > What happens when > 1. use CONIFG_4KSTACK > or > 2. make use of stop_machine_create(), stop_machine_destroy(). > A new interface added by this commit. > http://git.kernel.org/?p=linux/kernel/git/torvalds/ linux-2.6.git;a=commit;h=9ea09af3bd3090e8349ca2899ca2011bd94cda85 > You can do no-fail stop_machine_run(). Well, that would probably help in this particular case, but the root cause seems to be that the (theoretically) freed memory cannot be used for NOIO allocations for some reason, which is shown by the Alan's testing. Generally speaking, we use __free_page() to release some pages preallocated for the hibernation image, but the memory subsystem refuses to use these pages for NOIO allocations made later. However, it evidently is able to use them is __GFP_WAIT is unset in the mask. Is this behavior intentional? Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on 24 Feb 2010 16:00 On Wednesday 24 February 2010, Alan Jenkins wrote: > On 2/23/10, Rafael J. Wysocki <rjw(a)sisk.pl> wrote: .... > > My guess is that the preallocated memory pages freed by > > free_unnecessary_pages() go into a place from where they cannot be taken for > > subsequent NOIO allocations. I have no idea why that happens though. > > > > To test that theory you can try to change GFP_IOFS to GFP_KERNEL in the > > calls to clear_gfp_allowed_mask() in kernel/power/hibernate.c (and in > > kernel/power/suspend.c for completness). > > Effectively forcing GFP_NOWAIT, so the allocation should fail instead > of hanging? > > It seems to stop the hang, but I don't see any other difference - the > hibernation process isn't stopped earlier, and I don't get any new > kernel messages about allocation failures. I wonder if it's because > GFP_NOWAIT triggers ALLOC_HARDER. > > I have other evidence which argues for your theory: > > [ successful s2disk, with forced NOIO (but not NOWAIT), and test code > as attached ] > > Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done. > 1280 GFP_NOWAIT allocations of order 0 are possible > 640 GFP_NOWAIT allocations of order 1 are possible > 320 GFP_NOWAIT allocations of order 2 are possible > > [ note - 1280 pages is the maximum test allocation used here. The > test code is only accurate when talking about smaller numbers of free > pages ] > > 1280 GFP_KERNEL allocations of order 0 are possible > 640 GFP_KERNEL allocations of order 1 are possible > 320 GFP_KERNEL allocations of order 2 are possible > > PM: Preallocating image memory... > 212 GFP_NOWAIT allocations of order 0 are possible > 102 GFP_NOWAIT allocations of order 1 are possible > 50 GFP_NOWAIT allocations of order 2 are possible > > Freeing all 90083 preallocated pages > (and 0 highmem pages, out of 0) > 190 GFP_NOWAIT allocations of order 0 are possible > 102 GFP_NOWAIT allocations of order 1 are possible > 50 GFP_NOWAIT allocations of order 2 are possible > 1280 GFP_KERNEL allocations of order 0 are possible > 640 GFP_KERNEL allocations of order 1 are possible > 320 GFP_KERNEL allocations of order 2 are possible > done (allocated 90083 pages) > > It looks like you're right and the freed pages are not accessible with > GFP_NOWAIT for some reason. I'd expect this, really. There only is a limited number of pages you can allocate with GFP_NOWAIT. > I also tried a number of test runs with too many applications, and saw this: > > Freeing all 104006 preallocated pages ... > 65 GFP_NOWAIT allocations of order 0 ... > 18 GFP_NOWAIT allocations of order 1 ... > 9 GFP_NOWAIT allocations of order 2 ... > 0 GFP_KERNEL allocations of order 0 are possible > ... Now that's interesting. We've just freed 104006 pages and we can't allocate any, so where did all of these freed pages go, actually? OK, I think I see what the problem is. Quite embarassing, actually ... Can you check if the patch below helps? Rafael --- kernel/power/snapshot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6/kernel/power/snapshot.c =================================================================== --- linux-2.6.orig/kernel/power/snapshot.c +++ linux-2.6/kernel/power/snapshot.c @@ -1181,7 +1181,7 @@ static void free_unnecessary_pages(void) memory_bm_position_reset(©_bm); - while (to_free_normal > 0 && to_free_highmem > 0) { + while (to_free_normal > 0 || to_free_highmem > 0) { unsigned long pfn = memory_bm_next_pfn(©_bm); struct page *page = pfn_to_page(pfn); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Alan Jenkins on 25 Feb 2010 08:20 On 2/24/10, Rafael J. Wysocki <rjw(a)sisk.pl> wrote: > On Wednesday 24 February 2010, Alan Jenkins wrote: >> On 2/23/10, Rafael J. Wysocki <rjw(a)sisk.pl> wrote: > ... >> > My guess is that the preallocated memory pages freed by >> > free_unnecessary_pages() go into a place from where they cannot be taken >> > for >> > subsequent NOIO allocations. I have no idea why that happens though. >> > >> > To test that theory you can try to change GFP_IOFS to GFP_KERNEL in the >> > calls to clear_gfp_allowed_mask() in kernel/power/hibernate.c (and in >> > kernel/power/suspend.c for completness). >> >> Effectively forcing GFP_NOWAIT, so the allocation should fail instead >> of hanging? >> >> It seems to stop the hang, but I don't see any other difference - the >> hibernation process isn't stopped earlier, and I don't get any new >> kernel messages about allocation failures. I wonder if it's because >> GFP_NOWAIT triggers ALLOC_HARDER. >> >> I have other evidence which argues for your theory: >> >> [ successful s2disk, with forced NOIO (but not NOWAIT), and test code >> as attached ] >> >> Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done. >> 1280 GFP_NOWAIT allocations of order 0 are possible >> 640 GFP_NOWAIT allocations of order 1 are possible >> 320 GFP_NOWAIT allocations of order 2 are possible >> >> [ note - 1280 pages is the maximum test allocation used here. The >> test code is only accurate when talking about smaller numbers of free >> pages ] >> >> 1280 GFP_KERNEL allocations of order 0 are possible >> 640 GFP_KERNEL allocations of order 1 are possible >> 320 GFP_KERNEL allocations of order 2 are possible >> >> PM: Preallocating image memory... >> 212 GFP_NOWAIT allocations of order 0 are possible >> 102 GFP_NOWAIT allocations of order 1 are possible >> 50 GFP_NOWAIT allocations of order 2 are possible >> >> Freeing all 90083 preallocated pages >> (and 0 highmem pages, out of 0) >> 190 GFP_NOWAIT allocations of order 0 are possible >> 102 GFP_NOWAIT allocations of order 1 are possible >> 50 GFP_NOWAIT allocations of order 2 are possible >> 1280 GFP_KERNEL allocations of order 0 are possible >> 640 GFP_KERNEL allocations of order 1 are possible >> 320 GFP_KERNEL allocations of order 2 are possible >> done (allocated 90083 pages) >> >> It looks like you're right and the freed pages are not accessible with >> GFP_NOWAIT for some reason. > > I'd expect this, really. There only is a limited number of pages you can > allocate with GFP_NOWAIT. > >> I also tried a number of test runs with too many applications, and saw >> this: >> >> Freeing all 104006 preallocated pages ... >> 65 GFP_NOWAIT allocations of order 0 ... >> 18 GFP_NOWAIT allocations of order 1 ... >> 9 GFP_NOWAIT allocations of order 2 ... >> 0 GFP_KERNEL allocations of order 0 are possible >> ... > > Now that's interesting. We've just freed 104006 pages and we can't allocate > any, so where did all of these freed pages go, actually? > > OK, I think I see what the problem is. Quite embarassing, actually ... > > Can you check if the patch below helps? > > Rafael > - while (to_free_normal > 0 && to_free_highmem > 0) { > + while (to_free_normal > 0 || to_free_highmem > 0) { Yes, that seems to do it. No more hangs so far (and I can still reproduce the hang with too many applications if I un-apply the patch). I did see a non-fatal allocation failure though, so I'm still not sure that the current implementation is strictly correct. This is without the patch to increase "to_free_normal". If I get the allocation failure again, should I try testing the "free 20% extra" patch? Many thanks Alan Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done. PM: Preallocating image memory... events/0: page allocation failure. order:0, mode:0xd0 Pid: 6, comm: events/0 Not tainted 2.6.33-rc8eeepc-00165-gecdaf98-dirty #101 Call Trace: ? printk+0xf/0x18 __alloc_pages_nodemask+0x46a/0x4dd cache_alloc_refill+0x250/0x42d kmem_cache_alloc+0x70/0xee acpi_ps_alloc_op+0x4a/0x84 acpi_ps_create_scope_op+0xd/0x1c acpi_ps_execute_method+0xed/0x29c acpi_ns_evaluate+0x13b/0x241 acpi_evaluate_object+0x11c/0x243 ? trace_hardirqs_on_caller+0x100/0x121 acpi_evaluate_integer+0x30/0x9c acpi_thermal_get_temperature+0x2d/0x6a [thermal] thermal_get_temp+0x1d/0x33 [thermal] thermal_zone_device_update+0x29/0x1ce [thermal_sys] ? worker_thread+0x14b/0x250 thermal_zone_device_check+0xd/0xf [thermal_sys] worker_thread+0x18d/0x250 ? worker_thread+0x14b/0x250 ? thermal_zone_device_check+0x0/0xf [thermal_sys] ? autoremove_wake_function+0x0/0x2f ? worker_thread+0x0/0x250 kthread+0x6a/0x6f ? kthread+0x0/0x6f kernel_thread_helper+0x6/0x1a Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 192 active_anon:4987 inactive_anon:5012 isolated_anon:0 active_file:34 inactive_file:86 isolated_file:0 unevictable:1042 dirty:0 writeback:0 unstable:0 free:1188 slab_reclaimable:1510 slab_unreclaimable:2338 mapped:1975 shmem:7985 pagetables:894 bounce:0 DMA free:2016kB min:88kB low:108kB high:132kB active_anon:0kB inactive_anon:12kB active_file:8kB inactive_file:0kB unevictable:40kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:40kB dirty:0kB writeback:0kB mapped:48kB shmem:0kB slab_reclaimable:8kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:8kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3 all_unreclaimable? no lowmem_reserve[]: 0 483 483 483 Normal free:2736kB min:2764kB low:3452kB high:4144kB active_anon:19948kB inactive_anon:20036kB active_file:128kB inactive_file:344kB unevictable:4128kB isolated(anon):0kB isolated(file):0kB present:495300kB mlocked:4128kB dirty:0kB writeback:0kB mapped:7852kB shmem:31940kB slab_reclaimable:6032kB slab_unreclaimable:9348kB kernel_stack:920kB pagetables:3568kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:20672 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB Normal: 684*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2736kB 9036 total pagecache pages 0 pages in swap cache Swap cache stats: add 132152, delete 132152, find 22472/27086 Free swap = 63644kB Total swap = 358392kB 128880 pages RAM 0 pages HighMem 3743 pages reserved 20404 pages shared 116286 pages non-shared hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj. Thermal: failed to read out thermal zone 0 done (allocated 105868 pages) PM: Allocated 423472 kbytes in 25.48 seconds (16.61 MB/s) atl2 0000:03:00.0: PCI INT A disabled ata_piix 0000:00:1f.2: PCI INT B disabled HDA Intel 0000:00:1b.0: PCI INT A disabled PM: freeze of devices complete after 163.921 msecs PM: late freeze of devices complete after 2.951 msecs Disabling non-boot CPUs ... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on 25 Feb 2010 15:10
On Thursday 25 February 2010, Alan Jenkins wrote: > On 2/24/10, Rafael J. Wysocki <rjw(a)sisk.pl> wrote: > > On Wednesday 24 February 2010, Alan Jenkins wrote: .... > > > - while (to_free_normal > 0 && to_free_highmem > 0) { > > + while (to_free_normal > 0 || to_free_highmem > 0) { > > Yes, that seems to do it. No more hangs so far (and I can still > reproduce the hang with too many applications if I un-apply the > patch). OK, great. Is this with or without the NOIO-enforcing patch? > I did see a non-fatal allocation failure though, so I'm still not sure > that the current implementation is strictly correct. > > This is without the patch to increase "to_free_normal". If I get the > allocation failure again, should I try testing the "free 20% extra" > patch? Either that or try to increase SPARE_PAGES. That should actually work with the last patch applied. :-) Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |