From: Nigel Cunningham on 3 Aug 2010 21:50

Hi all.

I've just given hibernation a go under 2.6.35, and at first I thought there
was some sort of hang in freezing processes. The computer sat there for
aaaaaages, apparently doing nothing. Switched from TuxOnIce to swsusp to see
if it was specific to my code but no - the problem was there too.

I used the nifty new kdb support to get a backtrace, which was:

  get_swap_page_of_type
  discard_swap_cluster
  blkdev_issue_discard
  wait_for_completion

Adding a printk in discard_swap_cluster() gives the following:

[ 46.758330] Discarding 256 pages from bdev 800003 beginning at page 640377.
[ 47.003363] Discarding 256 pages from bdev 800003 beginning at page 640633.
[ 47.246514] Discarding 256 pages from bdev 800003 beginning at page 640889.
....
[ 221.877465] Discarding 256 pages from bdev 800003 beginning at page 826745.
[ 222.121284] Discarding 256 pages from bdev 800003 beginning at page 827001.
[ 222.365908] Discarding 256 pages from bdev 800003 beginning at page 827257.
[ 222.610311] Discarding 256 pages from bdev 800003 beginning at page 827513.

So allocating 4GB of swap on my SSD now takes 176 seconds instead of
virtually no time at all. (This code is completely unchanged from 2.6.34.)

I have a couple of questions:

1) As far as I can see, there haven't been any changes in mm/swapfile.c that
   would cause this slowdown, so something in the block layer has (from my
   point of view) regressed. Is this a known issue?

2) Why are we calling discard_swap_cluster anyway? The swap was unused and
   we're allocating it. I could understand calling it when freeing swap, but
   when allocating?

Regards,

Nigel
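(For reference: the chain above is discard_swap_cluster() in mm/swapfile.c
calling blkdev_issue_discard() and waiting for each discard to complete. A
debugging printk of the kind Nigel describes would look roughly like the
sketch below; the message and its placement are assumed for illustration,
not taken from his actual patch, while the blkdev_issue_discard() call and
its BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER flags are the ones discussed later
in this thread for 2.6.35.)

	/* Sketch only: inside discard_swap_cluster() in mm/swapfile.c,
	 * just before each extent is discarded.  The printk is
	 * illustrative, not the original patch.
	 */
	printk(KERN_DEBUG "Discarding %lu pages from bdev %x beginning at page %lu.\n",
	       (unsigned long)nr_pages, si->bdev->bd_dev,
	       (unsigned long)start_page);

	if (blkdev_issue_discard(si->bdev, start_block, nr_blocks,
				 GFP_NOIO,
				 BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER))
		break;

Since blkdev_issue_discard() is called with BLKDEV_IFL_WAIT, every 256-page
cluster waits for its own TRIM to finish, which is where the roughly 0.25
seconds per line in the log above goes.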
From: Martin K. Petersen on 4 Aug 2010 14:10

>>>>> "Mark" == Mark Lord <kernel(a)teksavvy.com> writes:

Mark> Looks to me like more and more things are using the block discard
Mark> functionality, and as predicted it is slowing things down
Mark> enormously.

Mark> The problem is that we still only discard tiny bits (a single
Mark> range still??) per TRIM command, rather than batching larger
Mark> ranges and larger numbers of ranges into single TRIM commands.
Mark> That's a very poor implementation, especially when things start
Mark> enabling it by default. Eg. the swap code, mke2fs, etc..

I'm working on aggregation. But it's harder than we initially thought...

--
Martin K. Petersen	Oracle Linux Engineering
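(The aggregation Martin refers to is, in outline, collecting many small
discard requests and merging contiguous ones so that a single TRIM command
can carry several ranges instead of one. A purely illustrative sketch of
that merging step; the type and function names here are hypothetical, not
kernel API:)

	/* Coalesce adjacent discard ranges so one TRIM command can
	 * cover several of them.  Ranges are assumed sorted by start.
	 */
	struct discard_range {
		unsigned long long start;	/* in sectors */
		unsigned long long len;		/* in sectors */
	};

	/* Merge contiguous ranges in place; returns the new count. */
	static int coalesce_ranges(struct discard_range *r, int n)
	{
		int i, out = 0;

		for (i = 1; i < n; i++) {
			if (r[i].start == r[out].start + r[out].len)
				r[out].len += r[i].len;	/* contiguous: extend */
			else
				r[++out] = r[i];	/* gap: new range */
		}
		return n ? out + 1 : 0;
	}

The hard part Martin alludes to is not this merge itself but deciding when
to stop batching and issue the command, and doing so without violating the
ordering the callers expect.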
From: Hugh Dickins on 13 Aug 2010 14:20
On Fri, Aug 13, 2010 at 4:54 AM, Christoph Hellwig <hch(a)infradead.org> wrote:
> On Fri, Aug 06, 2010 at 03:07:25PM -0700, Hugh Dickins wrote:
>> If REQ_SOFTBARRIER means that the device is still free to reorder a
>> write, which was issued after discard completion was reported, before
>> the discard (so later discarding the data written), then certainly I
>> agree with Christoph (now Cc'ed) that the REQ_HARDBARRIER is
>> unavoidable there; but if not, then it's not needed for the swap case.
>> I hope to gain a little more enlightenment on such barriers shortly.
>
> REQ_SOFTBARRIER is indeed purely a reordering barrier inside the block
> elevator.
>
>> What does seem over the top to me, is for mm/swapfile.c's
>> blkdev_issue_discard()s to be asking for both BLKDEV_IFL_WAIT and
>> BLKDEV_IFL_BARRIER: those swap discards were originally written just
>> to use barriers, without needing to wait for completion in there. I'd
>> be interested to hear if cutting out the BLKDEV_IFL_WAITs makes the
>> swap discards behave acceptably again for you - but understand that
>> you won't have a chance to try that until later next week.
>
> That does indeed look incorrect to me. Any kind of explicit waits
> usually mean the caller provides ordering. Getting rid of
> BLKDEV_IFL_BARRIER in the swap code ASAP would indeed be beneficial
> given that we are trying to get rid of hard barriers completely soon.
> Auditing the existing blkdev_issue_discard callers in filesystems
> is high on the todo list for me.

Yes. Above I was suggesting that Nigel experiment with cutting out swap
discard's BLKDEV_IFL_WAITs - and the results of cutting those out but
leaving its BLKDEV_IFL_BARRIERs would still be interesting.

But after digesting the LSF discussion and the email thread that led up to
it, I came to the same conclusion as you: that going forward we want to keep
its BLKDEV_IFL_WAITs (swapfile.c already provides all the other
synchronization for that to fit into - things like never freeing swap while
it's still under writeback) and simply remove its BLKDEV_IFL_BARRIERs.

However, I am still not quite sure that we can already make that change for
2.6.35 (-stable). Can you reassure me on the question I raise above: if we
issue a discard to a device with cache, wait for "completion", then issue a
write into the area spanned by that discard, can we be certain that the
write to backing store will not be reordered before the discard of backing
store (unless the device is just broken)? Without a REQ_HARDBARRIER in the
2.6.35 scheme?

It seems a very reasonable assumption to me, but I'm learning not to depend
upon reasonable assumptions here. (By the way, it doesn't matter at all
whether writes not spanned by the discard pass it or not.)

Hugh
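(Concretely, the change being converged on here would keep the wait and drop
the barrier in mm/swapfile.c's discard calls. A sketch of the resulting
blkdev_issue_discard() call in discard_swap_cluster(), assuming the 2.6.35
signature and flags discussed above - not a tested patch:)

	/* Keep BLKDEV_IFL_WAIT: swapfile.c already provides the other
	 * ordering (e.g. swap is never freed while still under
	 * writeback).  Drop BLKDEV_IFL_BARRIER, per the discussion
	 * above.  Sketch only.
	 */
	if (blkdev_issue_discard(si->bdev, start_block, nr_blocks,
				 GFP_NOIO, BLKDEV_IFL_WAIT))
		break;

Whether that is safe without a hard barrier is exactly the cache-reordering
question Hugh raises above, so it is a prerequisite for pushing the change
to 2.6.35-stable.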