From: Nigel Cunningham on 7 Aug 2010 18:50

Hi.

On 07/08/10 08:07, Hugh Dickins wrote:
> On Thu, Aug 5, 2010 at 9:40 PM, Nigel Cunningham<nigel(a)tuxonice.net> wrote:
>> On 06/08/10 11:15, Hugh Dickins wrote:
>>> On Wed, Aug 4, 2010 at 11:28 PM, Nigel Cunningham<nigel(a)tuxonice.net>
>>> wrote:
>>>>>> On 05/08/10 13:58, Hugh Dickins wrote:
>>>>>
>>>>> I agree it would make more sense to discard swap when freeing rather
>>>>> than when allocating, I wish we could. But at the freeing point we're
>>>>> often holding a page_table spinlock at an outer level, and it's just
>>>>> one page we're given to free. Freeing is an operation you want to be
>>>>> comfortable doing when you're short of resources, whereas discard is a
>>>>> kind of I/O operation which needs resources.
>>
>> The more I think about this, the more it seems to me that doing the discard
>> at allocation time is just wrong. How about a strategy in which you do the
>> discard immediately if resources permit, or flag it as in need of doing (at
>> a future swap free of that page or swap off time) if things are too
>> pressured at the moment? I haven't put thought into what data structures
>> could be used for that - just want to ask for now if you'd be happy with
>> the idea of looking into a strategy like that.
>
> We're going round in circles here. I have already agreed with you
> that doing it at swap free time seems more natural, but explained that
> there are implementation difficulties there, so doing it at allocation
> time proved both much less messy and more efficient. I can imagine
> advances that would sway me to putting in more effort there
> (duplicating the scan for a free 1MB every time a page of swap is
> freed? doesn't sound efficient to me, but if it saves us from an
> inefficiency of too many or too late discards, perhaps it could work
> out). However, a recently introduced regression does not make a
> strong case for adding in hacks - not yet anyway.

Okay; sorry.

>> I've done the bisect now - spent the time today instead of on the plane, and
>
> That's great, many thanks!
>
>> it took me to fbbf055692aeb25c54c49d9ca84532de836fbba0. This is Christoph
>> Hellwig's patch "block: fix DISCARD_BARRIER requests".
>
> That's
>
>     block: fix DISCARD_BARRIER requests
>
>     Filesystems assume that DISCARD_BARRIER are full barriers, so that they
>     don't have to track in-progress discard operations when submitting new I/O.
>     But currently we only treat them as elevator barriers, which don't
>     actually do the necessary queue drains.
>
> where the discard barrier changes from REQ_SOFTBARRIER to REQ_HARDBARRIER.
>
> If REQ_SOFTBARRIER means that the device is still free to reorder a
> write, which was issued after discard completion was reported, before
> the discard (so later discarding the data written), then certainly I
> agree with Christoph (now Cc'ed) that the REQ_HARDBARRIER is
> unavoidable there; but if not, then it's not needed for the swap case.
> I hope to gain a little more enlightenment on such barriers shortly.
>
> What does seem over the top to me is for mm/swapfile.c's
> blkdev_issue_discard()s to be asking for both BLKDEV_IFL_WAIT and
> BLKDEV_IFL_BARRIER: those swap discards were originally written just
> to use barriers, without needing to wait for completion in there. I'd
> be interested to hear if cutting out the BLKDEV_IFL_WAITs makes the
> swap discards behave acceptably again for you - but understand that
> you won't have a chance to try that until later next week.
Well, I've just arrived at the hotel, and I want to delay going to bed until
a 'proper' hour local time, so I expect I'll do it this evening - especially
a one-liner like that. Might even give it a go now...

Nigel
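A minimal sketch of the one-liner Hugh suggests, assuming the 2.6.35-era
discard_swap() call site in mm/swapfile.c and that kernel's five-argument
blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, flags); the exact line
may differ in the tree under test, and the matching call in
discard_swap_cluster() would presumably get the same treatment:

	/* Before: request a barrier and also wait for the discard to complete. */
	err = blkdev_issue_discard(si->bdev, start_block, nr_blocks,
			GFP_KERNEL,
			BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);

	/* After: keep the barrier semantics, drop the synchronous wait. */
	err = blkdev_issue_discard(si->bdev, start_block, nr_blocks,
			GFP_KERNEL, BLKDEV_IFL_BARRIER);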