From: Steven Rostedt on 18 May 2010 12:20 On Tue, 2010-05-18 at 11:53 -0400, Steven Rostedt wrote: > I'm currently looking at the network code to see if it is better. The network code seems to do the right thing. It sends the actual page directly to the network. Hopefully we can find a way to avoid the copy to file. But the splice code was created to avoid the copy to and from userspace, it did not guarantee no copy within the kernel itself. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Nick Piggin on 18 May 2010 12:20 On Wed, May 19, 2010 at 02:00:51AM +1000, Nick Piggin wrote: > On Tue, May 18, 2010 at 10:56:24AM -0500, Christoph Lameter wrote: > > On Wed, 19 May 2010, Nick Piggin wrote: > > > > > What would be needed is to have filesystem maintainers go through and > > > enable it on a case by case basis. It's trivial for tmpfs/ramfs type > > > filesystems and I have a patch for those, but I never posted it on.yet. > > > Even basic buffer head filesystems IIRC get a little more complex -- > > > but we may get some milage just out of invalidating the existing > > > pagecache rather than getting fancy and trying to move buffers over > > > to the new page. > > > > There is a "migration" address space operation for moving pages. Page > > migration requires that in order to be able to move dirty pages. Can > > splice use that? > > Hmm yes I didn't think of that, it probably could. It's not the only requirement, of course, just that it could potentially reuse some of the code. The big difference is that the source page is already dirty, and the destination page might not exist, might exist and be partially uptodate, not have blocks allocated, might be past i_size, fully uptodate, etc. So it's more than a matter of just a simple copy to another page and taking over exactly the same filesystem state as the old page. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 18 May 2010 12:30 On Tue, 18 May 2010, Steven Rostedt wrote: > > Hopefully we can find a way to avoid the copy to file. But the splice > code was created to avoid the copy to and from userspace, it did not > guarantee no copy within the kernel itself. Well, we always _wanted_ to splice directly to a file, but it's just not been done properly. It's not entirely trivial, since you need to worry about preexisting pages and generally just do the right thing wrt the filesystem. And no, it should NOT use migration code. I suspect you could do something fairly simple like: - get the inode semaphore. - check if the splice is a pure "extend size" operation for that page - if so, just create the page cache entry and mark it dirty - otherwise, fall back to copying. because the "extend file" case is the easiest one, and is likely the only one that matters in practice (if you are overwriting an existing file, things get _way_ hairier, and why the hell would anybody expect that to be fast anyway?) But somebody needs to write the code.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Nick Piggin on 19 May 2010 02:40 On Tue, May 18, 2010 at 09:25:05AM -0700, Linus Torvalds wrote: > > > On Tue, 18 May 2010, Steven Rostedt wrote: > > > > Hopefully we can find a way to avoid the copy to file. But the splice > > code was created to avoid the copy to and from userspace, it did not > > guarantee no copy within the kernel itself. > > Well, we always _wanted_ to splice directly to a file, but it's just not > been done properly. It's not entirely trivial, since you need to worry > about preexisting pages and generally just do the right thing wrt the > filesystem. > > And no, it should NOT use migration code. I suspect you could do something > fairly simple like: I was thinking it could possibly reuse some of the migration code for swapping filesystem state to the new page. But I agree it gets hairy and is probably better to just insert new pages. > > - get the inode semaphore. > - check if the splice is a pure "extend size" operation for that page > - if so, just create the page cache entry and mark it dirty > - otherwise, fall back to copying. > > because the "extend file" case is the easiest one, and is likely the only > one that matters in practice (if you are overwriting an existing file, > things get _way_ hairier, and why the hell would anybody expect that to be > fast anyway?) > > But somebody needs to write the code.. We can possibly do an attempt to invalidate existing pagecache and then try to install the new page. The filesystem still needs a look over to ensure error handling will work properly, and that it does not make incorrect assumptions about the contents of the page being passed in. This still isn't ideal because we drop the filesystem state (eg bufer heads) on a page which, by definition, will need to be written out soon. But something smarter could be added if it turns out to be important. Big if, because I don't like adding complex code without having a really good reason. I do like having the splice flag there, though. The more the app can tell the kernel the better. Hopefully people use it and we can get a better idea of whether these fancy optimisations will be worth it. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 19 May 2010 10:50
On Wed, 19 May 2010, Nick Piggin wrote: > > We can possibly do an attempt to invalidate existing pagecache and > then try to install the new page. Yes, but that's going to be rather hairier. We need to make sure that the filesystem doesn't have some kind of dirty pointers to the old page etc. Although I guess that should always show up in the page counters, so I guess we can always handle the case of page_count() being 1 (only page cache) and the page being unlocked. So I'd much rather just handle the "append to the end". The real limitation is likely always going to be the fact that it has to be page-aligned and a full page. For a lot of splice inputs, that simply won't be the case, and you'll end up copying for alignment reasons anyway. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |