From: Nick Piggin on 19 May 2010 12:00
On Wed, May 19, 2010 at 11:45:42AM -0400, Steven Rostedt wrote:
> On Wed, 2010-05-19 at 17:33 +0200, Miklos Szeredi wrote:
> > On Wed, 19 May 2010, Linus Torvalds wrote:
> > > Btw, since you apparently have a real case - is the "splice to file"
> > > always just an append? IOW, if I'm not right in assuming that the only
> > > sane thing people would reasonable care about is "append to a file", then
> > > holler now.
> >
> > Virtual machines might reasonably need this for splicing to a disk
> > image.
>
> This comes down to balancing speed and complexity. Perhaps a copy is
> fine in this case.
>
> I'm concerned about high speed tracing, where we are always just taking
> pages from the trace ring buffer and appending them to a file or sending
> them off to the network. The slower this is, the more likely you will
> lose events.
>
> If the "move only on append to file" is easy to implement, I would
> really like to see that happen. The speed of splicing a disk image for a
> virtual machine only impacts the patience of the user. The speed of
> splicing tracing output, impacts how much you can trace without losing
> events.

It's not "easy" to implement :) What's your ring buffer look like?
Is it a normal user address which the kernel does copy_to_user()ish
things into? Or a mmapped special driver?

If the latter, it gets even harder again. But either way if the
source pages just have to be regenerated anyway (eg. via page fault
on next access), then it might not even be worthwhile to do the
splice move.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mathieu Desnoyers on 19 May 2010 12:00
* Steven Rostedt (rostedt(a)goodmis.org) wrote:
> On Wed, 2010-05-19 at 17:33 +0200, Miklos Szeredi wrote:
> > On Wed, 19 May 2010, Linus Torvalds wrote:
> > > Btw, since you apparently have a real case - is the "splice to file"
> > > always just an append? IOW, if I'm not right in assuming that the only
> > > sane thing people would reasonable care about is "append to a file", then
> > > holler now.
> >
> > Virtual machines might reasonably need this for splicing to a disk
> > image.
>
> This comes down to balancing speed and complexity. Perhaps a copy is
> fine in this case.
>
> I'm concerned about high speed tracing, where we are always just taking
> pages from the trace ring buffer and appending them to a file or sending
> them off to the network. The slower this is, the more likely you will
> lose events.
>
> If the "move only on append to file" is easy to implement, I would
> really like to see that happen. The speed of splicing a disk image for a
> virtual machine only impacts the patience of the user. The speed of
> splicing tracing output, impacts how much you can trace without losing
> events.

I'm with Steven here. I only care about appending full pages at the end of
a file. If possible, I'd also like to steal back the pages after waiting
for the writeback I/O to complete so we can put them back in the ring
buffer without stressing the page cache and the page allocator needlessly.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Mathieu Desnoyers on 19 May 2010 12:00
* Steven Rostedt (rostedt(a)goodmis.org) wrote:
> On Wed, 2010-05-19 at 07:59 -0700, Linus Torvalds wrote:
> >
> > Btw, since you apparently have a real case - is the "splice to file"
> > always just an append? IOW, if I'm not right in assuming that the only
> > sane thing people would reasonable care about is "append to a file", then
> > holler now.
>
> My use case is just to move the data from the ring buffer into a file
> (or network) as fast as possible. It creates a new file and all
> additions are "append to a file".
>
> I believe Mathieu does the same.
>
> With me, you are correct.

Same here. My ring buffer only ever uses splice() to append at the end of
a file or to send to the network, and it always outputs data in multiples
of the page size.

Thanks,

Mathieu

>
> -- Steve
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Miklos Szeredi on 19 May 2010 12:00
On Wed, 19 May 2010, Linus Torvalds wrote:
> On Wed, 19 May 2010, Miklos Szeredi wrote:
> >
> > Another limitation I found while splicing from one file to another is
> > that stealing from the source file's page cache does not always
> > succeed. This turned out to be because of a reference from the lru
> > cache for freshly read pages. I'm not sure how this could be fixed.
>
> It should be fixed by saying "you can't always just move the page".
>
> Copying is not evil. Complexity to avoid copies is evil.

And predictability is good. The thing I don't like about the above is
that it makes it totally unpredictable which pages will get moved, if
any.

Another related thing: if we splice from a file knowing that the pages
will need to be stolen, then it makes zero sense to first insert the
pages into the page cache and then remove them shortly afterwards to be
inserted into another file's cache. So we could have a flag saying
"don't cache newly read pages, just put them in the pipe buffer", which
would solve the above problem as well as speed up the operation.

Miklos
From: Mathieu Desnoyers on 19 May 2010 12:10
* Nick Piggin (npiggin(a)suse.de) wrote:
> On Wed, May 19, 2010 at 11:45:42AM -0400, Steven Rostedt wrote:
> > On Wed, 2010-05-19 at 17:33 +0200, Miklos Szeredi wrote:
> > > On Wed, 19 May 2010, Linus Torvalds wrote:
> > > > Btw, since you apparently have a real case - is the "splice to file"
> > > > always just an append? IOW, if I'm not right in assuming that the only
> > > > sane thing people would reasonable care about is "append to a file", then
> > > > holler now.
> > >
> > > Virtual machines might reasonably need this for splicing to a disk
> > > image.
> >
> > This comes down to balancing speed and complexity. Perhaps a copy is
> > fine in this case.
> >
> > I'm concerned about high speed tracing, where we are always just taking
> > pages from the trace ring buffer and appending them to a file or sending
> > them off to the network. The slower this is, the more likely you will
> > lose events.
> >
> > If the "move only on append to file" is easy to implement, I would
> > really like to see that happen. The speed of splicing a disk image for a
> > virtual machine only impacts the patience of the user. The speed of
> > splicing tracing output, impacts how much you can trace without losing
> > events.
>
> It's not "easy" to implement :) What's your ring buffer look like?
> Is it a normal user address which the kernel does copy_to_user()ish
> things into? Or a mmapped special driver?
>
> If the latter, it gets even harder again. But either way if the
> source pages just have to be regenerated anyway (eg. via page fault
> on next access), then it might not even be worthwhile to do the
> splice move.

Steven and I use pages to which we write directly by using the page
address from the linear memory mapping returned by page_address(). These
pages have no other mapping. They are moved to the pipe, and then from the
pipe to a file (or to the network). It's possibly the simplest scenario
you could think of for splice().
Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
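
For readers following the thread from user space, the append-only pattern that Steven and Mathieu describe can be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from either tracer: `append_via_splice` is a hypothetical helper, and an ordinary write() into the pipe stands in for the kernel-side step that hands ring-buffer pages to the pipe. Two details from splice(2) shape it: splice() fails with EINVAL if the target file is opened with O_APPEND, so the "append" relies on the file offset already sitting at end of file; and SPLICE_F_MOVE is only a hint, documented as a no-op since Linux 2.6.21.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: queue `len` bytes from `buf`
 * into a pipe one page at a time, then splice each chunk from the pipe to
 * the current file offset of `out_fd`.
 *
 * `out_fd` must be writable and must NOT be opened with O_APPEND.
 * Returns 0 on success, -1 on any error.
 */
int append_via_splice(int out_fd, const char *buf, size_t len)
{
    int pfd[2];
    long page_sz = sysconf(_SC_PAGESIZE);
    int ret = 0;

    assert(page_sz > 0);
    if (pipe(pfd) < 0)
        return -1;

    while (len > 0 && ret == 0) {
        /* Stand-in for the tracer: put at most one page into the pipe. */
        size_t chunk = len < (size_t)page_sz ? len : (size_t)page_sz;
        ssize_t queued = write(pfd[1], buf, chunk);
        if (queued <= 0) {
            ret = -1;
            break;
        }
        buf += queued;
        len -= (size_t)queued;

        /* Drain what was queued; splice() may transfer less than asked. */
        while (queued > 0) {
            ssize_t moved = splice(pfd[0], NULL, out_fd, NULL,
                                   (size_t)queued, SPLICE_F_MOVE);
            if (moved <= 0) {
                ret = -1;
                break;
            }
            queued -= moved;
        }
    }

    close(pfd[0]);
    close(pfd[1]);
    return ret;
}
```

Called repeatedly on a freshly created file, each call extends the file by `len` bytes. The tracers discussed in the thread would always pass whole pages; the helper accepts any length only to keep the sketch general.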