From: Linus Torvalds on 2 Feb 2010 12:50

On Tue, 2 Feb 2010, Wu Fengguang wrote:
>
> Some applications (eg. blkid, id3tool etc.) seek around the file
> to get information. For example, blkid does
> seek to 0
> read 1024
> seek to 1536
> read 16384
>
> The start-of-file readahead heuristic is wrong for them, whose
> access pattern can be identified by lseek() calls.
>
> So test-and-set a READAHEAD_LSEEK flag on lseek() and don't
> do start-of-file readahead on seeing it. Proposed by Linus.
>
> CC: Linus Torvalds <torvalds(a)linux-foundation.org>
> Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>

Acked-by: Linus Torvalds <torvalds(a)linux-foundation.org>

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
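The patch description above states the rule only in prose. As a rough illustration, here is a user-space toy model of it in C. `struct ra_model`, `model_lseek()` and `model_read()` are hypothetical names invented for this sketch; the real kernel code (where the flag lives, how large the start-of-file read-ahead is) is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-open-file state, for this toy model only. */
struct ra_model {
    bool lseek_seen; /* stands in for the READAHEAD_LSEEK flag */
    long pos;        /* current file offset in bytes */
};

/* Any lseek() marks the file as "seeky" before moving the offset. */
static void model_lseek(struct ra_model *f, long offset)
{
    assert(offset >= 0);
    f->lseek_seen = true;
    f->pos = offset;
}

/* Advances the offset by len and reports whether this read would get
 * start-of-file read-ahead: only a read at offset 0 on a file that has
 * never seen an lseek() qualifies. */
static bool model_read(struct ra_model *f, long len)
{
    assert(len >= 0);
    bool start_of_file_ra = (f->pos == 0 && !f->lseek_seen);
    f->pos += len;
    return start_of_file_ra;
}
```

Replaying blkid's pattern (`model_lseek(&f, 0)`, `model_read(&f, 1024)`, `model_lseek(&f, 1536)`, `model_read(&f, 16384)`) never reports start-of-file read-ahead, while a plain read from offset 0 on a freshly opened file does.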
From: Linus Torvalds on 2 Feb 2010 13:50

On Tue, 2 Feb 2010, Olivier Galibert wrote:
>
> Wouldn't that trigger on lseeks to end of file to get the size?

Well, you'd only ever do that with a raw block device, no (if even that: most "raw block device" tools just use the BLKGETSIZE64 ioctl etc)? Any sane regular file accessor will do 'fstat()' instead.

And do we care about startup speed of ramping up read-ahead from the beginning? In fact, the problem case that caused this was literally 'blkid' on a block device - and the fact that the kernel tried to read-ahead TOO MUCH rather than too little.

If somebody is really doing lots of serial reading, the read-ahead code will figure it out very quickly. The case this worries about is just the _first_ read, where the question is one of "do we think it might be seeking around, or does it look like the user is going to just read the whole thing"?

IOW, if you start off with a SEEK_END, I think it's reasonable to expect it to _not_ read the whole thing.

Linus
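The point that regular files are normally sized with `fstat()`, and block devices with a size ioctl (`BLKGETSIZE64` from `<linux/fs.h>`, which fills in a 64-bit byte count), can be sketched in C as follows. This is a Linux-specific sketch; `get_size()` and `get_path_size()` are hypothetical helper names for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h> /* BLKGETSIZE64 (Linux-specific) */

/* Stores the size in bytes of the open file or block device fd in *size.
 * Returns 0 on success, -1 on error (errno set by the failing call). */
static int get_size(int fd, uint64_t *size)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;

    if (S_ISBLK(st.st_mode)) {
        /* st_size is not meaningful for a block device node; ask the
         * block layer for the device size instead. */
        return ioctl(fd, BLKGETSIZE64, size) < 0 ? -1 : 0;
    }

    /* Regular files: fstat() already reports the size, no lseek() needed. */
    *size = (uint64_t)st.st_size;
    return 0;
}

/* Convenience wrapper around get_size() that takes a path. */
static int get_path_size(const char *path, uint64_t *size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int ret = get_size(fd, size);
    close(fd);
    return ret;
}
```

Neither path moves the file offset, so a caller written this way never issues the SEEK_END/SEEK_SET pair discussed in this thread.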
From: Linus Torvalds on 2 Feb 2010 14:20

On Tue, 2 Feb 2010, Olivier Galibert wrote:
>
> On Tue, Feb 02, 2010 at 10:40:41AM -0800, Linus Torvalds wrote:
> > IOW, if you start off with a SEEK_END, I think it's reasonable to expect
> > it to _not_ read the whole thing.
>
> I've seen a lot of:
>   int fd = open(...);
>   size = lseek(fd, 0, SEEK_END);
>   lseek(fd, 0, SEEK_SET);
>
>   data = malloc(size);
>   read(fd, data, size);
>   close(fd);
>
> Why not fstat? I don't know.

Well, the above will work perfectly with or without the patch, since it does the read of the full size. There is no read-ahead hint necessary for that kind of single read behavior.

Remember: read-ahead is about filling the empty IO spaces _between_ reads, and turning many smaller reads into one bigger one. If you only have a single big read, read-ahead cannot help.

Also, keep in mind that read-ahead is not always a win. It can be a huge loss too. Which is why we have _heuristics_. They fundamentally cannot catch every case, but what they aim for is to do a good job on average.

Linus
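To make the "many smaller reads into one bigger one" argument concrete, here is a deliberately crude C model that only counts device requests for a sequential reader. It is an assumption-laden toy (fixed page size, a fixed read-ahead window of `ra_pages` pages, no eviction, reads starting at offset 0), not the kernel's read-ahead algorithm, and `count_requests()` is a hypothetical name.

```c
#include <stddef.h>

/* Toy model only: counts device requests for a reader that issues
 * nreads sequential reads of lens[i] bytes, starting at offset 0.
 * A read that needs any page not yet cached issues ONE request that
 * covers the missing pages it needs plus ra_pages extra pages of
 * read-ahead. Cached pages are never evicted. */
static int count_requests(const size_t *lens, size_t nreads,
                          size_t page_size, size_t ra_pages)
{
    size_t pos = 0;        /* offset of the next byte the reader wants */
    size_t cached_end = 0; /* pages [0, cached_end) are in the cache */
    int requests = 0;

    for (size_t i = 0; i < nreads; i++) {
        /* One past the last page this read touches (rounded up). */
        size_t end_page = (pos + lens[i] + page_size - 1) / page_size;

        if (end_page > cached_end) {
            requests++;
            cached_end = end_page + ra_pages;
        }
        pos += lens[i];
    }
    return requests;
}
```

With 4096-byte pages, sixteen 4096-byte reads cost 16 requests without read-ahead and 1 request with a 15-page window, while a single 65536-byte read costs 1 request either way, which is the case Olivier's snippet falls into. The model only counts requests; it charges nothing for pages that are read ahead but never used, which is where read-ahead turns into a loss.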
From: Linus Torvalds on 2 Feb 2010 15:30

On Tue, 2 Feb 2010, david(a)lang.hm wrote:
> On Tue, 2 Feb 2010, Linus Torvalds wrote:
> >
> > Also, keep in mind that read-ahead is not always a win. It can be a huge
> > loss too. Which is why we have _heuristics_. They fundamentally cannot
> > catch every case, but what they aim for is to do a good job on average.
>
> as a note from the field, I just had an application that needed to be changed
> because it did excessive read-ahead. it turned a 2 min reporting run into a 20
> min reporting run because for this report the access was really random and the
> app forced large read-ahead.

Yeah. And the reason Wu did this patch is similar: something that _should_ have taken just a quarter of a second took about 7 seconds, because read-ahead triggered on this really slow device that only feeds about 15kB/s (yes, _kilo_byte, not megabyte).

You can always use POSIX_FADV_RANDOM to disable it, but it's seldom something that people do. And there are real loads that have random components to them without being _entirely_ random, so in an optimal world we should just have heuristics that work well.

The problem is, it's often easier to test/debug the "good" cases, ie the cases where we _want_ read-ahead to trigger. So that probably means that we have a tendency to read-ahead too aggressively, because those cases are the ones where people can most easily look at it and say "yeah, this improves throughput of a 'dd bs=8192'".

So then when we find loads where read-ahead hurts, I think we need to take _that_ case very seriously. Because otherwise our selection bias for testing read-ahead will fail.

Linus
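For an application that knows its access pattern is random, the opt-out mentioned here is `posix_fadvise()` with `POSIX_FADV_RANDOM`. A minimal sketch, assuming a Linux/glibc system; `open_random_access()` is a hypothetical helper name. Note that `posix_fadvise()` returns an error number directly rather than setting `errno`.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Opens path read-only and advises the kernel that accesses will be
 * random. On Linux this turns off read-ahead for this open file.
 * Returns the file descriptor, or -1 on failure. */
static int open_random_access(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0 with len 0 means "to the end of the file". */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
    if (err != 0) {
        /* posix_fadvise() returns the error number; errno is not set. */
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
        close(fd);
        return -1;
    }
    return fd;
}
```

The advice is attached to this open file description only, so another `open()` of the same file still gets the default read-ahead behavior.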
From: david on 2 Feb 2010 15:40

On Tue, 2 Feb 2010, Linus Torvalds wrote:
> Remember: read-ahead is about filling the empty IO spaces _between_ reads,
> and turning many smaller reads into one bigger one. If you only have a
> single big read, read-ahead cannot help.
>
> Also, keep in mind that read-ahead is not always a win. It can be a huge
> loss too. Which is why we have _heuristics_. They fundamentally cannot
> catch every case, but what they aim for is to do a good job on average.

as a note from the field, I just had an application that needed to be changed because it did excessive read-ahead. it turned a 2 min reporting run into a 20 min reporting run because for this report the access was really random and the app forced large read-ahead.

David Lang