From: Ted Ts'o on
P.S. If it wasn't clear, I'm still in favor of trying to coordinate
barriers across the whole file system, since that is much more likely
to help use cases that arise in real life.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ric Wheeler on
On 08/06/2010 03:04 AM, Darrick J. Wong wrote:
> On Thu, Aug 05, 2010 at 12:45:04PM -0400, Ted Ts'o wrote:
>> P.S. If it wasn't clear, I'm still in favor of trying to coordinate
>> barriers across the whole file system, since that is much more likely
>> to help use cases that arise in real life.
> Ok. I have a rough sketch of a patch to do that, and I was going to send it
> out today, but the test machine caught on fire while I was hammering it with
> the fsync tests one last time and ... yeah. I'm fairly sure the patch didn't
> cause the fire, but I'll check anyway after I finish cleaning up.
>
> "[PATCH] ext4: Don't set my machine ablaze with barrier requests" :P
>
> (The patch did seem to cut barrier request counts by about 20%, though the
> impact on performance was pretty small.)
>
> --D

Just a note: one thing we have been doing is trying to get a reasonable
regression test in place for testing data integrity. That might be useful to
share as we float patches around barrier changes.

Basic test:

(1) Get a box with an external e-sata (or USB) connected drive

(2) Fire off some large load on that drive (Chris Mason had one; some of our QE
engineers have been using fs_mark: fs_mark -d /your_fs/test_dir -S 0 -t 8 -F)

(3) Pull the power cable to that external box.

Of course, you can use any system and drop power, but the above setup will make
sure that we kill the write cache on the device without letting the firmware
destage the cache contents.

The test passes if you can now do the following:

(1) Mount the file system without error

(2) Unmount and force an fsck - that should run without reporting errors as well.

Note that the above does not use fsync in the testing.

Thanks!

Ric

From: Ted Ts'o on
On Fri, Aug 06, 2010 at 12:13:56AM -0700, Darrick J. Wong wrote:
> Yes, it's a proxy for something else. One of our larger products would like to
> use fsync() to flush dirty data out to disk (right now it looks like they use
> O_SYNC), but they're concerned that the many threads they use can create an
> fsync() storm. So, they wanted to know how to mitigate the effects of those
> storms. Not calling fsync() except when they really need to guarantee a disk
> write is a good start, but I'd like to get ahead of them to pick off more low
> hanging fruit like the barrier coordination and not sending barriers when
> there's no dirty data ... before they run into it. :)

Do they need a barrier operation, or do they just want to initiate the
I/O? One of the reasons I found it hard to believe you would have
multiple threads all fsync()'ing the same file is that keeping the
file consistent is very hard to do in such a scenario. Maintaining
ACID-level consistency without a single thread which coordinates when
commit records get written is, I'm sure, theoretically possible, but in
practice, I wasn't sure any applications would actually be _written_
that way.

If the goal is just to make sure I/O is getting initiated, without
necessarily waiting for assurance that a specific file write has hit
the disk platters, it may be that the Linux-specific
sync_file_range(2) system call might be a far more efficient way of
achieving those ends. Without more details about what this product is
doing, it's hard to say, of course.
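
As a rough sketch of the difference (not taken from the product in question), the two paths could look something like the following. The helper names initiate_writeback() and durable_flush() are made up for illustration; only sync_file_range(2) and fsync(2) are real interfaces. Note that SYNC_FILE_RANGE_WRITE only asks the kernel to start writing dirty pages in the range: it does not write metadata, does not flush the drive's write cache, and may still block if the request queue is full.

```c
#define _GNU_SOURCE            /* needed for sync_file_range() in <fcntl.h> */
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to start writeback of the dirty
 * pages in [offset, offset + nbytes) and return without waiting for the
 * I/O to complete.  nbytes == 0 means "through the end of the file".
 * This gives no durability guarantee: file metadata is not written and
 * the device write cache is not flushed.
 * Returns 0 on success, -1 with errno set on failure.
 */
int initiate_writeback(int fd, off_t offset, off_t nbytes)
{
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/*
 * Hypothetical helper: the expensive path.  fsync() waits until the
 * file's data and metadata have been handed to stable storage, which
 * (depending on file system and mount options) may involve a
 * barrier/cache flush.
 * Returns 0 on success, -1 with errno set on failure.
 */
int durable_flush(int fd)
{
    return fsync(fd);
}
```

The idea would be for worker threads to call initiate_writeback() after their writes, and to reserve durable_flush() for the points where a commit really has to be on disk.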

- Ted