From: Ben Gamari on
Hey all,

Recently I started using the Xapian-based notmuch mail client for everyday
use. What surprised me most after the switch was the severe hit to
interactive performance during database updates. Things are particularly bad
during runs of 'notmuch new', which scans the file system looking for new
messages and adds them to the database; the worst of the hit comes while the
database itself is being updated.

During these periods, even small chunks of I/O can become minute-long ordeals.
It is common for latencytop to show 30-second latencies for page faults and
page writes. Interactive performance is absolutely abysmal: unrelated
processes suffer horrible latencies, causing media players, editors, and even
terminals to grind to a halt.

Despite the system being clearly I/O bound, iostat shows pitiful disk
throughput (700 kB/s read, 300 kB/s write). Certainly this poor performance
can, at least to some degree, be attributed to the fact that Xapian uses
fdatasync() to ensure data consistency. That being said, it seems that
Xapian's page usage causes horrible thrashing, hence the performance hit on
unrelated processes. Moreover, the hit on unrelated processes is so bad that
I almost suspect swap I/O is being serialized by fsync() as well, even though
swap lives on a separate partition beyond the control of the filesystem.
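For reference, here is a minimal sketch of the write-then-fdatasync() sequence that a database like Xapian relies on for consistency. It is not Xapian's code; the helper name `durable_write` is hypothetical. Each call blocks until the data reaches stable storage, so an updater issuing many such calls becomes a stream of small synchronous flushes.

```python
import os


def durable_write(path: str, data: bytes) -> int:
    """Write `data` to `path` and block until it is on stable storage.

    Hypothetical helper illustrating the write() + fdatasync() pattern;
    not taken from Xapian.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        # fdatasync() flushes the file data plus the metadata needed to read
        # it back (such as the size), but unlike fsync() it may skip
        # metadata such as timestamps.
        os.fdatasync(fd)
        return written
    finally:
        os.close(fd)
```

The cost of each call is dominated by the flush, not the write, which is why many tiny commits hurt far more than one large one.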

Xapian, however, is far from the first application with which I have seen
this sort of performance cliff. Rsync, which also uses fsync(), can trigger
the same thrashing during system backups, as can rdiff. slocate's updatedb
absolutely kills interactive performance as well.

Issues similar to this have been widely reported[1-5] in the past, and despite
many attempts[5-8] within both the I/O and memory management subsystems to fix
it, the problem certainly remains. I have tried reducing swappiness from 60 to
40, with some small improvement, and it has been reported[20] that these sorts
of symptoms can be negated by using memory control groups to prevent
interactive processes' pages from being evicted.

I would really like to see this issue finally fixed. I have tried
several[2][3] times to organize the known data about this bug, although in all
cases discussion has stopped with claims of insufficient data (which is fair,
admittedly; it's a very difficult issue to tackle). However, I do think that
_something_ has to be done to alleviate the thrashing and poor interactive
performance that these workloads cause.

Thanks,

- Ben


[1] http://bugzilla.kernel.org/show_bug.cgi?id=5900
[2] http://bugzilla.kernel.org/show_bug.cgi?id=7372
[3] http://bugzilla.kernel.org/show_bug.cgi?id=12309
[4] http://lkml.org/lkml/2009/4/28/24
[5] http://lkml.org/lkml/2009/3/26/72
[6] http://notmuchmail.org/pipermail/notmuch/2010/001868.html

[10] http://lkml.org/lkml/2009/5/16/225
[11] http://lkml.org/lkml/2007/7/21/219
[12] http://lwn.net/Articles/328363/
[13] http://lkml.org/lkml/2009/4/6/114

[20] http://lkml.org/lkml/2009/4/28/68

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: tytso on
On Tue, Mar 16, 2010 at 08:31:12AM -0700, Ben Gamari wrote:
> Hey all,
>
> Recently I started using the Xapian-based notmuch mail client for everyday
> use. One of the things I was quite surprised by after the switch was the
> incredible hit in interactive performance that is observed during database
> updates. Things are particularly bad during runs of 'notmuch new,' which scans
> the file system looking for new messages and adds them to the database.
> Specifically, the worst of the performance hit appears to occur when the
> database is being updated.

What kernel version are you using; what distribution and what version
of that distro are you running; what file system are you using and
what if any mount options are you using? And what kind of hard drives
do you have?

I'm going to assume you're running into the standard ext3
"data=ordered" entangled-writes problem. There are solutions, such as
switching to ext4 or mounting with data=writeback, but they
have various shortcomings.
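Because the answer depends on which journaling mode is actually in effect, one way to check is to read the mount options from /proc/mounts (fields: device, mount point, type, options, ...). This is a minimal sketch with a hypothetical helper name. If no data= option is listed, it returns None, meaning the filesystem's default mode applies or the kernel simply does not display it.

```python
from typing import Optional


def data_mode(mounts_text: str, mountpoint: str) -> Optional[str]:
    """Return the data= journaling mode listed for `mountpoint`, or None.

    `mounts_text` is the content of /proc/mounts. Mount points containing
    spaces appear escaped there (e.g. '\\040'), so pass them in that form.
    """
    mode = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mountpoint:
            continue
        # A later entry for the same mount point shadows earlier ones.
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
    return mode
```

Usage: `with open("/proc/mounts") as f: print(data_mode(f.read(), "/"))`.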

A number of improvements have been made in ext3 and ext4 since some of
the discussions you quoted, but since you didn't tell us what
distribution version and/or what kernel version you are using, we
can't tell whether you are using those newer improvements yet.

- Ted
From: Ben Gamari on
Sorry about the lack of any useful information in my initial email.
I clearly didn't read it before sending.

On Tue, 16 Mar 2010 21:24:39 -0400, tytso(a)mit.edu wrote:
> What kernel version are you using; what distribution and what version
> of that distro are you running; what file system are you using and
> what if any mount options are you using? And what kind of hard drives
> do you have?

While this problem has been around for some time, my current configuration
is the following:

Kernel: 2.6.32 (also reproducible with kernels at least as early as 2.6.28)
Filesystem: Now Btrfs (was ext4 less than a week ago), default mount options
Hard drive: Seagate Momentus 7200.4 (ST9500420AS)
Distribution: Ubuntu 9.10 (Karmic)

>
> I'm going to assume you're running into the standard ext3
> "data=ordered" entagled writes problem. There are solutions, such as
> switching to using ext4, mounting with data=writeback mode, but they
> have various shortcomings.
>
Unfortunately, several people have continued to encounter unacceptable
latency, even with ext4 and data=writeback.

> A number of improvements have been made in ext3 and ext4 since some of
> the discussions you quoted, but since you didn't tell us what
> distribution version and/or what kernel version you are using, we
> can't tell you are using those newer improvements yet.
>
Sorry about that. I should know better by now.

- Ben
From: tytso on
On Tue, Mar 16, 2010 at 08:18:09PM -0700, Ben Gamari wrote:
> Sorry about the lack of any useful information in my initial email.
> I clearly didn't read it before sending.
>
> On Tue, 16 Mar 2010 21:24:39 -0400, tytso(a)mit.edu wrote:
> > What kernel version are you using; what distribution and what version
> > of that distro are you running; what file system are you using and
> > what if any mount options are you using? And what kind of hard drives
> > do you have?
>
> While this problem has been around for some time, my current configuration
> is the following:
>
> Kernel 2.6.32 (although also reproducible with kernels at least as early as 2.6.28)
> Filesystem: Now Btrfs (was ext4 less than a week ago), default mount options
> Hard drive: Seagate Momentus 7200.4 (ST9500420AS)
> Distribution: Ubuntu 9.10 (Karmic)

..... so did switching to Btrfs solve your latency issues, or are you
still having problems?

- Ted
From: Ben Gamari on
On Tue, 16 Mar 2010 23:30:10 -0400, tytso(a)mit.edu wrote:
> .... so did switching to Btrfs solve your latency issues, or are you
> still having problems?

Still having trouble, although I'm now running 2.6.34-rc1 and things seem
slightly better. I'll try doing a backup tonight and report back.

- Ben