From: Josh Berkus on 28 Jan 2010 15:05

Guys,

> Hmm, I'm sorry but that's bogus. Retaining so much WAL that we are
> strongly in danger of blowing disk space is not what I would call a
> safety feature. Since there is no way to control or restrain the number
> of files for certain, that approach seems fatally flawed. Reducing
> checkpoint_timeout is the opposite of what you would want to do for
> performance.

Which WAL are we talking about here? There's 3 copies to worry about:

1) master WAL
2) the archive copy of WAL
3) slave WAL

--Josh Berkus

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Heikki Linnakangas on 29 Jan 2010 02:49

Simon Riggs wrote:
> On Thu, 2010-01-28 at 21:00 +0200, Heikki Linnakangas wrote:
>> I think it is a pretty important safety feature that we keep all the
>> WAL around that's needed to recover the standby. To avoid
>> out-of-disk-space situation, it's probably enough in practice to set
>> checkpoint_timeout small enough in the standby to trigger
>> restartpoints often enough.
>
> Hmm, I'm sorry but that's bogus. Retaining so much WAL that we are
> strongly in danger of blowing disk space is not what I would call a
> safety feature. Since there is no way to control or restrain the number
> of files for certain, that approach seems fatally flawed.

The other alternative is to refuse to recover if the master can't be
contacted to stream the missing WAL again. Surely that's worse.

Note that we don't have any hard limits on WAL disk usage in general.
For example, if archiving stops working for some reason, you'll
accumulate WAL in the master until it runs out of disk space.

> Reducing
> checkpoint_timeout is the opposite of what you would want to do for
> performance.

Well, make sure you have enough disk space for a higher setting then. It
doesn't seem that hard.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
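The "enough disk space for a higher setting" trade-off can be roughed out with simple arithmetic. The following is only a back-of-the-envelope sketch, not anything proposed in the thread: the function name and the WAL rate are hypothetical, and it assumes the default 16 MB WAL segment size and that the standby keeps roughly the WAL received between two restartpoints. Real retention can be higher.

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def estimate_retained_wal_bytes(wal_bytes_per_second, seconds_between_restartpoints):
    """Rough estimate of WAL accumulated between two restartpoints.

    Hypothetical helper: rounds the raw byte count up to whole segments,
    since WAL is stored on disk in fixed-size segment files.
    """
    raw = int(wal_bytes_per_second * seconds_between_restartpoints)
    segments = -(-raw // WAL_SEGMENT_BYTES)  # ceiling division
    return segments * WAL_SEGMENT_BYTES


if __name__ == "__main__":
    # Hypothetical workload: 1 MB/s of WAL, restartpoints every 5 or 30 minutes.
    for interval in (300, 1800):
        size_mb = estimate_retained_wal_bytes(1024 * 1024, interval) / (1024 * 1024)
        print(f"interval={interval}s -> about {size_mb:.0f} MB of WAL")
```

Under these assumptions the retained WAL grows linearly with the interval between restartpoints, which is why a larger checkpoint_timeout on the standby needs proportionally more disk.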
From: Simon Riggs on 29 Jan 2010 03:22

On Fri, 2010-01-29 at 09:49 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2010-01-28 at 21:00 +0200, Heikki Linnakangas wrote:
> >> I think it is a pretty important safety feature that we keep all the
> >> WAL around that's needed to recover the standby. To avoid
> >> out-of-disk-space situation, it's probably enough in practice to set
> >> checkpoint_timeout small enough in the standby to trigger
> >> restartpoints often enough.
> >
> > Hmm, I'm sorry but that's bogus. Retaining so much WAL that we are
> > strongly in danger of blowing disk space is not what I would call a
> > safety feature. Since there is no way to control or restrain the number
> > of files for certain, that approach seems fatally flawed.
>
> The other alternative is to refuse to recover if the master can't be
> contacted to stream the missing WAL again. Surely that's worse.

What is the behaviour of the standby if it hits a disk full error while
receiving WAL? Hopefully it stops receiving WAL and then clears enough
disk space to allow it to receive from archive instead? Yet stays up to
allow queries to continue?

--
Simon Riggs www.2ndQuadrant.com
From: Fujii Masao on 29 Jan 2010 03:31

On Fri, Jan 29, 2010 at 4:13 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> Hmm, I'm sorry but that's bogus. Retaining so much WAL that we are
> strongly in danger of blowing disk space is not what I would call a
> safety feature. Since there is no way to control or restrain the number
> of files for certain, that approach seems fatally flawed. Reducing
> checkpoint_timeout is the opposite of what you would want to do for
> performance.

Why do you worry about that only in the standby? The primary (i.e.,
postgres in the normal mode) has been in the same situation until now.

But usually the cycle of restartpoint is longer than that of checkpoint.
Because restartpoint occurs when the checkpoint record has been replayed
AND checkpoint_timeout has been reached. So the WAL files might more
easily accumulate in the standby.

To improve the situation, I think that we need to use
checkpoint_segment/timeout as a trigger of restartpoint, regardless
of the checkpoint record. Though I'm not sure that is possible and
should be included in v9.0.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
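The difference between the restartpoint condition described above and the proposed one can be sketched as two predicates. This is a simplified illustration in Python, not PostgreSQL source: the `StandbyState` class, its fields, and both function names are hypothetical, and the sketch ignores where a restartpoint can actually be taken.

```python
from dataclasses import dataclass


@dataclass
class StandbyState:
    """Hypothetical snapshot of a standby's replay progress."""
    seconds_since_restartpoint: float
    segments_since_restartpoint: int
    checkpoint_record_replayed: bool


def restartpoint_due_current(state, checkpoint_timeout):
    # Behaviour as described in the message: a restartpoint needs a replayed
    # checkpoint record AND an elapsed checkpoint_timeout.
    return (state.checkpoint_record_replayed
            and state.seconds_since_restartpoint >= checkpoint_timeout)


def restartpoint_due_proposed(state, checkpoint_timeout, checkpoint_segments):
    # Proposal as described: trigger on elapsed time OR on the number of WAL
    # segments received, regardless of whether a checkpoint record was replayed.
    return (state.seconds_since_restartpoint >= checkpoint_timeout
            or state.segments_since_restartpoint >= checkpoint_segments)


if __name__ == "__main__":
    # Standby has received a lot of WAL but has not replayed a checkpoint record.
    state = StandbyState(seconds_since_restartpoint=600.0,
                         segments_since_restartpoint=10,
                         checkpoint_record_replayed=False)
    print(restartpoint_due_current(state, checkpoint_timeout=300))
    print(restartpoint_due_proposed(state, checkpoint_timeout=300,
                                    checkpoint_segments=3))
```

In the example state, the first predicate never fires until a checkpoint record arrives from the primary, so WAL keeps accumulating; the second fires on time or segment count alone.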
From: Simon Riggs on 29 Jan 2010 03:41

On Fri, 2010-01-29 at 17:31 +0900, Fujii Masao wrote:
> On Fri, Jan 29, 2010 at 4:13 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> > Hmm, I'm sorry but that's bogus. Retaining so much WAL that we are
> > strongly in danger of blowing disk space is not what I would call a
> > safety feature. Since there is no way to control or restrain the number
> > of files for certain, that approach seems fatally flawed. Reducing
> > checkpoint_timeout is the opposite of what you would want to do for
> > performance.
>
> Why do you worry about that only in the standby?

I don't. The "safety feature" we just added makes it much more likely
that this will happen on standby.

> To improve the situation, I think that we need to use
> checkpoint_segment/timeout as a trigger of restartpoint, regardless
> of the checkpoint record. Though I'm not sure that is possible and
> should be included in v9.0.

Yes, that is a simple change. I think it is needed now.

--
Simon Riggs www.2ndQuadrant.com