From: Heikki Linnakangas on
Simon Riggs wrote:
> On Fri, 2010-01-08 at 14:20 -0800, Josh Berkus wrote:
>> On 1/8/10 1:16 PM, Heikki Linnakangas wrote:
>>> * A standby that connects to master, initiates streaming, and then sits
>>> idle without stalls recycling of old WAL files in the master. That will
>>> eventually lead to a full disk in master. Do we need some kind of an
>>> emergency valve on that?
>> WARNING: I haven't thought about how this would work together with HS yet.
>
> I've been reviewing things as we go along, so I'm not that tense
> overall. Having said that I don't understand why the problem above would
> occur and the sentence seems to be missing a verb between "without" and
> "stalls". More explanation please.

Yeah, that sentence was broken.

> What could happen is that the standby could slowly lag behind master.

Right, that's what I'm worried about. In the worst case the
walreceiver process in the standby might stall completely for some
reason, e.g. a hardware problem or SIGSTOP by an administrator.

> We
> don't have any way of monitoring that, as yet. Setting ps display is not
> enough here.

Yeah, monitoring would be nice too. But what I was wondering is whether
we need some way of stopping that from filling the disk in master.
(Fujii-san's suggestion of a GUC to set the max. amount of WAL to keep
in the master for standbys feels good to me).
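
To illustrate the kind of limit being discussed, here is a minimal sketch
of the retention decision, not an actual implementation. The function
name, its parameters, and the idea of counting in whole segments are all
assumptions made for illustration; none of them exist in PostgreSQL.

```python
# Hypothetical sketch only: these names are illustrative, not PostgreSQL code.

def oldest_segment_to_keep(current_seg, standby_seg, max_keep_segments):
    """Return the oldest WAL segment number the master should retain.

    current_seg       -- segment the master is currently writing
    standby_seg       -- oldest segment a connected standby still needs,
                         or None if no standby is connected
    max_keep_segments -- assumed cap on segments retained for standbys
    """
    if standby_seg is None:
        # No standby connected: nothing extra to hold back for streaming.
        return current_seg
    # The "emergency valve": never retain more than the cap, even if the
    # standby has stalled far behind.
    floor = current_seg - max_keep_segments
    return max(standby_seg, floor, 0)


if __name__ == "__main__":
    # Standby slightly behind: keep everything it still needs.
    print(oldest_segment_to_keep(100, 95, 10))
    # Standby stalled long ago: the cap wins and older segments get recycled.
    print(oldest_segment_to_keep(100, 20, 10))
```

In the second call the stalled standby can no longer be served from the
retained WAL, which is the trade-off such a cap makes in exchange for
protecting the master's disk.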

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on
On Sun, 2010-01-10 at 18:40 +0200, Heikki Linnakangas wrote:

> > We
> > don't have any way of monitoring that, as yet. Setting ps display is not
> > enough here.
>
> Yeah, monitoring would be nice too. But what I was wondering is whether
> we need some way of stopping that from filling the disk in master.
> (Fujii-san's suggestion of a GUC to set the max. amount of WAL to keep
> in the master for standbys feels good to me).

OK, now I got you. I thought that was already agreed; guess it is now.

We need monitoring anywhere we have a max_* parameter. Otherwise we
won't know how close we are to disaster until we hit the limit and
things break down. Otherwise we will have to set parameters by trial and
error, or set them so high they are meaningless.

--
Simon Riggs www.2ndQuadrant.com



From: Josh Berkus on

> We need monitoring anywhere we have a max_* parameter. Otherwise we
> won't know how close we are to disaster until we hit the limit and
> things break down. Otherwise we will have to set parameters by trial and
> error, or set them so high they are meaningless.

I agree.

Thing is, though, we have a de facto max already ... when pg_xlog runs
out of disk space. And no monitoring *in postgresql* for that, although
obviously you can use OS monitoring for it.

I'm saying, even for plain PITR, it would be an improvement in
manageability if the DBA could set a maximum number of checkpoint
segments before replication is abandoned or the master shuts down.
It's something we've been missing.

--Josh Berkus



From: Simon Riggs on
On Sun, 2010-01-10 at 12:10 -0800, Josh Berkus wrote:
> > We need monitoring anywhere we have a max_* parameter. Otherwise we
> > won't know how close we are to disaster until we hit the limit and
> > things break down. Otherwise we will have to set parameters by trial and
> > error, or set them so high they are meaningless.
>
> I agree.
>
> Thing is, though, we have a de facto max already ... when pg_xlog runs
> out of disk space.

What I mean is this: The purpose of monitoring is to avoid bad things
happening by being able to predict that a bad thing will happen before
it actually does happen. Cars have windows to allow us to see we are
about to hit something.

> And no monitoring *in postgresql* for that, although
> obviously you can use OS monitoring for it.

PostgreSQL doesn't need to monitor that. If the user wants to avoid
out-of-space they can write a script to monitor files/space. The info is
accessible, if you wish to monitor it.
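
As a rough example of such a script, the sketch below sums the size of
the files in a WAL directory and reports how much free space is left on
its filesystem. The function name, the 10% warning threshold, and the
directory path in the usage comment are assumptions for illustration
only.

```python
import os
import shutil


def wal_dir_report(wal_dir, warn_free_fraction=0.10):
    """Summarise a WAL directory: file count, bytes used, free space.

    warn_free_fraction is an arbitrary illustrative threshold.
    """
    total_bytes = 0
    file_count = 0
    with os.scandir(wal_dir) as entries:
        for entry in entries:
            # Count regular files only; skip subdirectories.
            if entry.is_file():
                total_bytes += entry.stat().st_size
                file_count += 1

    # Free space on the filesystem that holds the directory.
    usage = shutil.disk_usage(wal_dir)
    free_fraction = usage.free / usage.total

    return {
        "files": file_count,
        "bytes": total_bytes,
        "free_fraction": free_fraction,
        "warn": free_fraction < warn_free_fraction,
    }


# Example use (the path is hypothetical and depends on the installation):
#   report = wal_dir_report("/path/to/data/pg_xlog")
#   if report["warn"]:
#       print("WAL filesystem is running low on space:", report)
```

Run from cron or a monitoring agent, something like this gives early
warning before the disk fills, without any support from the server.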

Currently there is no way of knowing what the average/current transit
time is on replication, no way of knowing what is happening if we go
idle, etc. Those things need to be included because they are not
otherwise accessible. Cars need windows, not just a finely tuned engine.
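
To make "average/current transit time" concrete, here is a small sketch
of the bookkeeping involved. It assumes that each WAL record can be
stamped with a send time on the master and an arrival time on the
standby, and that the two clocks are comparable. The class and method
names are hypothetical and do not correspond to anything in PostgreSQL.

```python
from collections import deque


class TransitTimeMonitor:
    """Track replication transit times over a sliding window of samples.

    Hypothetical illustration: timestamps are plain seconds, and the
    master and standby clocks are assumed to be synchronised.
    """

    def __init__(self, window=100):
        # Keep only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def record(self, sent_at, received_at):
        """Store one transit time (seconds between send and arrival)."""
        self.samples.append(received_at - sent_at)

    def current(self):
        """Most recent transit time, or None if nothing was recorded."""
        return self.samples[-1] if self.samples else None

    def average(self):
        """Mean transit time over the window, or None if empty."""
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)
```

When the connection goes idle no new samples arrive, so a real monitor
would also need to expose the time of the last sample; that is left out
of this sketch.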

--
Simon Riggs www.2ndQuadrant.com



From: Josh Berkus on

> Currently there is no way of knowing what the average/current transit
> time is on replication, no way of knowing what is happening if we go
> idle, etc. Those things need to be included because they are not
> otherwise accessible. Cars need windows, not just a finely tuned engine.

Like I said, I agree. I'm just pointing out that the monitoring
deficiency already exists whether or not we add a max_* parameter.

--Josh Berkus

