From: Heikki Linnakangas on 10 Jan 2010 11:40

Simon Riggs wrote:
> On Fri, 2010-01-08 at 14:20 -0800, Josh Berkus wrote:
>> On 1/8/10 1:16 PM, Heikki Linnakangas wrote:
>>> * A standby that connects to master, initiates streaming, and then sits
>>> idle without stalls recycling of old WAL files in the master. That will
>>> eventually lead to a full disk in master. Do we need some kind of a
>>> emergency valve on that?
>> WARNING: I haven't thought about how this would work together with HS yes.
>
> I've been reviewing things as we go along, so I'm not that tense
> overall. Having said that I don't understand why the problem above would
> occur and the sentence seems to be missing a verb between "without" and
> "stalls". More explanation please.

Yeah, that sentence was broken.

> What could happen is that the standby could slowly lag behind master.

Right, that's what I'm worried about. In the worst case the walreceiver
process in the standby might stall completely for some reason, e.g. a
hardware problem or a SIGSTOP by an administrator.

> We
> don't have any way of monitoring that, as yet. Setting ps display is not
> enough here.

Yeah, monitoring would be nice too. But what I was wondering is whether
we need some way of stopping that from filling the disk in master.
(Fujii-san's suggestion of a GUC to set the max. amount of WAL to keep
in the master for standbys feels good to me).

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 10 Jan 2010 12:40

On Sun, 2010-01-10 at 18:40 +0200, Heikki Linnakangas wrote:
> > We
> > don't have any way of monitoring that, as yet. Setting ps display is not
> > enough here.
>
> Yeah, monitoring would be nice too. But what I was wondering is whether
> we need some way of stopping that from filling the disk in master.
> (Fujii-san's suggestion of a GUC to set the max. amount of WAL to keep
> in the master for standbys feels good to me).

OK, now I got you. I thought that was already agreed; guess it is now.

We need monitoring anywhere we have a max_* parameter. Otherwise we
won't know how close we are to disaster until we hit the limit and
things break down. Otherwise we will have to set parameters by trial and
error, or set them so high they are meaningless.

--
  Simon Riggs   www.2ndQuadrant.com
From: Josh Berkus on 10 Jan 2010 15:10

> We need monitoring anywhere we have a max_* parameter. Otherwise we
> won't know how close we are to disaster until we hit the limit and
> things break down. Otherwise we will have to set parameters by trial and
> error, or set them so high they are meaningless.

I agree.

Thing is, though, we have a de-facto max already ... when pg_xlog runs
out of disk space. And no monitoring *in postgresql* for that, although
obviously you can use OS monitoring for it.

I'm saying, even for plain PITR, it would be an improvement in
manageability if the DBA could set a maximum number of checkpoint
segments before replication is abandoned or the master shuts down.
It's something we've been missing.

--Josh Berkus
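The cap discussed so far (a limit on how much WAL the master keeps for a standby, after which the standby is abandoned rather than the disk filling up) could be sketched roughly as below. This is only an illustration of the policy, not PostgreSQL code: the function name, the parameter `max_kept_segments`, and the use of plain integers as WAL segment numbers are all assumptions, and checkpoint requirements are ignored.

```python
def oldest_segment_to_keep(current_seg, standby_seg, max_kept_segments):
    """Decide from which WAL segment onward the master retains files.

    current_seg       -- segment the master is writing now (hypothetical integer id)
    standby_seg       -- oldest segment the standby still needs, or None if no standby
    max_kept_segments -- hypothetical cap on segments retained behind current_seg

    Returns (keep_from, standby_cut_off): every segment older than keep_from
    may be recycled; standby_cut_off is True when the cap forced the master
    to drop segments the standby still needed.
    """
    # Without a cap, the master would keep everything the standby still needs.
    wanted = current_seg if standby_seg is None else min(standby_seg, current_seg)

    # The "emergency valve": never retain more than the cap behind current_seg.
    floor = max(current_seg - max_kept_segments, 0)

    keep_from = max(wanted, floor)
    standby_cut_off = standby_seg is not None and standby_seg < floor
    return keep_from, standby_cut_off
```

With a cap of 10 segments, a standby that is 5 segments behind is fully served, while one that is 80 segments behind loses the WAL it needs and would have to be re-synchronised by other means.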
From: Simon Riggs on 10 Jan 2010 17:30

On Sun, 2010-01-10 at 12:10 -0800, Josh Berkus wrote:
> > We need monitoring anywhere we have a max_* parameter. Otherwise we
> > won't know how close we are to disaster until we hit the limit and
> > things break down. Otherwise we will have to set parameters by trial and
> > error, or set them so high they are meaningless.
>
> I agree.
>
> Thing is, though, we have a de-facto max already ... when pg_xlog runs
> out of disk space.

What I mean is this: The purpose of monitoring is to avoid bad things
happening by being able to predict that a bad thing will happen before
it actually does happen. Cars have windows to allow us to see we are
about to hit something.

> And no monitoring *in postgresql* for that, although
> obviously you can use OS monitoring for it.

PostgreSQL doesn't need to monitor that. If the user wants to avoid
out-of-space they can write a script to monitor files/space. The info is
accessible, if you wish to monitor it.

Currently there is no way of knowing what the average/current transit
time is on replication, no way of knowing what is happening if we go
idle etc. Those things need to be included because they are not
otherwise accessible. Cars need windows, not just a finely tuned engine.

--
  Simon Riggs   www.2ndQuadrant.com
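The kind of external script mentioned above, one that watches files and free space in the WAL directory, might look roughly like this. It is a minimal sketch using only the Python standard library; the function names, the 20% default threshold, and the idea of passing the pg_xlog path as an argument are assumptions for illustration, not anything PostgreSQL ships.

```python
import os
import shutil


def wal_dir_usage(wal_dir):
    """Return (file_count, total_bytes) for regular files directly in wal_dir."""
    count = 0
    total = 0
    with os.scandir(wal_dir) as entries:
        for entry in entries:
            # Count only plain files; skip subdirectories and symlinks.
            if entry.is_file(follow_symlinks=False):
                count += 1
                total += entry.stat(follow_symlinks=False).st_size
    return count, total


def check_wal_space(wal_dir, min_free_fraction=0.2):
    """Summarise WAL directory usage and flag low free space.

    min_free_fraction is a hypothetical alert threshold: the alert is raised
    when the filesystem holding wal_dir has less than this fraction free.
    """
    usage = shutil.disk_usage(wal_dir)
    free_fraction = usage.free / usage.total
    count, total = wal_dir_usage(wal_dir)
    return {
        "files": count,
        "bytes": total,
        "free_fraction": free_fraction,
        "alert": free_fraction < min_free_fraction,
    }
```

Run periodically (for example from cron) against the server's pg_xlog directory, this covers the out-of-space case; it says nothing about replication transit time or an idle standby, which is the gap the message points out.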
From: Josh Berkus on 10 Jan 2010 18:14
> Currently there is no way of knowing what the average/current transit
> time is on replication, no way of knowing what is happening if we go
> idle etc. Those things need to be included because they are not
> otherwise accessible. Cars need windows, not just a finely tuned engine.

Like I said, I agree. I'm just pointing out that the monitoring
deficiency already exists whether or not we add a max_* parameter.

--Josh Berkus