From: Simon Riggs on 12 Jan 2010 02:39

On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
> Fujii Masao wrote:
> > On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(a)2ndquadrant.com> wrote:
> >> I don't think anybody can deploy this feature without at least some very
> >> basic monitoring here. I like the basic proposal you made back in September
> >> for adding a pg_standbys_xlog_location to replace what you have to get from
> >> ps right now:
> >> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
> >>
> >> That's basic, but enough that people could get by for a V1.
> >
> > Yeah, I have no objection to add such simple capability which monitors
> > the lag into the first release. But I guess that, in addition to that,
> > Simon wanted the capability to collect the statistical information about
> > replication activity (e.g., a transfer time, a write time, replay time).
> > So I'd like to postpone it.
>
> yeah getting that would all be nice and handy but we have to remember
> that this is really our first cut at integrated replication. Being able
> to monitor lag is what is needed as a minimum, more advanced stuff can
> and will emerge once we get some actual feedback from the field.

Though there won't be any feedback from the field because there won't be
any numbers to discuss. Just "it appears to be working". Then we will go
into production and the problems will begin to be reported. We will be
able to do nothing to resolve them because we won't know how many people
are affected.

--
 Simon Riggs           www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Smith on 12 Jan 2010 03:04

Heikki Linnakangas wrote:
> Greg Smith wrote:
>
>> I don't think anybody can deploy this feature without at least some very
>> basic monitoring here. I like the basic proposal you made back in
>> September for adding a pg_standbys_xlog_location to replace what you
>> have to get from ps right now:
>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>
>> That's basic, but enough that people could get by for a V1.
>>
>
> It would be more straightforward to have a function in the standby to
> return the current replay location. It feels more logical to poll the
> standby to get the status of the standby, instead of indirectly from the
> master. Besides, the master won't know how far the standby is if the
> connection to the standby is broken.
>

This is one reason I was talking in my other message about getting simple stats on how bad the archive_command backlog is, which I'd think is an easy way to tell the DBA "the standby isn't keeping up and disk is filling" in a way that's more database-centric than just watching disk space get gobbled.

I think it's important to be able to get whatever useful information you can from both the primary and the standby, because most of the interesting (read: painful) situations here are when one or the other is down. The fundamental questions here are:

- When things are running normally, how much is the standby lagging by? This is needed as a baseline of good performance, against which you can detect problems before they get too bad.
- If the standby is down altogether, how can I get more information about the state of things from the primary?
- If the primary is down, how can I tell more from the standby?

Predicting what people are going to want to do when one of these bad conditions pops up is a large step ahead of where I think this discussion should be focusing now. You have to show how you're going to measure the badness in the likely failure situations before you can take action on them. If you do the former well enough, admins will figure out how to deal with the latter in a way compatible with their business processes in the first version.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com  www.2ndQuadrant.com
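Greg's archive_command backlog statistic can be approximated from outside the database. The server marks each WAL segment still waiting for archive_command with a `<segment>.ready` file in the archive status directory (`pg_xlog/archive_status` in releases of this era), so counting those files gives a rough backlog. A minimal Python sketch under that assumption; `archive_backlog` is a hypothetical helper, not a PostgreSQL API:

```python
import os


def archive_backlog(status_dir: str) -> int:
    """Count WAL segments still waiting for archive_command.

    Assumption: the server leaves a '<segment>.ready' marker in status_dir
    for every segment not yet archived ('.done' marks archived ones).
    """
    return sum(1 for name in os.listdir(status_dir) if name.endswith(".ready"))
```

Polled periodically against the data directory's status directory, a count that keeps growing is the "standby isn't keeping up and disk is filling" signal, visible before the disk actually fills.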
From: Fujii Masao on 12 Jan 2010 05:15

On Tue, Jan 12, 2010 at 4:22 PM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> It would be more straightforward to have a function in the standby to
> return the current replay location. It feels more logical to poll the
> standby to get the status of the standby, instead of indirectly from the
> master. Besides, the master won't know how far the standby is if the
> connection to the standby is broken.
>
> Maybe we should just change the existing pg_current_xlog_location()
> function to return that when recovery is in progress. It currently
> throws an error during hot standby.

Sounds good. I'd like to hear from someone which location should be returned by that function (WAL receive/write/flush/replay location?). I vote for the WAL flush location, because it's important for me to know how far the standby can replay the WAL, i.e., how many transactions might be lost at failover. It would also be OK to provide a dedicated function for the WAL replay location. Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
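Whichever function ends up exposing the standby's flush location, turning it and the primary's location into "how much WAL could be lost at failover" is simple arithmetic. A minimal Python sketch; `parse_xlog_location` and `wal_bytes_behind` are hypothetical helpers, and the inputs are assumed to be text in the `X/X` hex form that pg_current_xlog_location() returns on the primary. Combining the halves as a 64-bit value is also an assumption: releases of this era skipped the last 16 MB segment of each logical log file, so a gap that crosses such a boundary may be overstated.

```python
def parse_xlog_location(text: str) -> int:
    """Convert an xlog location such as '0/3000020' into one byte position.

    Assumption: the two hex halves combine as (high << 32) | low.
    """
    high, sep, low = text.strip().partition("/")
    if not sep:
        raise ValueError(f"not an xlog location: {text!r}")
    return (int(high, 16) << 32) | int(low, 16)


def wal_bytes_behind(primary_location: str, standby_flush_location: str) -> int:
    """Bytes of WAL the primary has that the standby has not yet flushed.

    Clamped at zero, since locations sampled at slightly different moments
    can make the standby briefly appear ahead.
    """
    gap = parse_xlog_location(primary_location) - parse_xlog_location(standby_flush_location)
    return max(0, gap)
```

For example, `wal_bytes_behind("0/5000000", "0/4000000")` returns 16777216, one WAL segment at the default 16 MB segment size.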
From: Magnus Hagander on 12 Jan 2010 05:18

On Tue, Jan 12, 2010 at 08:22, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Greg Smith wrote:
>> I don't think anybody can deploy this feature without at least some very
>> basic monitoring here. I like the basic proposal you made back in
>> September for adding a pg_standbys_xlog_location to replace what you
>> have to get from ps right now:
>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>
>> That's basic, but enough that people could get by for a V1.
>
> It would be more straightforward to have a function in the standby to
> return the current replay location. It feels more logical to poll the
> standby to get the status of the standby, instead of indirectly from the
> master. Besides, the master won't know how far the standby is if the
> connection to the standby is broken.
>
> Maybe we should just change the existing pg_current_xlog_location()
> function to return that when recovery is in progress. It currently
> throws an error during hot standby.
>

Not sure. I don't really like monitoring functions that return different things depending on the scenario. Assume I monitor it, and then do a failover. Suddenly the values I monitor mean something else.

I think I'd prefer a separate function to monitor this status on the slave. Oh, and it'd be nice if that one worked in HS mode both in streaming and non-streaming mode :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
From: Tom Lane on 12 Jan 2010 08:59
Fujii Masao <masao.fujii(a)gmail.com> writes:
> I'm not sure whether poll(2) should be called for this purpose. But
> poll(2) and select(2) seem to often come together in the existing code.
> We should follow such custom?

Yes. poll() is usually more efficient, so it's preferred, but not all platforms have it. (On the other hand, I think Windows might have only poll and not select.)

			regards, tom lane
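The custom Tom describes is a fallback: use poll() where the platform provides it, otherwise select(). The same pattern can be sketched with Python's `select` module, where `select.poll` is not available on every platform. This is an illustration of the pattern, not the code PostgreSQL uses; `wait_readable` is a hypothetical helper.

```python
import select
import socket


def wait_readable(sock: socket.socket, timeout_ms: int) -> bool:
    """Return True if a read on sock would not block within timeout_ms."""
    if hasattr(select, "poll"):
        # Preferred path: poll() is available on this platform.
        poller = select.poll()
        poller.register(sock, select.POLLIN)
        # poll() takes its timeout in milliseconds; any reported event
        # (data, hangup, error) means a read will return immediately.
        return bool(poller.poll(timeout_ms))
    # Fallback path: select() takes its timeout in seconds.
    readable, _, _ = select.select([sock], [], [], timeout_ms / 1000.0)
    return bool(readable)
```

Callers see the same interface either way, which is the point of the custom: the platform check lives in one place instead of at every wait site.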