From: Josh Berkus on 16 Jan 2010 18:53

> I'd happily write a patch to handle all that if I thought it would be
> accepted. I fear that the whole approach will be considered a bit too
> hackish and get rejected on that basis though. Not really sure of a
> "right" way to handle this though. Anything better is going to be more
> complicated because it requires passing more information into the
> archiver, with little gain for that work beyond improving the quality of
> this diagnostic routine. And I think most people would find what I
> described above useful enough.

Yeah, I think we should focus right now on "what monitoring can we get into this version without holding up release?" Your proposal sounds like a good one in that respect.

In future versions, I think we'll want a host of granular data, including:

* amount of *time* since last successful archive (this would be a good trigger for alerts)
* number of failed archive attempts
* number of archive files awaiting processing (presumably monitored by the slave)
* last archive file processed by the slave, and when
* for HS: frequency and length of conflict delays in log processing, as a stat
* for HS: number of query cancels due to write/lock conflicts from the master, as a stat

However, *all* of the above can wait for the next version, especially since by then we'll have user feedback from the field on required monitoring. If we try to nail this all down now, not only will it delay the release, but we'll get it wrong and have to re-do it anyway. Release early and often, y'know?

I think it's key to keep our data as granular and low-level as possible; with good low-level data people can write good tools, but if we over-summarize they can't.

Also, it would be nice to have all of our archiving stuff grouped into something like pg_stat_archive rather than being a bunch of disconnected functions.
--Josh Berkus

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
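
[Editor's illustration] Two of the metrics listed above (archive files awaiting processing, and time since the last successful archive) can be roughly approximated from outside the server by inspecting the archive_status directory, where the archiver marks segments with `.ready` and `.done` files. The following is only a minimal sketch under that assumption: the function name `archive_backlog` and its parameters are hypothetical, not part of PostgreSQL, and because `.done` markers may be removed later by the server, the reported age should be treated as an approximation rather than an exact figure.

```python
import os
import time


def archive_backlog(status_dir, now=None):
    """Return (ready_count, seconds_since_newest_done) for a status directory.

    status_dir is assumed to be an archive_status directory containing
    '<segment>.ready' and '<segment>.done' marker files.
    The second value is None when no '.done' marker is present.
    """
    now = time.time() if now is None else now
    ready_count = 0
    newest_done = None
    for name in os.listdir(status_dir):
        if name.endswith(".ready"):
            # Segment is waiting to be archived.
            ready_count += 1
        elif name.endswith(".done"):
            # Segment was archived; use the marker's mtime as a proxy
            # for when the archive attempt succeeded.
            mtime = os.path.getmtime(os.path.join(status_dir, name))
            if newest_done is None or mtime > newest_done:
                newest_done = mtime
    age = None if newest_done is None else now - newest_done
    return ready_count, age
```

An external alerting script could call this periodically and warn when the age or the count of waiting files exceeds a site-specific threshold; the thresholds themselves are a local policy choice and are not suggested anywhere in this thread.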
From: Fujii Masao on 17 Jan 2010 22:03
On Sun, Jan 17, 2010 at 8:53 AM, Josh Berkus <josh(a)agliodbs.com> wrote:
> * amount of *time* since last successful archive (this would be a good
> trigger for alerts)
> * number of failed archive attempts
> * number of archive files awaiting processing (presumably monitored by
> the slave)
> * last archive file processed by the slave, and when

Are these for warm-standby, not SR? At least SR isn't so much involved in WAL archiving, i.e., WAL is sent to the standby by walsender instead of an archiver.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center