From: Simon Riggs on
On Tue, 2010-01-12 at 15:42 -0500, Tom Lane wrote:
> Bruce Momjian <bruce(a)momjian.us> writes:
> > The final commit-fest is in 5 days --- this is not the time for design
> > discussion and feature additions.
>
> +10 --- the one reason I can see for deciding to bounce SR is that there
> still seem to be design discussions going on. It is WAY TOO LATE for
> that folks. It's time to be thinking "what's the least we have to do to
> make this shippable?"

I've not asked to bounce SR, I am strongly in favour of it going in,
having been supporting the project on and off for 18 months.

There is not much sense being talked here. I have asked for sufficient
monitoring to allow us to manage it in production, which is IMHO the
minimum required to make it shippable. This is a point I have mentioned
over the course of many months, not a sudden additional thought.

If the majority thinks that being able to find out the current replay
point of recovery is all we need to manage replication then I will
happily defer to that view, without changing my opinion that we need
more. It should be clear that we didn't even have that before I raised
the point.

Overall, it isn't sensible or appropriate to oppose my viewpoint by
putting words into my mouth that have never been said, which applies to
most people's comments to me on this recent thread.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Smith on
Bruce Momjian wrote:
> Right, so what is the risk of shipping without any fancy monitoring?
>

You can monitor the code right now by watching the output shown in the
ps display and by trolling the database logs. If I had to I could build
a whole monitoring system out of those components, it would just be very
fragile. I'd rather see one or two very basic bits of internals exposed
beyond those to reduce that effort. I think it's a stretch to say that
request represents a design change; a couple of UDFs to expose some
internals is all I think it would take to dramatically drop the amount
of process/log scraping required here to support a SR system.

I guess the slightly more ambitious performance monitoring bits that
Simon was suggesting may cross the line as being too late to implement
now though (depends on how productive the people actually coding on this
are I guess), and certainly the ideas thrown out for implementing any
smart behavior or alerting when replication goes bad like Josh's
"archiving_lag_action" seem past the deadline to get addressed
now--even though I agree with the basic idea.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.com



From: "Joshua D. Drake" on
On Tue, 2010-01-12 at 17:41 -0500, Greg Smith wrote:
> Bruce Momjian wrote:
> > Right, so what is the risk of shipping without any fancy monitoring?
> >
>
> You can monitor the code right now by watching the output shown in the
> ps display and by trolling the database logs. If I had to I could build
> a whole monitoring system out of those components, it would just be very
> fragile. I'd rather see one or two very basic bits of internals exposed
> beyond those to reduce that effort.

Considering that is pretty much the best we can do with log shipping, I
would have to agree. We should either provide real monitoring facilities
(not necessarily tools, but at least queries or an api) for the feature
or the feature isn't ready to go in.

> I think it's a stretch to say that
> request represents a design change; a couple of UDFs to expose some
> internals is all I think it would take to dramatically drop the amount
> of process/log scraping required here to support a SR system.

Bingo.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.



From: Stefan Kaltenbrunner on
Greg Smith wrote:
> Bruce Momjian wrote:
>> Right, so what is the risk of shipping without any fancy monitoring?
>>
>
> You can monitor the code right now by watching the output shown in the
> ps display and by trolling the database logs. If I had to I could build
> a whole monitoring system out of those components, it would just be very
> fragile. I'd rather see one or two very basic bits of internals exposed
> beyond those to reduce that effort. I think it's a stretch to say that
> request represents a design change; a couple of UDFs to expose some
> internals is all I think it would take to dramatically drop the amount
> of process/log scraping required here to support a SR system.

So is there an actual concrete proposal of _what_ internals to expose?

>
> I guess the slightly more ambitious performance monitoring bits that
> Simon was suggesting may cross the line as being too late to implement
> now though (depends on how productive the people actually coding on this
> are I guess), and certainly the ideas thrown out for implementing any
> smart behavior or alerting when replication goes bad like Josh's
> "archiving_lag_action" seem past the deadline to get addressed
> now--even though I agree with the basic idea.

I'm not convinced that embedding actual alerting functionality in the
database is a good idea. Any reasonable production deployment is
probably using a dedicated monitoring and alerting system that is
aggregating and qualifying all monitoring results (as well as proper
rate limiting and stuff) that just needs a way to read in basic data.
Initially something like archiving_lag_action sounds like an invitation
to do a send_mail_to_admin() thingy which is really the wrong way to
approach monitoring in large scale environments...
The database needs to provide very basic information like "we are 10min
behind in replication" or "3 wal files behind" - the decision if any of
that is an actual issue or not should be left to the actual monitoring
system.
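
To make the split of responsibilities concrete, here is a minimal sketch of
how an external monitoring system might turn two raw WAL locations into a
"3 wal files behind" figure. Everything in it is an assumption for
illustration: the function names are hypothetical, the locations are taken
to be text of the form "hi/lo" in hex, the position is treated as a flat
64-bit byte offset (older releases may need slightly different arithmetic),
and 16MB is assumed as the segment size. The database only has to report
the two locations; the decision whether the lag is a problem stays outside.

```python
# Assumed WAL segment size (16MB); adjust if the server was built differently.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def parse_wal_location(text):
    """Convert 'hi/lo' hex text into a byte position.

    Assumption: the location is a flat 64-bit offset (hi << 32 | lo).
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replication_lag(primary_location, standby_location):
    """Return the lag in bytes and in whole WAL segments (hypothetical helper)."""
    lag_bytes = parse_wal_location(primary_location) - parse_wal_location(standby_location)
    lag_bytes = max(lag_bytes, 0)  # never report a negative lag
    return {"bytes": lag_bytes, "segments": lag_bytes // WAL_SEGMENT_BYTES}


# The monitoring system, not the database, applies its own threshold.
lag = replication_lag("0/5000000", "0/2000000")
print(lag["segments"], "wal files behind")
```

Whether three segments of lag deserves an alert is then a policy setting in
the monitoring system rather than something embedded in the server.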


Stefan


From: "Kevin Grittner" on
Stefan Kaltenbrunner <stefan(a)kaltenbrunner.cc> wrote:

> The database needs to provide very basic information like "we are
> 10min behind in replication" or "3 wal files behind" - the
> decision if any of that is an actual issue or not should be left
> to the actual monitoring system.

+1

-Kevin
