Streaming replication status [PgSql]

From: Robert Treat on 12 Jan 2010 20:44

On Monday 11 January 2010 23:24:24 Greg Smith wrote:
> Fujii Masao wrote:
> > On Mon, Jan 11, 2010 at 5:36 PM, Craig Ringer <craig(a)postnewspapers.com.au> wrote:
> >> Personally, I'd be uncomfortable enabling something like that without
> >> _both_ an admin alert _and_ the ability to refresh the slave's base
> >> backup without admin intervention.
> >
> > What specific feature do you need as an alert? Is just writing
> > a warning into the logfile enough? Or do you need notification
> > via an SNMP trap message? I'm not sure that is Postgres's role,
> > though.
>
> It's impossible for the database to have any idea whatsoever how people
> are going to want to be alerted. Provide functions to monitor things
> like replication lag, like the number of segments queued up to feed to
> archive_command, and let people build their own alerting mechanism for
> now. They're going to do that anyway, so why waste precious time here
> building something that's unlikely to fit any but a very narrow use case?

That said, emitting the information to a log file makes for a crappy way to
retrieve the information. The ideal api is that I can find the information out
via the result of some SELECT query; view, table, function doesn't matter, as long
as I can select it out. Bonus points for being able to get information from
the hot standby.
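
Once positions can be selected out as text, a monitoring script could turn them into a lag figure. A minimal sketch, assuming WAL locations arrive in the "hi/lo" hexadecimal text form and can be treated as one linear 64-bit byte position (an assumption; releases of this era lay out log files slightly differently, so the result should be read as approximate). The function names are hypothetical, not part of any PostgreSQL API.

```python
def parse_wal_location(text):
    """Parse a WAL location such as '0/3000000' into a byte position.

    Assumes the 'hi/lo' hex form and a linear 64-bit layout (hypothetical helper).
    """
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replication_lag_bytes(primary_location, standby_location):
    """Approximate number of bytes the standby is behind; never negative."""
    lag = parse_wal_location(primary_location) - parse_wal_location(standby_location)
    return max(0, lag)
```

The two location strings would come from whatever SELECT the primary and the standby end up exposing; the sketch only covers the arithmetic.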

--
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Josh Berkus on 12 Jan 2010 22:05

> I guess the slightly more ambitious performance monitoring bits that
> Simon was suggesting may cross the line as being too late to implement
> now though (depends on how productive the people actually coding on this
> are I guess), and certainly the ideas thrown out for implementing any
> smart behavior or alerting when replication goes bad like Josh's
> "archiving_lag_action" seem past the deadline to get addressed
> now--even though I agree with the basic idea.

Well, honestly, I wasn't talking about monitoring at all. I was talking
about the general issue of "how should the system behave when it runs
out of disk space".

For the installation for which data integrity is paramount, when
replication becomes impossible because there is no more room for logs,
then the whole system, master and slaves, should shut down. For most
people, they'd just want the master to start ignoring the slave and
recycling logs. Presumably, the slave would notice this and shut down.

So I was talking about data integrity, not monitoring.

However, it's probably a better thing to simply expose a way to query
how much extra log data we have, in raw form (bytes or pages). From
this, an administration script could take appropriate action.
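
A rough sketch of what such an administration script might decide, assuming the amount of retained log data is available in bytes. The function name, the action labels, and the 80% warning threshold are all hypothetical; only the two end-state policies (shut everything down versus drop the slave and recycle logs) come from the discussion above.

```python
def choose_action(retained_log_bytes, log_space_limit_bytes,
                  integrity_paramount, warn_fraction=0.8):
    """Pick an action from the amount of log data being held for the slave.

    All names and thresholds here are illustrative assumptions.
    """
    if retained_log_bytes >= log_space_limit_bytes:
        # No more room for logs: replication can no longer continue safely.
        return "shutdown" if integrity_paramount else "drop_standby"
    if retained_log_bytes >= warn_fraction * log_space_limit_bytes:
        # Getting close to the limit: tell a human before it becomes critical.
        return "warn"
    return "ok"
```

Keeping the policy outside the server leaves each site free to pick its own thresholds and its own way of raising the alarm.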

--Josh Berkus

From: Josh Berkus on 12 Jan 2010 22:55

> However, it's probably a better thing to simply expose a way to query
> how much extra log data we have, in raw form (bytes or pages). From
> this, an administration script could take appropriate action.

Also: I think we could release without having this facility. We did
with PITR, after all.

--Josh

From: Bruce Momjian on 12 Jan 2010 23:26

Simon Riggs wrote:
> On Tue, 2010-01-12 at 15:42 -0500, Tom Lane wrote:
> > Bruce Momjian <bruce(a)momjian.us> writes:
> > > The final commit-fest is in 5 days --- this is not the time for design
> > > discussion and feature additions.
> >
> > +10 --- the one reason I can see for deciding to bounce SR is that there
> > still seem to be design discussions going on. It is WAY TOO LATE for
> > that, folks. It's time to be thinking "what's the least we have to do to
> > make this shippable?"
>
> I've not asked to bounce SR, I am strongly in favour of it going in,
> having been supporting the project on and off for 18 months.
>
> There is not much sense being talked here. I have asked for sufficient
> monitoring to allow us to manage it in production, which is IMHO the
> minimum required to make it shippable. This is a point I have mentioned

Let me explain why Simon feels he is misquoted --- Simon, you are saying
above that "sufficient monitoring" is a minimum requirement, meaning it
is necessary, and I and others are saying if we need to design a
monitoring system at this stage to ship SR, then let's forget about this
feature for 8.5.

In summary, by requiring monitoring, you are encouraging others to just
abandon SR completely for 8.5. We didn't say you were suggesting
abandonment of SR; it is just that the monitoring requirement is making
abandonment of SR for 8.5 more likely because the addition of monitoring
could hopelessly delay 8.5 because we have no idea even how to implement
monitoring.

> over the course of many months, not a sudden additional thought.
>
> Overall, it isn't sensible or appropriate to oppose my viewpoint by
> putting words into my mouth that have never been said, which applies to
> most people's comments to me on this recent thread.

Yea, yea, everyone seems to misquote you Simon, at least from your
perspective. You must admit that you seem to feel that way a lot.

> If the majority thinks that being able to find out the current replay
> point of recovery is all we need to manage replication then I will
> happily defer to that view, without changing my opinion that we need
> more. It should be clear that we didn't even have that before I raised
> the point.

Good --- let's move forward with a minimal feature set to get SR in 8.5
in a reasonable timeframe. If we have extra time we can add stuff but
let's not require it from the start.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From: Fujii Masao on 13 Jan 2010 00:10

On Tue, Jan 12, 2010 at 10:59 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fujii(a)gmail.com> writes:
>> I'm not sure whether poll(2) should be called for this purpose. But
>> poll(2) and select(2) seem to often come together in the existing code.
>> Should we follow that custom?
>
> Yes. poll() is usually more efficient, so it's preferred, but not all
> platforms have it. (On the other side, I think Windows might have
> only poll and not select.)

OK. I reactivated pq_wait() and secure_poll(), which use poll(2) to
check the socket if available, otherwise select(2).
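
The poll-if-available, otherwise-select pattern can be sketched outside the server as follows. This is an illustration in Python, not the code in pq_wait() or secure_poll(); the function name wait_readable is hypothetical.

```python
import select


def wait_readable(sock, timeout_s):
    """Return True if sock has data (or a hangup) to read within timeout_s seconds."""
    if hasattr(select, "poll"):
        # Preferred path: poll() is available on this platform.
        poller = select.poll()
        poller.register(sock.fileno(), select.POLLIN)
        events = poller.poll(int(timeout_s * 1000))  # poll() takes milliseconds
        return bool(events)
    # Fallback path: select() only.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)
```

Only readability is checked, matching the decision below to drop the check for writable data.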

Also, the capability to check the socket for data to be written is
unused for SR right now (it was provided previously). So I dropped it
for simplification.

http://archives.postgresql.org/pgsql-hackers/2010-01/msg00827.php
> Oh, I think we need to fix that, I'm thinking of doing a select() in the
> loop to check that the socket hasn't been closed yet. I meant we don't
> need to try reading the 'X' to tell apart e.g a network problem from a
> standby that's shut down cleanly.

Without reading the 'X' message from the standby, the walsender doesn't
detect the close of connection immediately in my environment. So I also
reactivated a subset of ProcessStreamMessage().
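
To illustrate why reading from the socket matters here: once the socket is readable, one read distinguishes a clean shutdown message from a closed connection. A minimal sketch in Python, assuming only the one-byte 'X' type tag is inspected (the rest of the message is ignored); the function name and the return labels are hypothetical and this is not the logic of ProcessStreamMessage().

```python
import select


def classify_standby_event(sock, timeout_s=1.0):
    """Wait for the peer and report what happened (illustrative labels only)."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return "idle"            # nothing arrived within the timeout
    try:
        data = sock.recv(1)
    except ConnectionError:
        return "lost"            # e.g. connection reset by the network
    if data == b"":
        return "closed"          # peer closed without sending anything
    if data == b"X":
        return "clean_shutdown"  # peer announced that it is terminating
    return "other"               # some other message type
```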

git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center