From: Stefan Kaltenbrunner on
On 06/19/2010 09:13 PM, Tom Lane wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> Right now, if the SR master reboots unexpectedly (say, power plug pull
>> and restart), the slave never notices. It just sits there forever
>> waiting for the next byte of data from the master to arrive (which it
>> never will).
>
> This is nonsense --- the slave's kernel *will* eventually notice that
> the TCP connection is dead, and tell walreceiver so. I don't doubt
> that the standard TCP timeout is longer than people want to wait for
> that, but claiming that it will never happen is simply wrong.
>
> I think that enabling slave-side TCP keepalives and control of the
> keepalive timeout parameters is probably sufficient for 9.0 here.

Yeah, I would agree. We have had TCP keepalive code in the backend for a
while now, and adding that to libpq as well seems like an easy enough
fix at this point in the release cycle.
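
For illustration only, here is a minimal sketch of what enabling client-side
keepalives on a socket looks like. It is not the libpq patch; the function
name enable_keepalive and the timer values (60 s idle, 10 s interval, 5
probes) are assumptions chosen for the example. The TCP_KEEPIDLE,
TCP_KEEPINTVL and TCP_KEEPCNT options are platform-dependent, so the sketch
applies each one only where the constant is defined.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive and, where the platform exposes them, tune the timers."""
    # SO_KEEPALIVE is the portable on/off switch.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are not defined on every platform, so each one is
    # applied only if this Python build exposes the constant.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

# Hypothetical usage: configure a socket before connecting to the master.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Where the tuning options are unavailable, the kernel's system-wide keepalive
timers apply instead.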


Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Florian Pflug on
On Jun 19, 2010, at 21:13 , Tom Lane wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> Right now, if the SR master reboots unexpectedly (say, power plug pull
>> and restart), the slave never notices. It just sits there forever
>> waiting for the next byte of data from the master to arrive (which it
>> never will).
>
> This is nonsense --- the slave's kernel *will* eventually notice that
> the TCP connection is dead, and tell walreceiver so. I don't doubt
> that the standard TCP timeout is longer than people want to wait for
> that, but claiming that it will never happen is simply wrong.

No, Robert is correct AFAIK. If you're *waiting* for data, TCP generates no traffic (except with keepalive enabled). From the slave's kernel POV, a dead master is therefore indistinguishable from an inactive master.

Things are different from a sender's POV, though. Since sent data is ACK'ed by the receiving end, the TCP stack can (and does) detect a broken connection.
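
A rough sketch of the receiver's view, assuming a local socket pair stands in
for the replication connection (wait_for_data is a hypothetical helper, not
walreceiver code). With a read timeout, a live but idle peer simply produces
silence. A peer that lost power would produce the same silence, which this
local example cannot simulate.

```python
import socket

def wait_for_data(sock, timeout_s):
    """Return received bytes, b"" on a clean close, or None if nothing arrived."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(4096)
    except socket.timeout:
        # No bytes within the timeout: the receiver cannot tell from this
        # alone whether the peer is idle or gone.
        return None

receiver, sender = socket.socketpair()

idle_result = wait_for_data(receiver, 0.1)   # peer is alive but sends nothing
sender.sendall(b"wal")
data_result = wait_for_data(receiver, 0.1)   # peer sent something
sender.close()
closed_result = wait_for_data(receiver, 0.1) # an orderly close is visible
receiver.close()
```

Only the orderly close is reported to the receiver; silence by itself carries
no information about the peer's state.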

best regards,
Florian Pflug



From: Simon Riggs on
On Sat, 2010-06-19 at 14:53 -0400, Robert Haas wrote:
> On Sat, Jun 19, 2010 at 2:46 PM, Greg Stark <gsstark(a)mit.edu> wrote:
> > On Sat, Jun 19, 2010 at 2:43 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> >> 4. Streaming Replication needs to detect death of master. We need
> >> some sort of keep-alive, here. Whether it's at the TCP level (as
> >> advocated by Tom Lane and others) or at the protocol level (as
> >> advocated by Greg Stark) is something that we have yet to decide; once
> >> it's decided, someone will need to do it...
> >
> > This sounds like a useful feature but I don't see why it's not 9.1
> > material. The status quo is that the expected usage pattern is manual
> > failover. As long as the slave responds to manual intervention when in
> > this state I don't think this is a blocking issue. Monitoring and
> > automatic failover are clearly things we plan to add features to
> > handle better in the future.
>
> Right now, if the SR master reboots unexpectedly (say, power plug pull
> and restart), the slave never notices. It just sits there forever
> waiting for the next byte of data from the master to arrive (which it
> never will). You have to manually restart the server or hit
> walreceiver with a SIGTERM to get it to start streaming again. I
> guess we could decide we're just not going to deal with that, but it
> seems like a fairly large misfeature to me.

Are you saying it doesn't respond to a trigger file at any point? That
would be a problem.

Sounds like we should have a pg_restart_walreceiver() function. We
shouldn't be encouraging people to send signals to backends; it's too
easy to get wrong.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services



From: Tom Lane on
Florian Pflug <fgp(a)phlo.org> writes:
> On Jun 19, 2010, at 21:13 , Tom Lane wrote:
>> This is nonsense --- the slave's kernel *will* eventually notice that
>> the TCP connection is dead, and tell walreceiver so. I don't doubt
>> that the standard TCP timeout is longer than people want to wait for
>> that, but claiming that it will never happen is simply wrong.

> No, Robert is correct AFAIK. If you're *waiting* for data, TCP
> generates no traffic (except with keepalive enabled).

Mph. I was thinking that keepalive was on by default with a very long
interval, but I see this isn't so. However, if we enable keepalive,
then it's irrelevant to the point anyway. Nobody's produced any
evidence that keepalive is an unsuitable solution.
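
As a back-of-the-envelope sketch of why the timer settings matter: the time
for keepalive to declare a silent peer dead is roughly the idle time plus one
probe interval per unanswered probe. The function name is hypothetical; the
first set of numbers is the commonly documented Linux defaults (7200 s idle,
75 s interval, 9 probes), and the second set is an arbitrary tuned example.

```python
def worst_case_detection_s(idle_s, interval_s, probes):
    """Approximate seconds until keepalive gives up on a silent peer."""
    # The first probe goes out after idle_s of silence; the connection is
    # dropped once `probes` probes, spaced interval_s apart, go unanswered.
    return idle_s + interval_s * probes

linux_defaults = worst_case_detection_s(7200, 75, 9)  # 7875 s, about 2.2 hours
tuned_example = worst_case_detection_s(60, 10, 5)     # 110 s
```

With defaults of that order, merely switching keepalive on would still leave
the slave waiting over two hours, so the timers need to be adjustable too.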

regards, tom lane


From: Andres Freund on
On Saturday 19 June 2010 18:05:34 Joshua D. Drake wrote:
> On Sat, 2010-06-19 at 09:43 -0400, Robert Haas wrote:
> > 4. Streaming Replication needs to detect death of master. We need
> > some sort of keep-alive, here. Whether it's at the TCP level (as
> > advocated by Tom Lane and others) or at the protocol level (as
> > advocated by Greg Stark) is something that we have yet to decide; once
> > it's decided, someone will need to do it...
>
> TCP involves unknowns, such as firewalls, vpn routers and ssh tunnels. I
> humbly suggest we *not* be pedantic and implement something practical
> and less prone to variables outside the control of Pg.
>
> Sincerely,
>
> Joshua D. Drake

