beta3 & the open items list [PgSql]

From: "Joshua D. Drake" on 19 Jun 2010 12:05

On Sat, 2010-06-19 at 09:43 -0400, Robert Haas wrote:

> 4. Streaming Replication needs to detect death of master. We need
> some sort of keep-alive, here. Whether it's at the TCP level (as
> advocated by Tom Lane and others) or at the protocol level (as
> advocated by Greg Stark) is something that we have yet to decide; once
> it's decided, someone will need to do it...

TCP involves unknowns, such as firewalls, vpn routers and ssh tunnels. I
humbly suggest we *not* be pedantic and implement something practical
and less prone to variables outside the control of Pg.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Stark on 19 Jun 2010 14:46

On Sat, Jun 19, 2010 at 2:43 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> 4. Streaming Replication needs to detect death of master. We need
> some sort of keep-alive, here. Whether it's at the TCP level (as
> advocated by Tom Lane and others) or at the protocol level (as
> advocated by Greg Stark) is something that we have yet to decide; once
> it's decided, someone will need to do it...

This sounds like a useful feature but I don't see why it's not 9.1
material. The status quo is that the expected usage pattern is manual
failover. As long as the slave responds to manual intervention when in
this state I don't think this is a blocking issue. Monitoring and
automatic failover are clearly things we plan to add features to
handle better in the future.

--
greg

From: Robert Haas on 19 Jun 2010 14:53

On Sat, Jun 19, 2010 at 2:46 PM, Greg Stark <gsstark(a)mit.edu> wrote:
> On Sat, Jun 19, 2010 at 2:43 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> 4. Streaming Replication needs to detect death of master. We need
>> some sort of keep-alive, here. Whether it's at the TCP level (as
>> advocated by Tom Lane and others) or at the protocol level (as
>> advocated by Greg Stark) is something that we have yet to decide; once
>> it's decided, someone will need to do it...
>
> This sounds like a useful feature but I don't see why it's not 9.1
> material. The status quo is that the expected usage pattern is manual
> failover. As long as the slave responds to manual intervention when in
> this state I don't think this is a blocking issue. Monitoring and
> automatic failover are clearly things we plan to add features to
> handle better in the future.

Right now, if the SR master reboots unexpectedly (say, power plug pull
and restart), the slave never notices. It just sits there forever
waiting for the next byte of data from the master to arrive (which it
never will). You have to manually restart the server or hit
walreceiver with a SIGTERM to get it to start streaming again. I
guess we could decide we're just not going to deal with that, but it
seems like a fairly large misfeature to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Tom Lane on 19 Jun 2010 15:13

Robert Haas <robertmhaas(a)gmail.com> writes:
> Right now, if the SR master reboots unexpectedly (say, power plug pull
> and restart), the slave never notices. It just sits there forever
> waiting for the next byte of data from the master to arrive (which it
> never will).

This is nonsense --- the slave's kernel *will* eventually notice that
the TCP connection is dead, and tell walreceiver so. I don't doubt
that the standard TCP timeout is longer than people want to wait for
that, but claiming that it will never happen is simply wrong.

I think that enabling slave-side TCP keepalives and control of the
keepalive timeout parameters is probably sufficient for 9.0 here.

regards, tom lane
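
[Editor's sketch] To make "control of the keepalive timeout parameters" concrete: on Linux the three knobs are the idle time before the first probe, the interval between probes, and the number of unanswered probes tolerated. The function names below are hypothetical, and the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT option names are the Linux ones; other platforms spell these differently or lack them. The settings only matter once SO_KEEPALIVE is enabled on the socket.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not PostgreSQL code): set the per-socket keepalive
 * timing on Linux. Returns 0 on success, -1 if any setsockopt() fails.
 * These values take effect only when SO_KEEPALIVE is on for the socket.
 */
int set_keepalive_timing(int fd, int idle_s, int interval_s, int count)
{
    /* Seconds of idleness before the first keepalive probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    /* Seconds between successive probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0)
        return -1;
    /* Unanswered probes allowed before the connection is declared dead. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;
    return 0;
}

/* Rough upper bound, in seconds, before the kernel reports a dead peer. */
int keepalive_detection_seconds(int idle_s, int interval_s, int count)
{
    return idle_s + interval_s * count;
}
```

With, say, 60 seconds idle, 10 seconds between probes and 5 probes, a dead master would be reported after roughly 110 seconds instead of whatever the system-wide default works out to.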

From: Andres Freund on 19 Jun 2010 15:15

On Saturday 19 June 2010 18:05:34 Joshua D. Drake wrote:
> On Sat, 2010-06-19 at 09:43 -0400, Robert Haas wrote:
> > 4. Streaming Replication needs to detect death of master. We need
> > some sort of keep-alive, here. Whether it's at the TCP level (as
> > advocated by Tom Lane and others) or at the protocol level (as
> > advocated by Greg Stark) is something that we have yet to decide; once
> > it's decided, someone will need to do it...
>
> TCP involves unknowns, such as firewalls, vpn routers and ssh tunnels. I
> humbly suggest we *not* be pedantic and implement something practical
> and less prone to variables outside the control of Pg.
And has the huge advantage of being implementable in about 5 lines of C
(setsockopt + error checking). Considering what time in the release cycle this
is...

Andres
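
[Editor's sketch] The "setsockopt + error checking" mentioned above might look roughly like the following. This is a minimal illustration under assumptions, not the patch that was eventually written; the function name enable_keepalive is hypothetical, and real code would report the failure through the server's own error machinery rather than perror().

```c
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not PostgreSQL code): ask the kernel to send TCP
 * keepalive probes on an otherwise idle connection.
 * Returns 0 on success, -1 on failure.
 */
int enable_keepalive(int fd)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
    {
        perror("setsockopt(SO_KEEPALIVE)");
        return -1;
    }
    return 0;
}
```

With only this call, the probe timing falls back to the operating system's defaults, which are often on the order of hours, so it addresses "never notices" but not "notices quickly".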
