From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> The first problem I noticed is that the slave never seems to realize
> that the master has gone away. Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at least for
> longer than I was willing to wait.

TCP timeout is the answer there.

> More seriously, I was able to demonstrate that the problem linked in
> the thread above is real: if the master crashes after streaming WAL
> that it hasn't yet fsync'd, then on recovery the slave's xlog position
> is ahead of the master.

So indeed we'd better change walsender to not get ahead of the fsync'd
position. And probably also warn people to not disable fsync on the
master, unless they're willing to write it off and fail over at any
system crash.
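
As a rough illustration of that rule, here is a minimal Python sketch
(not the server's C code). The names WalPositions and sendable_upto are
hypothetical and do not correspond to actual walsender internals; the
only point is that the send limit is the fsync'd position rather than
the written one.

```python
from dataclasses import dataclass


@dataclass
class WalPositions:
    """Hypothetical snapshot of the master's WAL progress, as byte offsets."""
    written: int   # WAL handed to the OS, possibly still only in cache
    flushed: int   # WAL known to have been fsync'd to disk


def sendable_upto(pos: WalPositions, already_sent: int) -> int:
    """Return the position up to which WAL may be streamed to the slave.

    Only fsync'd WAL is eligible, so the slave cannot end up ahead of
    what the master will still have after a crash. The result never
    moves backwards from what was already sent.
    """
    limit = min(pos.written, pos.flushed)
    return max(already_sent, limit)
```

With written=500 and flushed=300, the sender would stop at 300 even
though more WAL has already been written.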

> I don't know what to do about this, but I'm pretty sure we can't ship it as-is.

Doesn't seem tremendously insoluble from here ...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Josh Berkus on
On 6/16/10 1:26 PM, Robert Haas wrote:
> Similarly with synchronous_commit=off, I believe
> that the next checkpoint will still fsync WAL, but the lag might be
> long.

That's not a showstopper. Just tell people that having
synchronous_commit=off on the master might increase the lag to the
slave, and leave it alone.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From: "Pierre C" on

> The real problem here is that we're sending records to the slave which
> might cease to exist on the master if it unexpectedly reboots. I
> believe that what we need to do is make sure that the master only
> sends WAL it has already fsync'd

How about this:

- pg records somewhere the xlog position of the last record synced to
disk. I don't remember the variable name, so let's just call it
xlog_synced_recptr.
- pg always writes the xlog first, i.e. before writing any page it checks
that the page's xlog recptr < xlog_synced_recptr, and if that's not the
case it has to wait before it can write the page.

Now:

- master sends messages to slave with the xlog_synced_recptr after each
fsync
- slave gets these messages and records the master_xlog_synced_recptr
- slave doesn't write any page to disk until BOTH the slave's local WAL
copy AND the master's WAL have reached the recptr of this page
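
The three steps above can be sketched roughly as follows, in Python
for brevity. SlavePageGate and its methods are hypothetical names made
up for this illustration, and positions are treated as plain integers.
The sketch uses <= on the assumption that a synced position covers the
record ending exactly there.

```python
class SlavePageGate:
    """Hypothetical gate deciding when a slave may write a dirty page."""

    def __init__(self) -> None:
        self.local_synced_recptr = 0    # slave's own WAL copy fsync'd up to here
        self.master_synced_recptr = 0   # last position the master reported as fsync'd

    def on_local_fsync(self, recptr: int) -> None:
        # The slave finished an fsync of its local WAL copy.
        self.local_synced_recptr = max(self.local_synced_recptr, recptr)

    def on_master_message(self, recptr: int) -> None:
        # The master announced its xlog_synced_recptr after an fsync.
        self.master_synced_recptr = max(self.master_synced_recptr, recptr)

    def may_write_page(self, page_recptr: int) -> bool:
        # BOTH WAL copies must have reached the page's recptr.
        safe_upto = min(self.local_synced_recptr, self.master_synced_recptr)
        return page_recptr <= safe_upto
```

A page whose recptr is 150 stays in memory while the master has only
reported 100, even if the slave's local WAL is already synced to 200.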

If a master crashes or the slave loses connection, then the in-memory
pages of the slave could be in a state that is "in the future" compared to
the master's state when it comes up.

Therefore when a slave detects that the master has crashed, it could shoot
itself and recover from WAL, at which point the slave will not be "in the
future" anymore from the master, rather it would be in the past, which is
a lot less problematic...

Of course, this wouldn't speed up the failover process!

> I think we should also change the slave to panic and shut down
> immediately if its xlog position is ahead of the master. That can
> never be a watertight solution because you can always advance the xlog
> position on the master and mask the problem. But I think we should
> do it anyway, so that we at least have a chance of noticing that we're
> hosed. I wish I could think of something a little more watertight...

If a slave is "in the future" relative to the master, then the only way to
keep using this slave could be to make it the new master...
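
The "slave is ahead of the master" check from the quoted text could
look something like this minimal Python sketch. It assumes positions
are given in the usual textual form of two hexadecimal numbers
separated by a slash (for example 0/3000060); the function names are
hypothetical. As noted above, such a check is not watertight, since
the master can advance past the slave and mask the problem.

```python
from typing import Tuple


def parse_xlog_position(text: str) -> Tuple[int, int]:
    """Parse a position such as '0/3000060' into a comparable tuple."""
    high, low = text.split("/")
    return int(high, 16), int(low, 16)


def slave_is_ahead(slave_pos: str, master_pos: str) -> bool:
    """True if the slave's xlog position is beyond the master's."""
    return parse_xlog_position(slave_pos) > parse_xlog_position(master_pos)
```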


From: Greg Stark on
On Wed, Jun 16, 2010 at 9:56 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> The first problem I noticed is that the slave never seems to realize
>> that the master has gone away. Every time I crashed the master, I had
>> to kill the wal receiver process on the slave to get it to reconnect;
>> otherwise it just sat there waiting, either forever or at least for
>> longer than I was willing to wait.
>
> TCP timeout is the answer there.

If you mean TCP Keepalives, I disagree quite strongly. If you want the
application to guarantee any particular timing constraints then you
have to implement that in the application using timers and data
packets. TCP keepalives are for detecting broken network connections,
not enforcing application rules. Using TCP timeouts would have a
number of problems: on many systems they are impossible or difficult
to adjust and, worse, it would be impossible to distinguish a
postgres master crash from a transient or permanent network outage.
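
For illustration, an application-level timer of the kind described
here might look like the following minimal Python sketch. The class
name, the timeout value in the usage note, and the idea that every
received message (data or an otherwise empty heartbeat) resets the
timer are all assumptions made for this example, not an existing
protocol feature.

```python
import time
from typing import Optional


class MasterLivenessMonitor:
    """Hypothetical receiver-side timer driven by messages from the master."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_heard = time.monotonic()

    def on_message(self, now: Optional[float] = None) -> None:
        # Call for every message received, including empty heartbeats.
        self.last_heard = time.monotonic() if now is None else now

    def master_presumed_dead(self, now: Optional[float] = None) -> bool:
        # True once nothing has arrived for longer than the timeout.
        current = time.monotonic() if now is None else now
        return current - self.last_heard > self.timeout_seconds
```

A receiver using, say, MasterLivenessMonitor(30.0) would drop the
connection and try to reconnect once master_presumed_dead() returns
True, with the timeout chosen by the application rather than by the
operating system's TCP settings.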


>> More seriously, I was able to demonstrate that the problem linked in
>> the thread above is real: if the master crashes after streaming WAL
>> that it hasn't yet fsync'd, then on recovery the slave's xlog position
>> is ahead of the master.
>
> So indeed we'd better change walsender to not get ahead of the fsync'd
> position. And probably also warn people to not disable fsync on the
> master, unless they're willing to write it off and fail over at any
> system crash.
>
>> I don't know what to do about this, but I'm pretty sure we can't ship it as-is.
>
> Doesn't seem tremendously insoluble from here ...

For the case of fsync=off I can't get terribly excited about the slave
being ahead of the master after a crash. After all the master is toast
anyways. It seems to me in this situation the slave should detect that
the master has failed and automatically come up in master mode. Or
perhaps it should just shut down and then refuse to come up as a slave
again on the basis that it would be unsafe precisely because it might
be ahead of the (corrupt) master. At some point we should consider
having a server set to fsync=off refuse to come back up unless it was
shut down cleanly anyways. Perhaps we should put in a strongly worded
warning now.

For the case of fsync=on it does seem to me to be terribly obvious
that the master should never send records to the slave that aren't
fsynced on the master. For 9.1 the other option proposed would work as
well but would be more complex -- to send and store records
immediately but not replay them on the slave until they're either
fsynced on the master or failover occurs.

--
greg

From: "Kevin Grittner" on
Greg Stark <gsstark(a)mit.edu> wrote:

> TCP keepalives are for detecting broken network connections

Yeah. That seems like what we have here. If you shoot the OS in
the head, the network connection is broken rather abruptly, without
the normal packets exchanged to close the TCP connection. It sounds
like it behaves just fine except for not detecting a broken
connection.
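
For reference, turning on TCP keepalives for a client socket looks
roughly like this in Python. This is only a sketch: the idle,
interval, and count values are arbitrary examples, and the tuning
options are applied only where the platform exposes them, which is
the portability problem raised earlier in the thread.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Enable TCP keepalives, tuning them where the platform allows it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs are not available everywhere, so probe for each.
    tuning = (("TCP_KEEPIDLE", idle),
              ("TCP_KEEPINTVL", interval),
              ("TCP_KEEPCNT", count))
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            # Option is defined but rejected by this OS; keep defaults.
            pass
```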

-Kevin
