From: Robert Haas on 16 Jun 2010 16:15

On Wed, Jun 16, 2010 at 4:00 PM, Kevin Grittner
<Kevin.Grittner(a)wicourts.gov> wrote:
> Robert Haas <robertmhaas(a)gmail.com> wrote:
>> So, obviously at this point my slave database is corrupted beyond
>> repair due to nothing more than an unexpected crash on the master.
>
> Certainly that's true for resuming replication.  From your
> description it sounds as though the slave would be usable for
> purposes of taking over for an unrecoverable master.  Or am I
> misunderstanding?

It depends on what you mean.  If you can prevent the slave from ever
reconnecting to the master, then it's still safe to promote it.  But if
the master comes up and starts generating WAL again, and the slave ever
sees any of that WAL (either via SR or via the archive), then you're
toast.

In my case, the slave was irrecoverably out of sync with the master as
soon as the crash happened, but it still could have been promoted at
that point if you killed the old master.  It became corrupted as soon
as it replayed the first WAL record starting beyond 1/87000000.  At
that point it's potentially got arbitrary corruption; you need a new
base backup (but this may not be immediately obvious; it may look OK
even if it isn't).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
From: Robert Haas on 16 Jun 2010 16:26

On Wed, Jun 16, 2010 at 4:14 PM, Josh Berkus <josh(a)agliodbs.com> wrote:
>> The first problem I noticed is that the slave never seems to realize
>> that the master has gone away.  Every time I crashed the master, I had
>> to kill the wal receiver process on the slave to get it to reconnect;
>> otherwise it just sat there waiting, either forever or at least for
>> longer than I was willing to wait.
>
> Yes, I've noticed this.  That was the reason for forcing walreceiver to
> shut down on a restart per prior discussion and patches.  This needs to
> be on the open items list ... possibly it'll be fixed by Simon's
> keepalive patch?  Or is it just a tcp_keepalive issue?

I think a TCP keepalive might be enough, but I have not tried to code
or test it.

>> More seriously, I was able to demonstrate that the problem linked in
>> the thread above is real: if the master crashes after streaming WAL
>> that it hasn't yet fsync'd, then on recovery the slave's xlog position
>> is ahead of the master.  So far I've only been able to reproduce this
>> with fsync=off, but I believe it's possible anyway,
>
> ... and some users will turn fsync off.  This is, in fact, one of the
> primary uses for streaming replication: Durability via replicas.

Yep.

>> and this just
>> makes it more likely.  After the most recent crash, the master thought
>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>> pg_last_xlog_receive_location() was 1/8733C000.  After reconnecting to
>> the master, the slave then thought that
>> pg_last_xlog_receive_location() was 1/87000000.
>
> So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
> have actually prevented the slave from being corrupted.
>
> My question, though, is detecting out-of-sequence xlogs *enough*?  Are
> there any crash conditions on the master which would cause the master to
> reuse the same locations for different records, for example?  I don't
> think so, but I'd like to be certain.

The real problem here is that we're sending records to the slave which
might cease to exist on the master if it unexpectedly reboots.  I
believe that what we need to do is make sure that the master only sends
WAL it has already fsync'd (Tom suggested on another thread that this
might be necessary, and I think it's now clear that it is 100%
necessary).  But I'm not sure how this will play with fsync=off - if we
never fsync, then we can't ever really send any WAL without risking
this failure mode.  Similarly with synchronous_commit=off: I believe
that the next checkpoint will still fsync WAL, but the lag might be
long.

I think we should also change the slave to panic and shut down
immediately if its xlog position is ahead of the master's.  That can
never be a watertight solution, because you can always advance the xlog
position on the master and mask the problem.  But I think we should do
it anyway, so that we at least have a chance of noticing that we're
hosed.  I wish I could think of something a little more watertight...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
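For reference, enabling keepalives on the walreceiver's connection
comes down to the standard socket options.  The sketch below assumes a
plain BSD socket on Linux and uses made-up names; the real walreceiver
connects through libpq, so this only illustrates the mechanism, not the
patch being discussed.

    /*
     * Minimal sketch: enable TCP keepalives on an already-connected
     * socket.  All names here are invented for the example.
     */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    static int
    enable_tcp_keepalive(int sockfd)
    {
        int     on = 1;
        int     idle = 60;      /* seconds of silence before the first probe */
        int     interval = 10;  /* seconds between probes */
        int     count = 3;      /* unanswered probes before the kernel gives up */

        if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
            return -1;
    #ifdef TCP_KEEPIDLE
        if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
            return -1;
    #endif
    #ifdef TCP_KEEPINTVL
        if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) < 0)
            return -1;
    #endif
    #ifdef TCP_KEEPCNT
        if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
            return -1;
    #endif
        return 0;
    }

With something like that in place, a master that silently drops off the
network would be noticed after roughly idle + interval * count seconds,
instead of the walreceiver waiting forever.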
From: "Kevin Grittner" on 16 Jun 2010 16:30 Robert Haas <robertmhaas(a)gmail.com> wrote: > Kevin Grittner <Kevin.Grittner(a)wicourts.gov> wrote: >> Robert Haas <robertmhaas(a)gmail.com> wrote: >>> So, obviously at this point my slave database is corrupted >>> beyond repair due to nothing more than an unexpected crash on >>> the master. >> >> Certainly that's true for resuming replication. From your >> description it sounds as though the slave would be usable for >> purposes of taking over for an unrecoverable master. Or am I >> misunderstanding? > > It depends on what you mean. If you can prevent the slave from > ever reconnecting to the master, then it's still safe to promote > it. Yeah, that's what I meant. > But if the master comes up and starts generating WAL again, and > the slave ever sees any of that WAL (either via SR or via the > archive) then you're toast. Well, if it *applies* what it sees, yes. Effectively you've got transactions from two alternative timelines applied in the same database, which is not going to work. At a minimum we need some way to reliably detect that the incoming WAL stream is starting before some applied WAL record and isn't a match. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Magnus Hagander on 16 Jun 2010 16:32

On Wed, Jun 16, 2010 at 22:26, Robert Haas <robertmhaas(a)gmail.com> wrote:
>>> and this just
>>> makes it more likely.  After the most recent crash, the master thought
>>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>>> pg_last_xlog_receive_location() was 1/8733C000.  After reconnecting to
>>> the master, the slave then thought that
>>> pg_last_xlog_receive_location() was 1/87000000.
>>
>> So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
>> have actually prevented the slave from being corrupted.
>>
>> My question, though, is detecting out-of-sequence xlogs *enough*?  Are
>> there any crash conditions on the master which would cause the master to
>> reuse the same locations for different records, for example?  I don't
>> think so, but I'd like to be certain.
>
> The real problem here is that we're sending records to the slave which
> might cease to exist on the master if it unexpectedly reboots.  I
> believe that what we need to do is make sure that the master only
> sends WAL it has already fsync'd (Tom suggested on another thread that
> this might be necessary, and I think it's now clear that it is 100%
> necessary).  But I'm not sure how this will play with fsync=off - if
> we never fsync, then we can't ever really send any WAL without risking
> this failure mode.

Well, at this point we can just prevent streaming replication with
fsync=off if we can't think of an easy fix, and then design a "proper
fix" for 9.1, given how late we are in the cycle.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
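For illustration, the "only send WAL that has been fsync'd" idea
amounts to clamping the walsender's send pointer at the flushed
position.  Everything in the sketch below is invented; it is not the
real walsender code, just the shape of the rule.

    /*
     * Sketch only: never stream WAL past the point known to be
     * durably on disk on the master.
     */
    #include <stdint.h>

    typedef uint64_t WalPos;        /* simplified stand-in for XLogRecPtr */

    /* hypothetical: latest WAL byte known to be fsync'd to disk */
    static WalPos flushed_upto = 0;

    static WalPos
    clamp_send_upto(WalPos write_upto)
    {
        /*
         * WAL that only exists in the master's memory vanishes if the
         * master crashes; streaming it is what lets the standby get
         * ahead of the master, which is exactly the failure seen here.
         */
        return (write_upto > flushed_upto) ? flushed_upto : write_upto;
    }

With fsync=off there is effectively no flushed position to clamp to,
which is why simply refusing to stream in that configuration, as
suggested above, may be the only safe short-term answer.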
From: Rafael Martinez on 16 Jun 2010 16:38

Robert Haas wrote:
>
> The first problem I noticed is that the slave never seems to realize
> that the master has gone away.  Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at least for
> longer than I was willing to wait.
>

Hei Robert

I have seen two different behaviors in my tests.

a) If I crash the server, the wal receiver process will wait forever,
and the only way to get it working again is to restart postgres on the
slave after the master is back online.  I have not been able to get the
slave database corrupted (I am running with fsync=on).

b) If I kill all postgres processes on the master with kill -9, the wal
receiver will start trying to reconnect automatically, and it will
succeed the moment postgres gets started on the master.

The only difference I can see at the OS level is that in a) the
connection keeps the status ESTABLISHED forever, while in b) it goes to
TIME_WAIT the moment postgres is down on the master.

regards,
--
Rafael Martinez, <r.m.guerrero(a)usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/