From: Robert Haas on 16 Jun 2010 15:47

On Mon, Jun 14, 2010 at 7:55 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>> But that change would cause the problem that Robert pointed out.
>> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php
>
> Presumably this means that if synchronous_commit = off on primary that
> SR in 9.0 will no longer work correctly if the primary crashes?

I spent some time investigating this today and have come to the conclusion that streaming replication is really, really broken in the face of potential crashes on the master. Using a copy of VMware parallels provided by $EMPLOYER, I set up two Fedora 12 virtual machines on my MacBook in a master/slave configuration. Then I crashed the master repeatedly using 'echo b > /proc/sysrq-trigger', which causes an immediate reboot (without syncing the disks, closing network connections, etc.) while running pgbench or other stuff against it.

The first problem I noticed is that the slave never seems to realize that the master has gone away. Every time I crashed the master, I had to kill the wal receiver process on the slave to get it to reconnect; otherwise it just sat there waiting, either forever or at least for longer than I was willing to wait.

More seriously, I was able to demonstrate that the problem linked in the thread above is real: if the master crashes after streaming WAL that it hasn't yet fsync'd, then on recovery the slave's xlog position is ahead of the master. So far I've only been able to reproduce this with fsync=off, but I believe it's possible anyway, and this just makes it more likely. After the most recent crash, the master thought pg_current_xlog_location() was 1/86CD4000; the slave thought pg_last_xlog_receive_location() was 1/8733C000. After reconnecting to the master, the slave then thought that pg_last_xlog_receive_location() was 1/87000000. The slave didn't think this was a problem yet, though.
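
To make the reported positions concrete, here is a minimal Python sketch. It is not part of the original test setup: the helper names parse_lsn, format_lsn, segment_start, and bytes_ahead are hypothetical, and the 16 MB segment size is assumed to be the build default. It compares the locations as (xlogid, xrecoff) pairs and rounds the slave's position down to the start of its segment.

```python
# Assumption: default WAL segment size of 16 MB (not stated in the post).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Parse an xlog location such as '1/8733C000' into (xlogid, xrecoff)."""
    xlogid, xrecoff = text.split("/")
    return (int(xlogid, 16), int(xrecoff, 16))


def format_lsn(loc):
    """Format an (xlogid, xrecoff) pair the way the server prints it."""
    return "%X/%X" % loc


def segment_start(loc):
    """Round a location down to the first byte of its WAL segment."""
    xlogid, xrecoff = loc
    return (xlogid, xrecoff - xrecoff % WAL_SEGMENT_SIZE)


def bytes_ahead(a, b):
    """Byte gap a - b; only meaningful here when both share one xlogid."""
    if a[0] != b[0]:
        raise ValueError("locations are in different xlogids")
    return a[1] - b[1]


# Values reported after the most recent crash.
master = parse_lsn("1/86CD4000")  # pg_current_xlog_location() on the master
slave = parse_lsn("1/8733C000")   # pg_last_xlog_receive_location() on the slave

# Tuples compare element by element, so this orders by xlogid, then offset.
slave_is_ahead = slave > master
gap = bytes_ahead(slave, master)
rewound = segment_start(slave)

print(slave_is_ahead)       # True
print(gap)                  # 6717440
print(format_lsn(rewound))  # 1/87000000
print(rewound > master)     # True
```

Under these assumptions the slave has received about 6.4 MB of WAL that the master no longer has, and rounding down to the segment start reproduces the 1/87000000 the slave reported after reconnecting, which is still past the master's 1/86CD4000.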
When I then restarted a pgbench run against the master, the slave pretty quickly started spewing an endless stream of messages complaining of "LOG: invalid record length at 1/8733A828". So, obviously at this point my slave database is corrupted beyond repair due to nothing more than an unexpected crash on the master. That's bad.

What is worse is that the system only detected the corruption because the slave had crossed an xlog segment boundary which the master had not crossed. Had it been otherwise, when the slave rewound to the beginning of the current segment, it would have had no trouble getting back in sync with the master - but it would have done this after having replayed WAL that, from the master's point of view, doesn't exist. In other words, the database on the slave would be silently corrupted.

I don't know what to do about this, but I'm pretty sure we can't ship it as-is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company