Prev: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers
From: Greg Stark on 1 Jul 2010 07:27

On Wed, Jun 30, 2010 at 12:37 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> One thought that occurred to me is that if the master and standby were
> more tightly coupled, you could recover after a crash by making the
> one with the further-advanced WAL position the master, and the other
> one the standby. That would get around this problem, though at the
> cost of considerable additional complexity. But then if one of the
> servers comes up and can't talk to the other, you need some mechanism
> for preventing split-brain syndrome.

Users should be free to build infrastructure to allow that. But we can't just switch ourselves -- we don't know what other pieces of their systems need to be updated when the master changes.

We also need to stop thinking in terms of one master and one slave. They could have dozens of slaves and, in case of failover, would want to pick the slave with the most recent WAL position.

The way I picture that happening, they're monitoring all their slaves in some monitoring tool and use that data to pick the new master. Some external tool picks the new master and tells that host, all the other slaves, and all the rest of their infrastructure where to find the new master, and does whatever is necessary to restart or reload configurations.

The question, I think, is what interfaces we need in Postgres to make this easy. The monitoring tool needs a way to find the current WAL position from the slaves even when the master is down. That means potentially needing to start up the slaves in read-only mode with no master at all. It also means making it easy for an external tool to switch a node from slave to primary and to change a slave's master. And it also means a slave should be able to change master and pick up where it left off easily.

I'm not sure what the recommended interfaces for these operations would currently be for an external tool.
--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
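
As a rough illustration of the "external tool picks the slave with the most recent WAL position" step described above, here is a minimal Python sketch. It assumes the monitoring tool has already collected one WAL location per standby in the textual `X/Y` hexadecimal form that PostgreSQL reports (for example from `pg_last_xlog_receive_location()` on a 9.0 hot standby). The host names, the sample positions, and the helper names `parse_lsn` and `pick_new_master` are hypothetical and not part of any PostgreSQL interface.

```python
from typing import Dict, Tuple


def parse_lsn(text: str) -> Tuple[int, int]:
    """Parse a WAL location such as '0/3000060' into a (high, low) integer pair."""
    high, low = text.strip().split("/")
    return int(high, 16), int(low, 16)


def pick_new_master(positions: Dict[str, str]) -> str:
    """Return the host whose reported WAL location is furthest ahead."""
    if not positions:
        raise ValueError("no standby reported a WAL position")
    # Compare parsed integers, not raw strings: '0/A000000' sorts after
    # '0/10000000' as text but is the smaller position numerically.
    return max(positions, key=lambda host: parse_lsn(positions[host]))


# Hypothetical data as a monitoring tool might have gathered it.
reported = {
    "standby-a": "0/3000060",
    "standby-b": "0/30000A8",
    "standby-c": "0/2FFFFF0",
}
print(pick_new_master(reported))  # prints standby-b
```

The sketch covers only the selection step. Telling the other standbys and the rest of the infrastructure about the new master, and guarding against split-brain, are the harder parts discussed in the message above and are left out.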
From: Robert Haas on 6 Jul 2010 19:44

On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
> That is, we cannot send WAL until it has been written (and flushed) to the disk.
> This degrades the performance of synchronous replication very much since a
> transaction commit must wait for the WAL write time *plus* the replication time.
>
> The attached patch enables walsender to read data from WAL buffers in addition
> to the disk. Since we can write and send WAL simultaneously, in synchronous
> replication, a transaction commit has only to wait for either of them. So the
> performance would significantly increase.

To recap the previous discussion on this thread, we ended up changing the behavior of 9.0 so that it only sends WAL which has been written to the OS *and flushed*, because sending unflushed WAL to the standby is unsafe. The standby can get ahead of the master while still believing that the databases are in sync, due to the fact that after an SR reconnect we rewind to the start of the current WAL segment. This results in a silently corrupt standby database.

If it's unsafe to send written but unflushed WAL to the standby, then for the same reasons we can't send unwritten WAL either. Therefore, I believe that this entire patch in its current form is a nonstarter and we should mark it Rejected in the CF app so that reviewers don't unnecessarily spend time on it.

Having said that, I do think we urgently need some high-level design discussion on how sync rep is actually going to handle this issue (perhaps on a new thread). If we can't resolve this issue, sync rep is going to be really slow; but there are no easy solutions to this problem in sight, so if we want to have sync rep for 9.1 we'd better agree on one of the difficult solutions soon so that work can begin.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
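
The commit-latency claim in the quoted proposal can be written out as a small worked comparison. This is only a sketch of the argument, not a measurement: the symbols $T_{\text{flush}}$ (local WAL write and flush time) and $T_{\text{repl}}$ (time to ship WAL to the standby and receive its acknowledgement) are introduced here for illustration, and "wait for either of them" is read as waiting for whichever of the two overlapping operations finishes last.

```latex
% Assumed symbols (not from the thread):
%   T_flush : time to write and flush WAL on the master
%   T_repl  : time to send WAL to the standby and get its acknowledgement
\begin{align*}
  T_{\text{commit}}^{\text{serial}}   &\approx T_{\text{flush}} + T_{\text{repl}}
      && \text{(9.0: send only after flush)} \\
  T_{\text{commit}}^{\text{parallel}} &\approx \max\bigl(T_{\text{flush}},\, T_{\text{repl}}\bigr)
      && \text{(proposal: write and send simultaneously)}
\end{align*}
% Illustrative numbers only: with T_flush = 2 ms and T_repl = 3 ms,
% the serial case costs about 5 ms and the overlapped case about 3 ms.
```

The saving is therefore bounded by the shorter of the two times, which is why the safety objection raised above matters so much for synchronous replication performance.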
From: Dimitri Fontaine on 7 Jul 2010 04:40

Robert Haas <robertmhaas(a)gmail.com> writes:
> If it's unsafe to send written but unflushed WAL to the standby, then
> for the same reasons we can't send unwritten WAL either.
[...]
> Having said that, I do think we urgently need some high-level design
> discussion on how sync rep is actually going to handle this issue

Stop me if I'm all wrong already, but I thought we said that we should handle this case by decoupling what we can send to the standby and what it can apply. We could do this by sending the current WAL fsync'ed position on the master in the WAL sender protocol, either in the WAL itself or as out-of-band messages, I guess.

Now, this can be made safe; how to make it fast (low-latency) is yet to be addressed.

Regards,
--
dim
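
The decoupling idea in the message above (the standby may receive WAL ahead of the master's flush, but may apply it only up to the position the master reports as fsync'ed) could look roughly like the following toy model. It is a sketch under strong assumptions: WAL positions are plain integers, the two message kinds are modeled as method calls, and the class and method names (`Standby`, `on_wal`, `on_flush_report`) are hypothetical. It does not reflect how walsender or walreceiver are implemented.

```python
class Standby:
    """Toy standby that stores all received WAL but applies only what the
    master has reported as flushed to its own disk."""

    def __init__(self) -> None:
        self.received = 0        # highest WAL position received so far
        self.master_flushed = 0  # highest position the master reported as fsync'ed
        self.applied = 0         # highest WAL position applied locally

    def on_wal(self, end_pos: int) -> None:
        """WAL data arrived, possibly ahead of the master's flush."""
        self.received = max(self.received, end_pos)
        self._apply()

    def on_flush_report(self, flushed_pos: int) -> None:
        """The master reported its current fsync'ed position."""
        self.master_flushed = max(self.master_flushed, flushed_pos)
        self._apply()

    def _apply(self) -> None:
        # Apply only WAL that is both received here and flushed on the master.
        limit = min(self.received, self.master_flushed)
        self.applied = max(self.applied, limit)


standby = Standby()
standby.on_wal(100)           # received early; nothing may be applied yet
standby.on_flush_report(60)   # master has flushed up to 60
print(standby.applied)        # prints 60
standby.on_flush_report(200)  # master's flush has passed everything received
print(standby.applied)        # prints 100
```

The model shows why the scheme can be safe: the applied position never exceeds what the master has durably flushed. It says nothing about latency, which is the open question raised in the message.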
From: Robert Haas on 7 Jul 2010 06:57

On Wed, Jul 7, 2010 at 4:40 AM, Dimitri Fontaine <dfontaine(a)hi-media.com> wrote:
> Stop me if I'm all wrong already, but I thought we said that we should
> handle this case by decoupling what we can send to the standby and what
> it can apply. We could do this by sending the current WAL fsync'ed
> position on the master in the WAL sender protocol, either in the WAL
> itself or as out-of-band messages, I guess.
>
> Now, this can be made safe; how to make it fast (low-latency) is yet to
> be addressed.

Yeah, that's the trick, isn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
From: Dimitri Fontaine on 7 Jul 2010 10:20

Tom Lane <tgl(a)sss.pgh.pa.us> writes:
> Dimitri Fontaine <dfontaine(a)hi-media.com> writes:
>> Stop me if I'm all wrong already, but I thought we said that we should
>> handle this case by decoupling what we can send to the standby and what
>> it can apply.
>
> What's the point of that? It won't make the standby apply any faster.

True, but it allows sending the WAL content before its fsync is acknowledged.

Regards,
--
dim