Proposal for 9.1: WAL streaming from WAL buffers [PgSql]

From: Greg Stark on 1 Jul 2010 07:27

On Wed, Jun 30, 2010 at 12:37 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> One thought that occurred to me is that if the master and standby were
> more tightly coupled, you could recover after a crash by making the
> one with the further-advanced WAL position the master, and the other
> one the standby. That would get around this problem, though at the
> cost of considerable additional complexity. But then if one of the
> servers comes up and can't talk to the other, you need some mechanism
> for preventing split-brain syndrome.

Users should be free to build infrastructure to allow that. But we
can't just switch ourselves -- we don't know what other pieces of
their systems need to be updated when the master changes.

We also need to stop thinking in terms of one master and one slave.
They could have dozens of slaves and in case of failover would want to
pick the slave with the most recent WAL position. The way I picture
that happening they're monitoring all their slaves in some monitoring
tool and use that data to pick the new master. Some external tool
picks the new master and tells that host, all the other slaves, and
all the rest of their infrastructure where to find the new master
and does whatever is necessary to restart or reload configurations.
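
As a rough illustration of the selection step such an external tool would
perform, here is a minimal Python sketch. It assumes the tool has already
collected each slave's WAL position as "hi/lo" hexadecimal text (the form
reported by functions such as pg_last_xlog_receive_location() in 9.0). The
host names, the `positions` mapping, and both function names are
hypothetical. Positions are parsed before comparing because a plain string
comparison would rank "0/A000000" above "0/10000000".

```python
def parse_lsn(text):
    """Turn 'hi/lo' hex text such as '0/3000060' into a comparable tuple."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16), int(lo, 16))


def pick_new_master(positions):
    """Return the host whose reported WAL position is furthest advanced.

    `positions` maps host name -> WAL position text gathered by the
    monitoring tool (hypothetical input; how it is gathered is not shown).
    """
    if not positions:
        raise ValueError("no standby positions available")
    return max(positions, key=lambda host: parse_lsn(positions[host]))


if __name__ == "__main__":
    # Hypothetical readings taken from three slaves after the master failed.
    reported = {
        "standby-a": "0/A000000",
        "standby-b": "0/10000000",
        "standby-c": "0/9FFFFF8",
    }
    print(pick_new_master(reported))
```

Telling the other slaves and the rest of the infrastructure about the new
master is deliberately left out; that part depends on the site.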

The question I think is what interfaces do we need in Postgres to make
this easy. The monitoring tool needs a way to find the current WAL
position from the slaves even when the master is down. That means
potentially needing to start up the slaves in read-only mode with no
master at all. It also means making it easy for an external tool to
switch a node from slave to primary and change a slave's master. And
it also means a slave should be able to change master and pick up
where it left off easily. I'm not sure what the recommended interfaces
for these operations would be currently for an external tool.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 6 Jul 2010 19:44

On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
> That is, we cannot send WAL until it has been written (and flushed) to the disk.
> This degrades the performance of synchronous replication very much since a
> transaction commit must wait for the WAL write time *plus* the replication time.
>
> The attached patch enables walsender to read data from WAL buffers in addition
> to the disk. Since we can write and send WAL simultaneously, in synchronous
> replication, a transaction commit has only to wait for either of them. So the
> performance would significantly increase.
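
The latency argument in the quoted proposal can be put as simple arithmetic.
The sketch below is only an illustration under the assumption that a
synchronous commit needs both the local WAL flush and the standby's
acknowledgement; the function names and the millisecond figures are
hypothetical, not measurements.

```python
def commit_wait_serial(flush_ms, replicate_ms):
    """9.0 behaviour as described: send only after the local write and flush."""
    return flush_ms + replicate_ms


def commit_wait_parallel(flush_ms, replicate_ms):
    """Proposed behaviour: flush and send overlap, so the slower one dominates."""
    return max(flush_ms, replicate_ms)


if __name__ == "__main__":
    # Hypothetical timings: 5 ms local flush, 8 ms replication round trip.
    print(commit_wait_serial(5, 8))
    print(commit_wait_parallel(5, 8))
```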

To recap the previous discussion on this thread, we ended up changing
the behavior of 9.0 so that it only sends WAL which has been written
to the OS *and flushed*, because sending unflushed WAL to the standby
is unsafe. The standby can get ahead of the master while still
believing that the databases are in sync, due to the fact that after
an SR reconnect we rewind to the start of the current WAL segment.
This results in a silently corrupt standby database.

If it's unsafe to send written but unflushed WAL to the standby, then
for the same reasons we can't send unwritten WAL either. Therefore, I
believe that this entire patch in its current form is a nonstarter and
we should mark it Rejected in the CF app so that reviewers don't
unnecessarily spend time on it.

Having said that, I do think we urgently need some high-level design
discussion on how sync rep is actually going to handle this issue
(perhaps on a new thread). If we can't resolve this issue, sync rep
is going to be really slow; but there are no easy solutions to this
problem in sight, so if we want to have sync rep for 9.1 we'd better
agree on one of the difficult solutions soon so that work can begin.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Dimitri Fontaine on 7 Jul 2010 04:40

Robert Haas <robertmhaas(a)gmail.com> writes:
> If it's unsafe to send written but unflushed WAL to the standby, then
> for the same reasons we can't send unwritten WAL either.
[...]
> Having said that, I do think we urgently need some high-level design
> discussion on how sync rep is actually going to handle this issue

Stop me if I'm all wrong already, but I thought we said that we should
handle this case by decoupling what we can send to the standby and what
it can apply. We could do this by sending the current WAL fsync'ed
position on the master in the WAL sender protocol, either in the WAL
itself or as out-of-band messages, I guess.

Now, this can be made safe; how to make it fast (low latency) is yet to
be addressed.
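
To make the decoupling idea concrete, here is a toy Python sketch. It is
not the walsender protocol: the `Standby` class, its method names, and the
use of plain integers as WAL positions are all assumptions made for
illustration. The standby accepts WAL as soon as it arrives but applies a
record only once the master has reported a flushed position at or beyond
that record's end.

```python
from collections import deque


class Standby:
    """Toy standby that receives WAL early but applies it only when safe."""

    def __init__(self):
        self.pending = deque()    # (end_position, payload) received, not applied
        self.applied = []         # payloads applied so far, in order
        self.master_flushed = 0   # last flush position reported by the master

    def receive_wal(self, end_position, payload):
        """Store a WAL record that may not be flushed on the master yet."""
        self.pending.append((end_position, payload))
        self._apply_safe_records()

    def receive_flush_position(self, position):
        """Handle the master's 'fsync'ed up to here' message."""
        self.master_flushed = max(self.master_flushed, position)
        self._apply_safe_records()

    def _apply_safe_records(self):
        # Apply in arrival order, stopping at the first record that the
        # master has not yet confirmed as flushed.
        while self.pending and self.pending[0][0] <= self.master_flushed:
            _, payload = self.pending.popleft()
            self.applied.append(payload)


if __name__ == "__main__":
    standby = Standby()
    standby.receive_wal(100, "record A")
    standby.receive_wal(200, "record B")
    standby.receive_flush_position(100)
    print(standby.applied)
```

This only covers the safety half; it does nothing about the latency
question raised above.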

Regards,
--
dim

From: Robert Haas on 7 Jul 2010 06:57

On Wed, Jul 7, 2010 at 4:40 AM, Dimitri Fontaine <dfontaine(a)hi-media.com> wrote:
> Stop me if I'm all wrong already, but I thought we said that we should
> handle this case by decoupling what we can send to the standby and what
> it can apply. We could do this by sending the current WAL fsync'ed
> position on the master in the WAL sender protocol, either in the WAL
> itself or as out-of-band messages, I guess.
>
> Now, this can be made safe, how to make it fast (low-latency) is yet to
> be addressed.

Yeah, that's the trick, isn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Dimitri Fontaine on 7 Jul 2010 10:20

Tom Lane <tgl(a)sss.pgh.pa.us> writes:
> Dimitri Fontaine <dfontaine(a)hi-media.com> writes:
>> Stop me if I'm all wrong already, but I thought we said that we should
>> handle this case by decoupling what we can send to the standby and what
>> it can apply.
>
> What's the point of that? It won't make the standby apply any faster.

True, but it allows sending the WAL content before acknowledging its fsync.

Regards.
--
dim
