From: Heikki Linnakangas
On 14/07/10 09:50, Fujii Masao wrote:
> TODO
> ----
> The patch has no performance-improvement features for synchronous
> replication yet. I admit that the performance overhead on the master
> is currently terrible. We need to address the following TODO items in
> a subsequent CF.
>
> * Change the poll loop in the walsender
> * Change the poll loop in the backend
> * Change the poll loop in the startup process
> * Change the poll loop in the walreceiver

I was actually hoping to see a patch for these things first, before any
of the synchronous replication stuff. Eliminating the polling loops is
important: latency will be laughable otherwise, and it will help the
synchronous case too.
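
To make the concern concrete, here is a schematic contrast between a
polling loop and an event-driven wait (helper names are made up, not
the actual walsender code):

    /* Polling: wake up every 100 ms whether or not there is work, so
     * each round trip eats up to a full sleep interval of latency. */
    for (;;)
    {
        if (new_wal_available())            /* hypothetical helper */
            send_pending_wal();             /* hypothetical helper */
        pg_usleep(100000L);                 /* 100 ms */
    }

    /* Event-driven: block until something actually happens, so the
     * latency is bounded by the event, not the polling interval. */
    for (;;)
    {
        wait_for_wal_or_socket_event();     /* e.g. select() or a signal */
        if (new_wal_available())
            send_pending_wal();
    }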

> * Perform the WAL write and replication concurrently
> * Send WAL not only from disk but also from WAL buffers

IMHO these are premature optimizations that we should not spend any
effort on now. Maybe later, if ever.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas
On 14/07/10 09:50, Fujii Masao wrote:
> Quorum commit
> -------------
> In previous discussions about synchronous replication, some people
> wanted a quorum commit feature. This feature is also included in
> Zoltan's synchronous replication patch, so I decided to implement it.
>
> The patch provides a quorum parameter in postgresql.conf, which
> specifies how many standby servers a transaction commit waits for WAL
> records to be replicated to before the command returns a "success"
> indication to the client. The default value is zero, which means a
> transaction commit never waits for replication, regardless of
> replication_mode. Also, a transaction commit never waits for
> replication to an asynchronous standby (i.e., one whose
> replication_mode is set to async), regardless of this parameter. If
> quorum is greater than the number of synchronous standbys, a
> transaction commit returns "success" once the ACK has arrived from
> all of the synchronous standbys.

There should be a way to specify "wait for *all* connected standby
servers to acknowledge".

> Protocol
> --------
> I extended the handshake message "START_REPLICATION" so that it
> includes replication_mode read from recovery.conf. If 'async' is
> passed, the master knows that it doesn't need to wait for an ACK
> from the standby.

Please use self-explanatory names for the modes in the
START_REPLICATION command, instead of just an integer.
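
Something along these lines (syntax purely illustrative):

    START_REPLICATION 0/A000000 MODE async
    START_REPLICATION 0/A000000 MODE sync

rather than a bare integer at the end of the command.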

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao
On Fri, Jul 16, 2010 at 7:43 PM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> On 16/07/10 10:40, Fujii Masao wrote:
>>
>> So we should always prevent the standby from applying any WAL in pg_xlog
>> unless walreceiver is in progress. That is, if there is no WAL available
>> in the archive, the standby ignores pg_xlog and starts the walreceiver
>> process to request WAL streaming.
>
> That completely defeats the purpose of storing streamed WAL in pg_xlog in
> the first place. The reason it's written and fsync'd to pg_xlog is that if
> the standby subsequently crashes, you can use the WAL from pg_xlog to
> reapply the WAL up to minRecoveryPoint. Otherwise you can't start up the
> standby anymore.

But the standby can start up by reading the missing WAL files from the
master, no?

On second thought, minRecoveryPoint is guaranteed to be older than the
master's fsync location if we prevent the standby from applying WAL
beyond that fsync location. So we can safely apply the WAL files in
pg_xlog up to minRecoveryPoint.

Consequently, we should always prevent the standby from applying any
WAL in pg_xlog newer than minRecoveryPoint unless walreceiver is in
progress. Thoughts?
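
In pseudo-C, the rule I'm proposing would be (names schematic, and the
LSN comparison written as a plain <= for readability):

    /* May this WAL record, restored from pg_xlog, be applied? */
    static bool
    may_apply_from_pg_xlog(XLogRecPtr recptr)
    {
        if (walreceiver_in_progress())      /* hypothetical check */
            return true;                    /* streamed WAL is safe */

        /* Beyond minRecoveryPoint we have no guarantee the WAL ever
         * reached the master's fsync location, so stop there. */
        return recptr <= minRecoveryPoint;
    }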

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao
On Sat, Jul 17, 2010 at 3:25 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> On 14/07/10 09:50, Fujii Masao wrote:
>>
>> TODO
>> ----
>> The patch has no performance-improvement features for synchronous
>> replication yet. I admit that the performance overhead on the master
>> is currently terrible. We need to address the following TODO items in
>> a subsequent CF.
>>
>> * Change the poll loop in the walsender
>> * Change the poll loop in the backend
>> * Change the poll loop in the startup process
>> * Change the poll loop in the walreceiver
>
> I was actually hoping to see a patch for these things first, before any of
> the synchronous replication stuff. Eliminating the polling loops is
> important: latency will be laughable otherwise, and it will help the
> synchronous case too.

First, note that the poll loops in the backend and walreceiver don't
exist without the synchronous replication stuff.

Yeah, I'll start with the change to the poll loop in the walsender. I'm
thinking that we should make the backend signal the walsender to send
the outstanding WAL immediately, as the synchronous replication patch I
submitted last year did. I use a signal here because in synchronous
replication the walsender needs to wait for the request from the
backend and the ACK message from the standby *concurrently*. If we used
a semaphore instead of a signal, the walsender would not be able to
respond to the ACK immediately, which would also degrade performance.
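
A schematic of the signaling I mean, with the caveats that the helper
names are made up, error handling is elided, and real code would need a
timeout or self-pipe to close the race between checking the flag and
entering select():

    #include <errno.h>
    #include <signal.h>
    #include <sys/select.h>

    /* Hypothetical helpers, defined elsewhere in this sketch: */
    extern void send_outstanding_wal(void);
    extern void process_standby_ack(void);

    static volatile sig_atomic_t wakeup_pending = 0;

    /* SIGUSR1 handler: a backend has committed and wants its WAL sent. */
    static void
    wal_send_request(int signo)
    {
        wakeup_pending = 1;
    }

    static void
    walsender_main_loop(int standby_sock)
    {
        struct sigaction sa;

        sa.sa_handler = wal_send_request;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;        /* no SA_RESTART: let select() see EINTR */
        sigaction(SIGUSR1, &sa, NULL);

        for (;;)
        {
            fd_set  rfds;
            int     rc;

            FD_ZERO(&rfds);
            FD_SET(standby_sock, &rfds);

            /* Wakes on an ACK from the standby, or with EINTR when a
             * committing backend signals us; we react to both at once. */
            rc = select(standby_sock + 1, &rfds, NULL, NULL, NULL);

            if (wakeup_pending)
            {
                wakeup_pending = 0;
                send_outstanding_wal();
            }
            if (rc > 0 && FD_ISSET(standby_sock, &rfds))
                process_standby_ack();  /* may release waiting backends */
        }
    }

The backend side is then just a kill(walsender_pid, SIGUSR1) at commit,
after flushing its own WAL.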

The problem with this idea is that a signal may be sent for every
transaction commit. I'm not sure whether such frequent signaling really
harms replication performance. FWIW, when I benchmarked the previous
synchronous replication patch, which was based on this idea, AFAIR the
results showed no impact from the signaling. But... thoughts? Do you
have a better idea?

>> * Perform the WAL write and replication concurrently
>> * Send WAL not only from disk but also from WAL buffers
>
> IMHO these are premature optimizations that we should not spend any effort
> on now. Maybe later, if ever.

Yep!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Aidan Van Dyk
* Fujii Masao <masao.fujii(a)gmail.com> [100721 03:49]:

> >> The patch provides a quorum parameter in postgresql.conf, which
> >> specifies how many standby servers a transaction commit waits for WAL
> >> records to be replicated to before the command returns a "success"
> >> indication to the client. The default value is zero, which means a
> >> transaction commit never waits for replication, regardless of
> >> replication_mode. Also, a transaction commit never waits for
> >> replication to an asynchronous standby (i.e., one whose
> >> replication_mode is set to async), regardless of this parameter. If
> >> quorum is greater than the number of synchronous standbys, a
> >> transaction commit returns "success" once the ACK has arrived from
> >> all of the synchronous standbys.
> >
> > There should be a way to specify "wait for *all* connected standby
> > servers to acknowledge".
>
> Agreed. I'll allow -1 as a valid value of the quorum parameter, which
> means that transaction commit waits for all connected standbys.
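
If I'm reading that right, the number of ACKs a commit waits for is
effectively (schematic):

    /* ACKs a committing backend waits for, per the described semantics. */
    static int
    acks_to_wait_for(int quorum, int n_sync_standbys)
    {
        if (quorum == 0)
            return 0;                   /* default: never wait */
        if (quorum == -1)
            return n_sync_standbys;     /* wait for all connected */
        /* otherwise silently capped at whatever is connected */
        return quorum < n_sync_standbys ? quorum : n_sync_standbys;
    }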

Hm... so if my one synchronous standby is operating normally and quorum
is set to 1, I'll get what I want (commit waits until the WAL is safely
on both servers). But what happens if my standby goes bad? Suddenly the
quorum setting is ignored (because it's greater than the number of
connected standby servers?). Is there a way for me to disallow any
commits if the quorum number of standbys is *not* available? Yes, I
want my db to "halt" in that situation, and yes, alarm bells will be
ringing...

In reality, I'm likely to run two synchronous slaves with a quorum of
1, so one slave can fail and I can still keep going with the data on
two servers. But if that second slave ever failed while the other was
down, I definitely don't want the master to forge on ahead!

Of course, this won't be for everyone, just as the current "count only
connected standbys" behavior isn't for everyone either...

a.

--
Aidan Van Dyk                                       Create like a god,
aidan(a)highrise.ca                                 command like a king,
http://www.highrise.ca/                             work like a slave.