Synchronous replication [PgSql]

From: Robert Haas on 14 Jul 2010 11:16

On Wed, Jul 14, 2010 at 2:50 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> The patch have no features for performance improvement of synchronous
> replication. I admit that currently the performance overhead in the
> master is terrible. We need to address the following TODO items in the
> subsequent CF.
>
> * Change the poll loop in the walsender
> * Change the poll loop in the backend
> * Change the poll loop in the startup process
> * Change the poll loop in the walreceiver
> * Perform the WAL write and replication concurrently
> * Send WAL from not only disk but also WAL buffers

I have a feeling that if we don't have a design for these last two
before we start committing things, we're possibly going to regret it
later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Fujii Masao on 16 Jul 2010 03:40

On Thu, Jul 15, 2010 at 12:16 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Wed, Jul 14, 2010 at 2:50 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>> The patch have no features for performance improvement of synchronous
>> replication. I admit that currently the performance overhead in the
>> master is terrible. We need to address the following TODO items in the
>> subsequent CF.
>>
>> * Change the poll loop in the walsender
>> * Change the poll loop in the backend
>> * Change the poll loop in the startup process
>> * Change the poll loop in the walreceiver
>> * Perform the WAL write and replication concurrently
>> * Send WAL from not only disk but also WAL buffers
>
> I have a feeling that if we don't have a design for these last two
> before we start committing things, we're possibly going to regret it
> later.

Yeah, I'll give it a try.

The problem is that the standby could apply WAL that has not yet been
fsync'd on the master. So if we allow walsender to send non-fsync'd WAL,
walsender should also send the current fsync location, and the standby
must be prevented from applying WAL newer than that location.

A new message type for sending the fsync location would be required in
the Streaming Replication Protocol, though sometimes the location might
be sent along with an XLogData message.
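
A minimal sketch of the rule described above, written in Python for
brevity rather than as PostgreSQL code. The names StandbyState,
on_xlog_data and on_fsync_location are hypothetical, and WAL positions
are modeled as plain integers instead of real WAL locations. It only
illustrates that the standby would replay up to the smaller of what it
has received and what the master has reported as fsync'd.

```python
from dataclasses import dataclass


@dataclass
class StandbyState:
    """Hypothetical standby-side bookkeeping; positions are plain integers."""
    received_upto: int = 0      # end of WAL received from walsender
    master_fsync_upto: int = 0  # latest fsync location reported by the master
    applied_upto: int = 0       # end of WAL already replayed

    def on_xlog_data(self, end_pos: int) -> None:
        # WAL may arrive before the master has fsync'd it.
        self.received_upto = max(self.received_upto, end_pos)

    def on_fsync_location(self, fsync_pos: int) -> None:
        # Assumed new message type carrying the master's fsync location.
        self.master_fsync_upto = max(self.master_fsync_upto, fsync_pos)

    def apply_limit(self) -> int:
        # Never replay beyond what the master has made durable.
        return min(self.received_upto, self.master_fsync_upto)

    def replay(self) -> int:
        # Advance replay up to the limit and return the new position.
        self.applied_upto = max(self.applied_upto, self.apply_limit())
        return self.applied_upto
```

In this model, WAL received up to position 300 with a reported fsync
location of 100 is replayed only up to 100 until a later fsync location
arrives.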

After the master crashes and walreceiver is terminated, the standby
currently attempts to replay the WAL in pg_xlog and the archive.
Since WAL in the archive is guaranteed to have already been fsync'd by
the master, it's not a problem for the standby to apply that WAL. OTOH,
WAL records in the standby's pg_xlog directory might not exist on the
crashed master.
So we should always prevent the standby from applying any WAL in pg_xlog
unless walreceiver is in progress. That is, if there is no WAL available
in the archive, the standby ignores pg_xlog and starts walreceiver
process to request for WAL streaming.

This idea is a little inefficient because already-sent WAL might be
sent again when the master is restarted. But since it ensures that the
standby will not apply WAL that was never fsync'd on the master, it's
quite safe.
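
One possible reading of the proposed recovery behaviour, sketched in
Python. The Action enum and next_recovery_action function are
hypothetical names used only for illustration; they are not taken from
the patch. The sketch assumes the standby checks the archive first and
only uses WAL in pg_xlog while walreceiver is running.

```python
from enum import Enum


class Action(Enum):
    REPLAY_FROM_ARCHIVE = "replay WAL restored from the archive"
    REPLAY_STREAMED_WAL = "replay WAL being written to pg_xlog by walreceiver"
    START_WALRECEIVER = "ignore pg_xlog and start walreceiver"


def next_recovery_action(archive_has_wal: bool, walreceiver_running: bool) -> Action:
    # Archived WAL was already fsync'd on the master, so it is always safe.
    if archive_has_wal:
        return Action.REPLAY_FROM_ARCHIVE
    # WAL in pg_xlog is used only while walreceiver is in progress.
    if walreceiver_running:
        return Action.REPLAY_STREAMED_WAL
    # Otherwise leftover WAL in pg_xlog is not trusted; request streaming.
    return Action.START_WALRECEIVER
```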

What about this idea?

This idea doesn't conflict with the patch I submitted for CF 2010-07.
So please feel free to review the patch :) But if you think that the
patch is not reviewable until that idea has been implemented, I'll
try to implement that ASAP.

PS. I probably won't be able to reply to mail until July 21. Sorry.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Heikki Linnakangas on 16 Jul 2010 06:43

On 16/07/10 10:40, Fujii Masao wrote:
> So we should always prevent the standby from applying any WAL in pg_xlog
> unless walreceiver is in progress. That is, if there is no WAL available
> in the archive, the standby ignores pg_xlog and starts walreceiver
> process to request for WAL streaming.

That completely defeats the purpose of storing streamed WAL in pg_xlog
in the first place. The reason it's written and fsync'd to pg_xlog is
that if the standby subsequently crashes, you can use the WAL from
pg_xlog to reapply the WAL up to minRecoveryPoint. Otherwise you can't
start up the standby anymore.
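
A toy model of that point, in Python. The function names are
hypothetical, WAL positions are plain integers, and the model assumes
the master cannot be reached for re-streaming after the standby
crashes. It shows only that replay must reach minRecoveryPoint, and
that ignoring pg_xlog can leave the standby short of it.

```python
def replayable_upto(archive_upto: int, pg_xlog_upto: int, use_pg_xlog: bool) -> int:
    # Furthest position the standby can replay from its local WAL sources.
    return max(archive_upto, pg_xlog_upto) if use_pg_xlog else archive_upto


def can_start_standby(archive_upto: int, pg_xlog_upto: int,
                      min_recovery_point: int, use_pg_xlog: bool) -> bool:
    # The standby must replay at least up to minRecoveryPoint to start up.
    return replayable_upto(archive_upto, pg_xlog_upto, use_pg_xlog) >= min_recovery_point
```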

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Dimitri Fontaine on 16 Jul 2010 13:26

On 16 Jul 2010 at 12:43, Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> wrote:

> On 16/07/10 10:40, Fujii Masao wrote:
>> So we should always prevent the standby from applying any WAL in pg_xlog
>> unless walreceiver is in progress. That is, if there is no WAL available
>> in the archive, the standby ignores pg_xlog and starts walreceiver
>> process to request for WAL streaming.
>
> That completely defeats the purpose of storing streamed WAL in pg_xlog in the first place. The reason it's written and fsync'd to pg_xlog is that if the standby subsequently crashes, you can use the WAL from pg_xlog to reapply the WAL up to minRecoveryPoint. Otherwise you can't start up the standby anymore.

I guess we know for sure that this point has been fsync()ed on the Master, or that we could arrange it so that we know that?

From: Heikki Linnakangas on 16 Jul 2010 14:22

On 16/07/10 20:26, Dimitri Fontaine wrote:
> On 16 Jul 2010 at 12:43, Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> wrote:
>
>> On 16/07/10 10:40, Fujii Masao wrote:
>>> So we should always prevent the standby from applying any WAL in pg_xlog
>>> unless walreceiver is in progress. That is, if there is no WAL available
>>> in the archive, the standby ignores pg_xlog and starts walreceiver
>>> process to request for WAL streaming.
>>
>> That completely defeats the purpose of storing streamed WAL in pg_xlog in the first place. The reason it's written and fsync'd to pg_xlog is that if the standby subsequently crashes, you can use the WAL from pg_xlog to reapply the WAL up to minRecoveryPoint. Otherwise you can't start up the standby anymore.
>
> I guess we know for sure that this point has been fsync()ed on the Master, or that we could arrange it so that we know that?

At the moment we only stream WAL that's already been fsync()ed on the
master, so we don't have this problem, but Fujii is proposing to change
that.

I think that's a premature optimization, and we should not try to change
that. There is no evidence from the field (granted, streaming replication is
a new feature) or from performance tests that it is a problem in
practice, or that sending WAL earlier would help. Let's concentrate on
the bare minimum required to make synchronous replication work.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
