From: Robert Haas on 14 Jul 2010 11:16

On Wed, Jul 14, 2010 at 2:50 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> The patch have no features for performance improvement of synchronous
> replication. I admit that currently the performance overhead in the
> master is terrible. We need to address the following TODO items in the
> subsequent CF.
>
> * Change the poll loop in the walsender
> * Change the poll loop in the backend
> * Change the poll loop in the startup process
> * Change the poll loop in the walreceiver
> * Perform the WAL write and replication concurrently
> * Send WAL from not only disk but also WAL buffers

I have a feeling that if we don't have a design for these last two
before we start committing things, we're possibly going to regret it
later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Fujii Masao on 16 Jul 2010 03:40

On Thu, Jul 15, 2010 at 12:16 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Wed, Jul 14, 2010 at 2:50 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>> The patch have no features for performance improvement of synchronous
>> replication. I admit that currently the performance overhead in the
>> master is terrible. We need to address the following TODO items in the
>> subsequent CF.
>>
>> * Change the poll loop in the walsender
>> * Change the poll loop in the backend
>> * Change the poll loop in the startup process
>> * Change the poll loop in the walreceiver
>> * Perform the WAL write and replication concurrently
>> * Send WAL from not only disk but also WAL buffers
>
> I have a feeling that if we don't have a design for these last two
> before we start committing things, we're possibly going to regret it
> later.

Yeah, I'll give it a try.

The problem is that the standby could apply WAL that has not yet been
fsync'd on the master. So if we allow walsender to send non-fsync'd WAL,
we should make walsender also send the current fsync location, and
prevent the standby from applying WAL newer than that location. A new
message type for sending the fsync location would be required in the
Streaming Replication Protocol, though sometimes the location might be
sent along with an XLogData message.

After the master crashes and walreceiver is terminated, the standby
currently attempts to replay the WAL in pg_xlog and in the archive.
Since WAL in the archive is guaranteed to have already been fsync'd by
the master, it's not a problem for the standby to apply that WAL. OTOH,
WAL records in the standby's pg_xlog directory might not exist on the
crashed master.

So we should always prevent the standby from applying any WAL in pg_xlog
unless walreceiver is in progress. That is, if there is no WAL available
in the archive, the standby ignores pg_xlog and starts walreceiver
process to request for WAL streaming.

This idea is a little inefficient because already-sent WAL might be sent
again when the master is restarted. But since it ensures that the
standby never applies WAL that was not fsync'd on the master, it's quite
safe. What about this idea?

This idea doesn't conflict with the patch I submitted for CF 2010-07, so
please feel free to review the patch :) But if you think that the patch
is not reviewable until that idea has been implemented, I'll try to
implement it ASAP.

PS. I probably cannot reply to mail until July 21. Sorry.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
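[Editor's sketch] The rule proposed above (the standby never replays past the master's reported fsync location) might look roughly like the following. This is only an illustration in Python, not PostgreSQL's C. The names StandbyState, on_xlog_data, on_fsync_location and apply_limit are hypothetical, LSNs are modeled as plain integers, and nothing here is taken from the submitted patch or the actual Streaming Replication Protocol.

```python
from dataclasses import dataclass


@dataclass
class StandbyState:
    """Hypothetical standby-side bookkeeping; LSNs are plain byte offsets."""
    received_lsn: int = 0       # end of WAL received from walsender
    master_fsync_lsn: int = 0   # last fsync location reported by the master
    applied_lsn: int = 0        # end of WAL already replayed

    def on_xlog_data(self, end_lsn: int) -> None:
        # WAL may arrive before the master has fsync'd it.
        self.received_lsn = max(self.received_lsn, end_lsn)

    def on_fsync_location(self, fsync_lsn: int) -> None:
        # Stands in for the proposed (hypothetical) fsync-location message.
        self.master_fsync_lsn = max(self.master_fsync_lsn, fsync_lsn)

    def apply_limit(self) -> int:
        # Replay only WAL that is both received and durable on the master.
        return min(self.received_lsn, self.master_fsync_lsn)

    def replay(self) -> int:
        """Advance replay up to the limit; return the bytes newly applied."""
        limit = self.apply_limit()
        newly_applied = max(0, limit - self.applied_lsn)
        self.applied_lsn = max(self.applied_lsn, limit)
        return newly_applied


state = StandbyState()
state.on_xlog_data(1000)        # 1000 bytes received, none known fsync'd
state.replay()                  # nothing may be applied yet
state.on_fsync_location(600)    # master reports fsync up to offset 600
state.replay()                  # replay advances to 600, not to 1000
```

The point of taking the minimum of the two positions is that received-but-unconfirmed WAL is held back until the master confirms it, so a master crash cannot leave the standby ahead of what the master durably wrote.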
From: Heikki Linnakangas on 16 Jul 2010 06:43

On 16/07/10 10:40, Fujii Masao wrote:
> So we should always prevent the standby from applying any WAL in pg_xlog
> unless walreceiver is in progress. That is, if there is no WAL available
> in the archive, the standby ignores pg_xlog and starts walreceiver
> process to request for WAL streaming.

That completely defeats the purpose of storing streamed WAL in pg_xlog
in the first place. The reason it's written and fsync'd to pg_xlog is
that if the standby subsequently crashes, you can use the WAL from
pg_xlog to reapply the WAL up to minRecoveryPoint. Otherwise you can't
start up the standby anymore.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
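[Editor's sketch] The objection above can be made concrete with a small model. It assumes the master is unreachable after the standby crashes, so only the archive and the standby's own pg_xlog can supply WAL. The function names and integer LSNs are illustrative assumptions, not PostgreSQL code.

```python
def reachable_lsn(archive_end: int, pg_xlog_end: int, use_pg_xlog: bool) -> int:
    """Furthest LSN the standby can replay without a live master."""
    if use_pg_xlog:
        return max(archive_end, pg_xlog_end)
    return archive_end


def can_reach_consistency(min_recovery_point: int, archive_end: int,
                          pg_xlog_end: int, use_pg_xlog: bool) -> bool:
    """The standby can start up only if replay reaches minRecoveryPoint."""
    return reachable_lsn(archive_end, pg_xlog_end, use_pg_xlog) >= min_recovery_point


# Hypothetical positions: the archive lags behind the streamed WAL.
archive_end, pg_xlog_end, min_recovery_point = 500, 900, 800

with_pg_xlog = can_reach_consistency(min_recovery_point, archive_end,
                                     pg_xlog_end, use_pg_xlog=True)
without_pg_xlog = can_reach_consistency(min_recovery_point, archive_end,
                                        pg_xlog_end, use_pg_xlog=False)
```

With these made-up numbers, replaying from pg_xlog reaches minRecoveryPoint, while ignoring pg_xlog stops at the end of the archive and leaves the standby unable to start.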
From: Dimitri Fontaine on 16 Jul 2010 13:26

On 16 Jul 2010, at 12:43, Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> wrote:
> On 16/07/10 10:40, Fujii Masao wrote:
>> So we should always prevent the standby from applying any WAL in pg_xlog
>> unless walreceiver is in progress. That is, if there is no WAL available
>> in the archive, the standby ignores pg_xlog and starts walreceiver
>> process to request for WAL streaming.
>
> That completely defeats the purpose of storing streamed WAL in pg_xlog in the first place. The reason it's written and fsync'd to pg_xlog is that if the standby subsequently crashes, you can use the WAL from pg_xlog to reapply the WAL up to minRecoveryPoint. Otherwise you can't start up the standby anymore.

I guess we know for sure that this point has been fsync()ed on the
Master, or that we could arrange it so that we know that?
From: Heikki Linnakangas on 16 Jul 2010 14:22

On 16/07/10 20:26, Dimitri Fontaine wrote:
> On 16 Jul 2010, at 12:43, Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> wrote:
>
>> On 16/07/10 10:40, Fujii Masao wrote:
>>> So we should always prevent the standby from applying any WAL in pg_xlog
>>> unless walreceiver is in progress. That is, if there is no WAL available
>>> in the archive, the standby ignores pg_xlog and starts walreceiver
>>> process to request for WAL streaming.
>>
>> That completely defeats the purpose of storing streamed WAL in pg_xlog in the first place. The reason it's written and fsync'd to pg_xlog is that if the standby subsequently crashes, you can use the WAL from pg_xlog to reapply the WAL up to minRecoveryPoint. Otherwise you can't start up the standby anymore.
>
> I guess we know for sure that this point has been fsync()ed on the Master, or that we could arrange it so that we know that?

At the moment we only stream WAL that's already been fsync()ed on the
master, so we don't have this problem, but Fujii is proposing to change
that. I think that's a premature optimization, and we should not try to
change that. There is no evidence from the field (granted, streaming
replication is a new feature) or from performance tests that it is a
problem in practice, or that sending WAL earlier would help. Let's
concentrate on the bare minimum required to make synchronous replication
work.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com