From: Robert Haas on 11 Jun 2010 09:22

On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> Thought? Comment? Objection?

What happens if the WAL is streamed to the standby and then the master crashes without writing that WAL to disk?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Fujii Masao on 11 Jun 2010 09:57

On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>> Thought? Comment? Objection?
>
> What happens if the WAL is streamed to the standby and then the master
> crashes without writing that WAL to disk?

What are you concerned about?

I think that the situation would be the same as in 9.0 from the users' perspective. After failover, a transaction which a client regards as aborted (because of the crash) might be visible or invisible on the new master (i.e., the original standby). For now, we cannot control that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Robert Haas on 11 Jun 2010 10:24

On Fri, Jun 11, 2010 at 9:57 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>>> Thought? Comment? Objection?
>>
>> What happens if the WAL is streamed to the standby and then the master
>> crashes without writing that WAL to disk?
>
> What are you concerned about?
>
> I think that the situation would be the same as 9.0 from users' perspective.
> After failover, the transaction which a client regards as aborted (because
> of the crash) might be visible or invisible on new master (i.e., original
> standby). For now, we cannot control that.

I think the failover case might be OK. But if the master crashes and restarts, the slave might be left thinking its xlog position is ahead of the xlog position on the master.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Tom Lane on 11 Jun 2010 10:31

Fujii Masao <masao.fujii(a)gmail.com> writes:
> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
> That is, we cannot send WAL until it has been written (and flushed) to the disk.

I believe the above statement to be incorrect: walsender does *not* wait for an fsync to occur.

I agree with the idea of trying to read from WAL buffers instead of the file system, but the main reason why is that the current behavior makes FADVISE_DONTNEED for WAL pretty dubious. It'd be a good idea to still (artificially) limit replication to not read ahead of the written-out data.

> ... Since we can write and send WAL simultaneously, in synchronous
> replication, a transaction commit has only to wait for either of them. So the
> performance would significantly increase.

That performance claim, frankly, is ludicrous. There is no way that round trip network delay plus write+fsync on the slave is faster than local write+fsync. Furthermore, I would say that you are thinking exactly backwards about the requirements for synchronous replication: what that would mean is that transaction commit waits for *both*, not whichever one finishes first.

regards, tom lane

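For readers following along, the limit suggested above (do not send WAL beyond what has been written out) can be illustrated with a minimal sketch. This is not PostgreSQL code: the names `WalPositions`, `send_limit`, and `standby_ahead_after_restart` are hypothetical, positions are plain byte offsets, and the model assumes that after a crash of the server process alone (the operating system survives) the master recovers WAL up to its written position.

```python
from dataclasses import dataclass


@dataclass
class WalPositions:
    """Byte offsets into the WAL stream on the master (hypothetical model)."""
    inserted: int  # WAL present in the in-memory WAL buffers
    written: int   # WAL handed to the OS via write(), not necessarily fsynced
    flushed: int   # WAL known to be durable after fsync


def send_limit(pos: WalPositions, cap_at_written: bool = True) -> int:
    """Return the highest offset the sender may stream to the standby.

    With the cap, the sender may read from WAL buffers but never goes
    past the written-out position. Without it, everything inserted into
    the buffers is eligible to be sent.
    """
    if cap_at_written:
        return min(pos.inserted, pos.written)
    return pos.inserted


def standby_ahead_after_restart(sent_upto: int, recovered_upto: int) -> bool:
    """True if the standby holds WAL the restarted master no longer has."""
    return sent_upto > recovered_upto


# Assumed example offsets: buffers are ahead of the file, which is ahead of fsync.
pos = WalPositions(inserted=1000, written=800, flushed=600)
uncapped = send_limit(pos, cap_at_written=False)
capped = send_limit(pos)

# Assumption: only the server process crashed, so WAL up to `written` survives.
print(standby_ahead_after_restart(uncapped, pos.written))
print(standby_ahead_after_restart(capped, pos.written))
```

Under this model the uncapped sender leaves the standby ahead of the restarted master, while the capped one does not. If the operating system itself crashes, written but unflushed WAL may also be lost, so in that case a cap at the written position would narrow the window rather than close it.
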
From: Stefan Kaltenbrunner on 11 Jun 2010 10:38

On 06/11/2010 04:31 PM, Tom Lane wrote:
> Fujii Masao<masao.fujii(a)gmail.com> writes:
>> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
>> That is, we cannot send WAL until it has been written (and flushed) to the disk.
>
> I believe the above statement to be incorrect: walsender does *not* wait
> for an fsync to occur.
>
> I agree with the idea of trying to read from WAL buffers instead of the
> file system, but the main reason why is that the current behavior makes
> FADVISE_DONTNEED for WAL pretty dubious. It'd be a good idea to still
> (artificially) limit replication to not read ahead of the written-out
> data.
>
>> ... Since we can write and send WAL simultaneously, in synchronous
>> replication, a transaction commit has only to wait for either of them. So the
>> performance would significantly increase.
>
> That performance claim, frankly, is ludicrous. There is no way that
> round trip network delay plus write+fsync on the slave is faster than
> local write+fsync. Furthermore, I would say that you are thinking
> exactly backwards about the requirements for synchronous replication:
> what that would mean is that transaction commit waits for *both*,
> not whichever one finishes first.

Hmm, I'm not sure that is what Fujii tried to say. I think his point was that in the original case we would have serialized all the operations (first write+sync on the master, then the network transfer, then write+sync on the slave), and now we could try parallelizing by sending the WAL before we have synced locally.

Stefan
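
The two readings of the performance claim can be compared with a small back-of-the-envelope model. This is only a sketch under stated assumptions: the function names and the millisecond figures are invented for illustration, the network cost is treated as a single round trip, and nothing here measures a real system.

```python
def serialized_commit_ms(local_flush: float, network_rtt: float, standby_flush: float) -> float:
    """Flush locally first, then ship the WAL, then flush on the standby."""
    return local_flush + network_rtt + standby_flush


def parallel_wait_for_both_ms(local_flush: float, network_rtt: float, standby_flush: float) -> float:
    """Ship the WAL while flushing locally; commit waits for both to finish."""
    return max(local_flush, network_rtt + standby_flush)


def parallel_wait_for_either_ms(local_flush: float, network_rtt: float, standby_flush: float) -> float:
    """Commit returns as soon as one side finishes (the reading Tom objects to)."""
    return min(local_flush, network_rtt + standby_flush)


# Assumed figures: 5 ms per flush on each side, 1 ms network round trip.
local, rtt, standby = 5.0, 1.0, 5.0

serial = serialized_commit_ms(local, rtt, standby)
both = parallel_wait_for_both_ms(local, rtt, standby)
either = parallel_wait_for_either_ms(local, rtt, standby)
print(serial, both, either)
```

With these assumed numbers, overlapping the local flush with the send shortens a synchronous commit relative to the fully serialized path, which matches Stefan's reading. Waiting for both can never be faster than the local flush alone, which matches Tom's objection to the "either" interpretation.
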