From: Robert Haas on
On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> Thought? Comment? Objection?

What happens if the WAL is streamed to the standby and then the master
crashes without writing that WAL to disk?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Fujii Masao on
On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>> Thought? Comment? Objection?
>
> What happens if the WAL is streamed to the standby and then the master
> crashes without writing that WAL to disk?

What are you concerned about?

I think the situation would be the same as in 9.0 from the user's perspective.
After failover, a transaction that the client regards as aborted (because
of the crash) might be visible or invisible on the new master (i.e., the original
standby). For now, we cannot control that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Robert Haas on
On Fri, Jun 11, 2010 at 9:57 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>>> Thought? Comment? Objection?
>>
>> What happens if the WAL is streamed to the standby and then the master
>> crashes without writing that WAL to disk?
>
> What are you concerned about?
>
> I think the situation would be the same as in 9.0 from the user's perspective.
> After failover, a transaction that the client regards as aborted (because
> of the crash) might be visible or invisible on the new master (i.e., the original
> standby). For now, we cannot control that.

I think the failover case might be OK. But if the master crashes and
restarts, the slave might be left thinking its xlog position is ahead
of the xlog position on the master.
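
A minimal sketch of that scenario, assuming WAL positions can be modelled as
plain integers. The `Master` class and its field names are hypothetical and
are not PostgreSQL code:

```python
from dataclasses import dataclass


@dataclass
class Master:
    inserted_lsn: int = 0  # WAL generated in memory (hypothetical counter)
    flushed_lsn: int = 0   # WAL durably on the master's disk

    def crash_and_restart(self) -> None:
        # Anything not flushed is lost, so after restart the master
        # only knows about WAL up to the flushed position.
        self.inserted_lsn = self.flushed_lsn


def standby_is_ahead(standby_lsn: int, master: Master) -> bool:
    """True if the standby holds WAL the restarted master never kept."""
    return standby_lsn > master.inserted_lsn


master = Master(inserted_lsn=300, flushed_lsn=200)
standby_lsn = 300  # WAL up to 300 was streamed before the local flush
master.crash_and_restart()
print(standby_is_ahead(standby_lsn, master))
```

In this model the standby sits at position 300 while the restarted master
ends at 200, which is the mismatch described above.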

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Tom Lane on
Fujii Masao <masao.fujii(a)gmail.com> writes:
> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
> That is, we cannot send WAL until it has been written (and flushed) to the disk.

I believe the above statement to be incorrect: walsender does *not* wait
for an fsync to occur.

I agree with the idea of trying to read from WAL buffers instead of the
file system, but the main reason why is that the current behavior makes
FADVISE_DONTNEED for WAL pretty dubious. It'd be a good idea to still
(artificially) limit replication to not read ahead of the written-out
data.
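
One way to picture such a limit, as a rough sketch only: the sender may read
from the WAL buffers, but never past what has been written out. The function
names and the integer positions below are assumptions made for illustration,
not walsender internals:

```python
def send_limit(buffered_upto: int, written_upto: int) -> int:
    """Highest WAL position the sender is allowed to stream."""
    # Even if more WAL sits in the buffers, stop at the written-out position.
    return min(buffered_upto, written_upto)


def next_chunk(sent_upto: int, buffered_upto: int,
               written_upto: int, max_chunk: int) -> tuple[int, int]:
    """Return the (start, end) range to send next; empty if start == end."""
    limit = send_limit(buffered_upto, written_upto)
    end = min(limit, sent_upto + max_chunk)
    end = max(end, sent_upto)  # never move backwards
    return sent_upto, end


print(next_chunk(sent_upto=300, buffered_upto=500,
                 written_upto=350, max_chunk=100))
```

With 500 buffered but only 350 written out, the sender stops at 350 even
though its chunk size would allow it to go to 400.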

> ... Since we can write and send WAL simultaneously, in synchronous
> replication, a transaction commit has only to wait for either of them. So the
> performance would significantly increase.

That performance claim, frankly, is ludicrous. There is no way that
round trip network delay plus write+fsync on the slave is faster than
local write+fsync. Furthermore, I would say that you are thinking
exactly backwards about the requirements for synchronous replication:
what that would mean is that transaction commit waits for *both*,
not whichever one finishes first.

regards, tom lane

From: Stefan Kaltenbrunner on
On 06/11/2010 04:31 PM, Tom Lane wrote:
> Fujii Masao <masao.fujii(a)gmail.com> writes:
>> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
>> That is, we cannot send WAL until it has been written (and flushed) to the disk.
>
> I believe the above statement to be incorrect: walsender does *not* wait
> for an fsync to occur.
>
> I agree with the idea of trying to read from WAL buffers instead of the
> file system, but the main reason why is that the current behavior makes
> FADVISE_DONTNEED for WAL pretty dubious. It'd be a good idea to still
> (artificially) limit replication to not read ahead of the written-out
> data.
>
>> ... Since we can write and send WAL simultaneously, in synchronous
>> replication, a transaction commit has only to wait for either of them. So the
>> performance would significantly increase.
>
> That performance claim, frankly, is ludicrous. There is no way that
> round trip network delay plus write+fsync on the slave is faster than
> local write+fsync. Furthermore, I would say that you are thinking
> exactly backwards about the requirements for synchronous replication:
> what that would mean is that transaction commit waits for *both*,
> not whichever one finishes first.

Hmm, I'm not sure that is what Fujii tried to say. I think his point was
that in the original case we would have serialized all the operations
(first write+sync on the master, then the network transfer, then write+sync
on the slave), and now we could try parallelizing by sending the WAL before
we have synced locally.
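
To make the difference concrete, here is a rough latency model. The three
function names and the millisecond figures are made up for illustration and
are not measurements:

```python
def serialized(local_flush: int, rtt: int, standby_flush: int) -> int:
    # Master flushes first, then ships the WAL, then the standby flushes.
    return local_flush + rtt + standby_flush


def parallel_wait_both(local_flush: int, rtt: int, standby_flush: int) -> int:
    # WAL is sent while the master flushes; commit waits for both paths.
    return max(local_flush, rtt + standby_flush)


def parallel_wait_either(local_flush: int, rtt: int, standby_flush: int) -> int:
    # Commit returns as soon as one path finishes (the "either" reading).
    return min(local_flush, rtt + standby_flush)


# Hypothetical costs in milliseconds.
local_flush, rtt, standby_flush = 5, 1, 5
print(serialized(local_flush, rtt, standby_flush))
print(parallel_wait_both(local_flush, rtt, standby_flush))
print(parallel_wait_either(local_flush, rtt, standby_flush))
```

With these assumed numbers, the serialized path costs 11 ms and the parallel
path that waits for both costs 6 ms. Waiting for either costs 5 ms, which is
just the local flush, so in this model the gain comes from overlapping the
two paths rather than from skipping one of them.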

Stefan
