From: Fujii Masao on
On Sat, Jun 12, 2010 at 12:15 AM, Stefan Kaltenbrunner
<stefan(a)kaltenbrunner.cc> wrote:
> hmm ok - but assuming sync rep we would end up with something like the
> following (hypothetically assuming each operation takes 1 time unit):
>
> originally:
>
> write 1
> sync 1
> network 1
> write 1
> sync 1
>
> total: 5
>
> whereas in the new case we would basically have the write+sync compete with
> network+write+sync in parallel (total 3 units) and we would only have to wait
> for the slower of those two sets of operations instead of the total time of
> both, or am I missing something?

Yeah, this is what I'd like to say. Thanks!
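
To make the arithmetic above concrete, here is a minimal sketch of the two cases. It assumes, as in the quoted mail, that every step costs exactly 1 time unit; the names MASTER_STEPS, STANDBY_STEPS, serial_latency and parallel_latency are hypothetical and exist only for this illustration, not in PostgreSQL.

```python
# Toy cost model: each step costs 1 time unit (assumption from the quoted mail).
MASTER_STEPS = {"write": 1, "sync": 1}
STANDBY_STEPS = {"network": 1, "write": 1, "sync": 1}


def serial_latency(master, standby):
    """Original case: the master path finishes before the standby path starts."""
    return sum(master.values()) + sum(standby.values())


def parallel_latency(master, standby):
    """New case: both paths run concurrently, so we wait for the slower one."""
    return max(sum(master.values()), sum(standby.values()))


if __name__ == "__main__":
    print("serial:", serial_latency(MASTER_STEPS, STANDBY_STEPS))
    print("parallel:", parallel_latency(MASTER_STEPS, STANDBY_STEPS))
```

With these unit costs the serial path takes 5 units and the parallel one takes 3, since the standby path (network+write+sync) is the slower of the two.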

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Mon, Jun 14, 2010 at 4:14 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> I think the failover case might be OK. But if the master crashes and
>> restarts, the slave might be left thinking its xlog position is ahead
>> of the xlog position on the master.
>
> Right. Unless we perform a failover in this case, the standby might go down
> because of WAL inconsistency after the master restarts. To avoid this
> problem, walsender must wait for WAL to be not only written but also *fsynced*
> on the master before sending it, as 9.0 does. Though this would degrade
> performance, it might be useful in some cases. Should we provide a knob
> to specify whether the standby is allowed to get ahead of the master?

Maybe. That sounds like a pretty enormous foot-gun to me, considering
that we have no way of recovering from the situation where the standby
gets ahead of the master. Right now, I believe we're still in the
situation where the standby goes into an infinite CPU-chewing,
log-spewing loop, but even after we fix that it's not going to be good
enough to really handle that case sensibly, which we probably need to
do if we want to make this change.

Come to think of it, can this happen already? Can the master stream
WAL to the standby after it's written but before it's fsync'd?

We should get the open item fixed for 9.0 here before we start
worrying about 9.1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Simon Riggs on
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote:
> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> > Stefan Kaltenbrunner <stefan(a)kaltenbrunner.cc> writes:
> >> hmm not sure that is what fujii tried to say - I think his point was
> >> that in the original case we would have serialized all the operations
> >> (first write+sync on the master, network afterwards and write+sync on
> >> the slave) and now we could try parallelizing by sending the wal before
> >> we have synced locally.
> >
> > Well, we're already not waiting for fsync, which is the slowest part.
>
> No, currently walsender waits for fsync.
>
> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
> xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync.
> As a result, walsender cannot send WAL that has not been fsynced yet. Should
> we update xlogctl->LogwrtResult.Write before XLogWrite() performs fsync
> for 9.0?
>
> But that change would cause the problem that Robert pointed out.
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php

ISTM you just defined some clear objectives for next work.

Copying the data from WAL buffers is mostly irrelevant. The majority of
time is lost waiting for fsync. The biggest issue is about how to allow
WAL write and WALSender to act concurrently and have backend wait for
both.

Sure, copying data from wal_buffers will be faster still, but it will
cause you to address some subtle data structure locking operations that
we could solve at a later time. And it still gives the problem of how
the master resets itself if the standby really is ahead.
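
The hazard discussed in this thread can be sketched with a toy model. This is an illustration only, not PostgreSQL code: the ToyMaster class, its fields, and standby_ahead_after_crash are hypothetical names, and the model assumes the worst case in which a crash loses every byte of WAL that was written but not yet fsynced.

```python
from dataclasses import dataclass


@dataclass
class ToyMaster:
    """Hypothetical stand-in for a master's WAL positions (not PostgreSQL code)."""
    written: int = 0  # WAL handed to the OS, not necessarily durable
    flushed: int = 0  # WAL known to be fsynced

    def write(self, nbytes):
        self.written += nbytes

    def fsync(self):
        self.flushed = self.written

    def send_limit(self, wait_for_fsync):
        # How far a walsender-like process may stream under each policy.
        return self.flushed if wait_for_fsync else self.written

    def crash_and_restart(self):
        # Worst-case assumption: only fsynced WAL survives the crash.
        self.written = self.flushed


def standby_ahead_after_crash(wait_for_fsync):
    master = ToyMaster()
    master.write(100)
    master.fsync()
    master.write(50)  # written but not yet fsynced
    standby_pos = master.send_limit(wait_for_fsync)
    master.crash_and_restart()
    return standby_pos > master.written


if __name__ == "__main__":
    print("wait for fsync:", standby_ahead_after_crash(True))
    print("send before fsync:", standby_ahead_after_crash(False))
```

In this model the standby ends up ahead of the restarted master only when WAL is sent before it is fsynced, which is the situation the master would need some way to recover from.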

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services



From: Simon Riggs on
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote:
> No, currently walsender waits for fsync.
> ...

> But that change would cause the problem that Robert pointed out.
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php

Presumably this means that if synchronous_commit = off on the primary,
SR in 9.0 will no longer work correctly if the primary crashes?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services



From: Fujii Masao on
On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> Maybe. That sounds like a pretty enormous foot-gun to me, considering
> that we have no way of recovering from the situation where the standby
> gets ahead of the master.

No, we can do that by reconstructing the standby from the backup.

And that situation is not a problem for users, including me, who prefer to
perform a failover when the master goes down. Of course, we can just restart
the master in that case, but that is likely to take longer than a failover
because whatever caused the crash would still have to be dealt with. For
example, if the master goes down because of a media crash, it would never
start up unless PITR is performed. So I'm not sure how many users prefer a
restart to a failover.

> We should get the open item fixed for 9.0 here before we start
> worrying about 9.1.

Yep, which is why I have been submitting some patches these days :)

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
