From: Fujii Masao on
On Wed, Jul 21, 2010 at 9:52 PM, Aidan Van Dyk <aidan(a)highrise.ca> wrote:
> * Fujii Masao <masao.fujii(a)gmail.com> [100721 03:49]:
>
>> >> The patch provides a quorum parameter in postgresql.conf, which
>> >> specifies how many standby servers a transaction commit will wait for
>> >> WAL records to be replicated to before the command returns a
>> >> "success" indication to the client. The default value is zero, which
>> >> means transaction commit never waits for replication, regardless
>> >> of replication_mode. Likewise, transaction commit never waits for
>> >> replication to an asynchronous standby (i.e., one whose
>> >> replication_mode is set to async), regardless of this parameter. If
>> >> quorum is greater than the number of synchronous standbys, transaction
>> >> commit returns "success" once the ACK has arrived from all of them.
>> >
>> > There should be a way to specify "wait for *all* connected standby servers
>> > to acknowledge"
>>
>> Agreed. I'll allow -1 as the valid value of the quorum parameter, which
>> means that transaction commit waits for all connected standbys.
>
> Hm... so if my 1 synchronous standby is operating normally, and quorum
> is set to 1, I'll get what I want (commit waits until it's safely on both
> servers). But what happens if my standby goes bad? Suddenly the quorum
> setting is ignored (because it's > number of connected standby
> servers?) Is there a way for me to not allow any commits if the quorum
> setting number of standbys is *not* available? Yes, I want my db to
> "halt" in that situation, and yes, alarm bells will be ringing...
>
> In reality, I'm likely to run 2 synchronous slaves, with quorum of 1.
> So 1 slave can fail and I can still have 2 going. But if that 2nd slave
> ever failed while the other was down, I definitely don't want the master
> to forge on ahead!
>
> Of course, this won't be for everyone, just as the current "just
> connected standbys" isn't for everything either...

Yeah, we need to clear up the detailed design of the quorum commit feature
and reach consensus on it.

How should the synchronous replication behave when the number of connected
standby servers is less than quorum?

1. Ignore quorum. The current patch adopts this. If the ACKs from all
connected standbys have arrived, transaction commit succeeds
even if the number of standbys is less than quorum. If there is no
connected standby, transaction commit always succeeds, regardless
of quorum.

2. Observe quorum. Aidan wants this. Transaction commit waits until
the number of connected standbys is greater than or equal to
quorum.

Which is the right behavior for quorum commit? Or should we add a new
parameter specifying the behavior of quorum commit?
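
For illustration only, the two options can be sketched in Python as below.
This is not code from the patch: the function name commit_may_return and the
observe_quorum flag are hypothetical, and -1 is treated as "all connected
standbys" as proposed earlier in the thread.

```python
def commit_may_return(quorum, connected_sync, acks, observe_quorum=False):
    """Decide whether a commit may report success to the client.

    quorum          -- quorum setting (0 = never wait, -1 = all connected)
    connected_sync  -- number of connected synchronous standbys
    acks            -- number of ACKs received so far for this commit
    observe_quorum  -- hypothetical switch: False = option 1, True = option 2
    """
    if quorum == 0:
        # Default: commit never waits for replication.
        return True
    if quorum == -1:
        # Wait for every standby that is currently connected.
        return acks >= connected_sync
    if observe_quorum:
        # Option 2: keep waiting until quorum ACKs have really arrived,
        # even if fewer than quorum standbys are connected right now.
        return acks >= quorum
    # Option 1: quorum is capped by the number of connected standbys.
    return acks >= min(quorum, connected_sync)


if __name__ == "__main__":
    # Aidan's case: quorum = 1 and the only sync standby has gone away.
    print(commit_may_return(1, 0, 0, observe_quorum=False))  # True
    print(commit_may_return(1, 0, 0, observe_quorum=True))   # False
```

Under option 1 the master keeps committing when the last standby fails;
under option 2 the same commit blocks until a standby reconnects and ACKs.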

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Yeb Havinga on
Fujii Masao wrote:
> How should the synchronous replication behave when the number of connected
> standby servers is less than quorum?
>
> 1. Ignore quorum. The current patch adopts this. If the ACKs from all
> connected standbys have arrived, transaction commit succeeds
> even if the number of standbys is less than quorum. If there is no
> connected standby, transaction commit always succeeds, regardless
> of quorum.
>
> 2. Observe quorum. Aidan wants this. Transaction commit waits until
> the number of connected standbys is greater than or equal to
> quorum.
>
> Which is the right behavior for quorum commit? Or should we add a new
> parameter specifying the behavior of quorum commit?
>
Initially I also expected the quorum to behave as described by
Aidan/option 2. Also, IMHO the name "quorum" is a bit short, like having
"maximum" but not saying a max_something.

quorum_min_sync_standbys
quorum_max_sync_standbys

The question remains: what are the sync standbys? Does it mean not-async?
Intuitively, looking at the enumeration of replication_mode, I'd think
that the sync standbys are all standbys that operate in a non-async
mode. That would be clearer with a boolean sync (or not) and, for sync
standbys, the replication_mode specified.

regards,
Yeb Havinga





From: Fujii Masao on
On Thu, Jul 22, 2010 at 5:37 PM, Yeb Havinga <yebhavinga(a)gmail.com> wrote:
> Fujii Masao wrote:
>>
>> How should the synchronous replication behave when the number of connected
>> standby servers is less than quorum?
>>
>> 1. Ignore quorum. The current patch adopts this. If the ACKs from all
>>    connected standbys have arrived, transaction commit succeeds
>>    even if the number of standbys is less than quorum. If there is no
>>    connected standby, transaction commit always succeeds, regardless
>>    of quorum.
>>
>> 2. Observe quorum. Aidan wants this. Transaction commit waits until
>>    the number of connected standbys is greater than or equal to
>>    quorum.
>>
>> Which is the right behavior for quorum commit? Or should we add a new
>> parameter specifying the behavior of quorum commit?
>>
>
> Initially I also expected the quorum to behave as described by
> Aidan/option 2.

OK. But some people (including me) would like to prevent the master
from halting when the standby fails, so I think that option 1 should
also be supported. So I'm inclined to add a new parameter specifying the
behavior of quorum commit when the number of synchronous standbys
becomes less than quorum.

> Also, IMHO the name "quorum" is a bit short, like having
> "maximum" but not saying a max_something.
>
> quorum_min_sync_standbys
> quorum_max_sync_standbys

What about quorum_standbys?

> The question remains what are the sync standbys? Does it mean not-async?

A sync standby is one whose replication_mode is set to "recv", "fsync",
or "replay".
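
As a minimal sketch of that rule (the helper names is_sync_standby and
count_sync_standbys are made up for illustration and do not come from the
patch), the classification could be written in Python as:

```python
# Modes under which a standby acknowledges commits, per the definition above.
SYNC_MODES = frozenset({"recv", "fsync", "replay"})
ALL_MODES = SYNC_MODES | {"async"}


def is_sync_standby(replication_mode):
    """Return True if a standby with this replication_mode is synchronous."""
    if replication_mode not in ALL_MODES:
        raise ValueError("unknown replication_mode: %r" % (replication_mode,))
    return replication_mode in SYNC_MODES


def count_sync_standbys(modes):
    """Count the synchronous standbys among the given replication_mode values."""
    return sum(1 for mode in modes if is_sync_standby(mode))
```

For example, count_sync_standbys(["async", "recv", "replay"]) counts only
the "recv" and "replay" standbys, which is the number quorum is compared to.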

> Intuitively by looking at the enumeration of replication_mode I'd think that
> the sync standbys are all standbys that operate in a non-async mode. That
> would be clearer with a boolean sync (or not) and for sync standbys the
> replication_mode specified.

You mean that something like synchronous_replication should be added as a
recovery.conf parameter in addition to replication_mode? Since increasing
the number of similar parameters would confuse users, I'd rather not do that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Yeb Havinga on
Fujii Masao wrote:
>> Intuitively by looking at the enumeration of replication_mode I'd think that
>> the sync standbys are all standbys that operate in a non-async mode. That
>> would be clearer with a boolean sync (or not) and for sync standbys the
>> replication_mode specified.
>>
>
> You mean that something like synchronous_replication should be added as a
> recovery.conf parameter in addition to replication_mode? Since increasing
> the number of similar parameters would confuse users, I'd rather not do that.
>
I think what would be confusing is a mismatch between
implemented concepts and parameters.

1 does the master wait for standby servers on commit?
2 how many acknowledgements must the master receive before it can continue?
3 is a standby server a synchronous one, i.e. does it acknowledge a commit?
4 when do standby servers acknowledge a commit?
5 does it only wait when the standbys are connected, or also when they
are not connected?
6..?

When trying to match parameter names for the concepts above:
1 - does not exist, but can be answered with quorum_standbys = 0
2 - quorum_standbys
3 - yes, if replication_mode != async (here is where I thought I had to
think too much)
4 - replication modes recv, fsync and replay but not async
5 - Zoltan's strict_sync_replication parameter

Just an idea, what about
for 4: acknowledge_commit = {no|recv|fsync|replay}
then 3 = yes, if acknowledge_commit != no
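
To show that this naming carries the same information as replication_mode,
here is a small Python sketch. The function names are hypothetical and only
the parameter values come from this thread.

```python
ACK_VALUES = ("no", "recv", "fsync", "replay")


def to_acknowledge_commit(replication_mode):
    """Translate replication_mode into the proposed acknowledge_commit value."""
    value = "no" if replication_mode == "async" else replication_mode
    if value not in ACK_VALUES:
        raise ValueError("unknown replication_mode: %r" % (replication_mode,))
    return value


def to_replication_mode(acknowledge_commit):
    """Translate acknowledge_commit back into replication_mode."""
    if acknowledge_commit not in ACK_VALUES:
        raise ValueError("unknown acknowledge_commit: %r" % (acknowledge_commit,))
    return "async" if acknowledge_commit == "no" else acknowledge_commit


def standby_is_synchronous(acknowledge_commit):
    """Concept 3: a standby is synchronous if it acknowledges commits at all."""
    return to_replication_mode(acknowledge_commit) != "async"
```

The mapping is one-to-one, so the choice between the two spellings is about
naming and readability, not about what can be expressed.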

regards,
Yeb Havinga



From: Fujii Masao on
On Mon, Jul 26, 2010 at 5:27 PM, Yeb Havinga <yebhavinga(a)gmail.com> wrote:
> Fujii Masao wrote:
>>>
>>> Intuitively by looking at the enumeration of replication_mode I'd think
>>> that the sync standbys are all standbys that operate in a non-async
>>> mode. That would be clearer with a boolean sync (or not) and for sync
>>> standbys the replication_mode specified.
>>>
>>
>> You mean that something like synchronous_replication should be added as
>> a recovery.conf parameter in addition to replication_mode? Since
>> increasing the number of similar parameters would confuse users, I'd
>> rather not do that.
>>
>
> I think what would be confusing is a mismatch between implemented
> concepts and parameters.
>
> 1 does the master wait for standby servers on commit?
> 2 how many acknowledgements must the master receive before it can continue?
> 3 is a standby server a synchronous one, i.e. does it acknowledge a commit?
> 4 when do standby servers acknowledge a commit?
> 5 does it only wait when the standbys are connected, or also when they are
> not connected?
> 6..?
>
> When trying to match parameter names for the concepts above:
> 1 - does not exist, but can be answered with quorum_standbys = 0
> 2 - quorum_standbys
> 3 - yes, if replication_mode != async (here is where I thought I had to
> think too much)
> 4 - replication modes recv, fsync and replay but not async
> 5 - Zoltan's strict_sync_replication parameter
>
> Just an idea, what about
> for 4: acknowledge_commit = {no|recv|fsync|replay}
> then 3 = yes, if acknowledge_commit != no

Thanks for the clarification.

I still like

replication_mode = {async|recv|fsync|replay}

rather than

synchronous_replication = {on|off}
acknowledge_commit = {no|recv|fsync|replay}

because the former is more intuitive to me and I don't want
to increase the number of parameters.

We need to hear from some users in this respect. If most want
the latter, of course, I'd love to adopt it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
