From: Greg Smith on
Fujii Masao wrote:
> Umm... what is your definition of "synchronous"? I'm planning to provide
> the following four synchronization modes for v8.5. Does this match what
> you have in mind?
>
> The primary waits ... before returning "success" of a transaction;
> * nothing - asynchronous replication
> * recv ACK - semi-synchronous replication
> * fsync ACK - semi-synchronous replication
> * redo ACK - synchronous replication
>
> Or, in synchronous replication, must we wait for both an fsync and a redo ACK?
>
Right, those are the possibilities. All four of them have valid use
cases in the field and are worth implementing. I don't like the label
"semi-synchronous replication" myself, but it's a valuable feature to
implement, and that is unfortunately the term other parts of the
industry use for that approach.

But everyone needs to be extremely careful with the terminology here:
if you say "synchronous replication", that *only* means what you're
labeling "redo ACK" ("WAL ACK" really). "Synchronous replication"
should not be used as a group term that includes the semi-synchronous
variations, which are in fact asynchronous despite their marketing
name. If someone means semi-synchronous, but they say synchronous
thinking it's a shared term also applicable to the semi-synchronous
variations here, that's just going to be confusing for everyone.
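
As a rough illustration of the terminology above, the four modes and the
labels used for them in this thread could be sketched as follows. The names
SyncMode and label_for are hypothetical and are not PostgreSQL identifiers;
the mapping simply restates the list quoted above, with "synchronous"
reserved for the "redo ACK" mode.

```python
from enum import Enum


class SyncMode(Enum):
    """What the primary waits for before reporting a commit as successful.

    Hypothetical names for illustration only.
    """
    NONE = "nothing"
    RECV_ACK = "recv ACK"
    FSYNC_ACK = "fsync ACK"
    REDO_ACK = "redo ACK"


def label_for(mode: SyncMode) -> str:
    """Return the replication label this thread attaches to a mode."""
    if mode is SyncMode.NONE:
        return "asynchronous"
    if mode is SyncMode.REDO_ACK:
        # Only this mode is called plain "synchronous" in the thread.
        return "synchronous"
    # recv ACK and fsync ACK are the "semi-synchronous" variations.
    return "semi-synchronous"
```

With this mapping, label_for(SyncMode.RECV_ACK) and
label_for(SyncMode.FSYNC_ACK) both give "semi-synchronous", so the group term
never silently absorbs the weaker modes.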

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Fujii Masao on
On Fri, Nov 13, 2009 at 1:49 PM, Greg Smith <greg(a)2ndquadrant.com> wrote:
> Right, those are the possibilities, all four of them have valid use cases in
> the field and are worth implementing.  I don't like the label
> "semi-synchronous replication" myself, but it's a valuable feature to
> implement, and that is unfortunately the term other parts of the industry
> use for that approach.

BTW, MySQL and DRBD use the term "semi-synchronous":
http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication
http://www.drbd.org/users-guide/s-replication-protocols.html

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Greg Smith on
Fujii Masao wrote:
> On Fri, Nov 13, 2009 at 1:49 PM, Greg Smith <greg(a)2ndquadrant.com> wrote:
>
>> Right, those are the possibilities, all four of them have valid use cases in
>> the field and are worth implementing. I don't like the label
>> "semi-synchronous replication" myself, but it's a valuable feature to
>> implement, and that is unfortunately the term other parts of the industry
>> use for that approach.
>>
>
> BTW, MySQL and DRBD use the term "semi-synchronous":
> http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication
> http://www.drbd.org/users-guide/s-replication-protocols.html
>
Yeah, that's the "other parts of the industry" I was referring to.
MySQL uses "semi-synchronous" to distinguish between its completely
asynchronous default replication mode and one where it provides a
somewhat safer implementation. The description reads more as
"asynchronous with some synchronous elements", not "one style of
synchronous implementation". None of their documentation wanders into
the problem area here by calling it a true synchronous solution when
it's really not--MySQL Cluster is their synchronous vehicle.

It's fine to adopt the term "semi-synchronous", as it's become quite
popular and people are going to label the PG implementation with it
regardless of what is settled on here. But we should all try to be
careful to use it as correctly as possible.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.com



From: Fujii Masao on
On Fri, Nov 13, 2009 at 3:17 PM, Greg Smith <greg(a)2ndquadrant.com> wrote:
> Yeah, that's the "other parts of the industry" I was referring to.  MySQL
> uses "semi-synchronous" to distinguish between its completely asynchronous
> default replication mode and one where it provides a somewhat safer
> implementation.  The description reads more as "asynchronous with some
> synchronous elements", not "one style of synchronous implementation".  None
> of their documentation wanders into the problem area here by calling it a
> true synchronous solution when it's really not--MySQL Cluster is their
> synchronous vehicle.
> It's fine to adopt the term "semi-synchronous", as it's become quite popular
> and people are going to label the PG implementation with it regardless of
> what is settled on here.  But we should all try to be careful to use it as
> correctly as possible.

OK. Let's decide later what the "recv ACK" and "fsync ACK"
synchronization modes should be called.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Greg Smith on
Markus Wanner wrote:
> You will definitely find different definitions and requirements of what
> synchronous replication means there.
To quote from the Wikipedia entry on "Database Replication" that Simon
pointed to during the earlier discussion,
http://en.wikipedia.org/wiki/Database_replication

"Synchronous replication - guarantees "zero data loss" by the means of
atomic write operation, i.e. write either completes on both sides or not
at all. Write is not considered complete until acknowledgement by both
local and remote storage."

That last part is the critical one: "acknowledgement by both local and
remote storage" is required before you can label something truly
synchronous replication. In implementation terms, that means you must
have both local and slave fsync calls finish to be considered truly
synchronous. That part is not ambiguous at all.

There's a definition of the weaker form in there too, which is where the
ambiguity lies:

"Semi-synchronous replication - this usually means that a write is
considered complete as soon as local storage acknowledges it and a
remote server acknowledges that it has received the write either into
memory or to a dedicated log file."

I don't consider that really synchronous replication anymore, but as you
say it's been strengthened by marketing enough to be a valid industry
term at this point. Since it's already gained traction we might use it,
as long as it's defined properly and its trade-offs vs. a true
synchronous implementation are documented.
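
The difference between the two quoted definitions can be written down as a
pair of completion checks. This is only a sketch of the definitions as quoted,
not of any actual implementation; WriteState and both function names are
made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WriteState:
    """Acknowledgements seen so far for one write (hypothetical structure)."""
    local_fsynced: bool    # local storage has acknowledged the write
    remote_received: bool  # remote server holds the write in memory or a log file
    remote_fsynced: bool   # remote storage has acknowledged the write


def complete_synchronous(state: WriteState) -> bool:
    """Complete only when both local and remote storage have acknowledged."""
    return state.local_fsynced and state.remote_fsynced


def complete_semi_synchronous(state: WriteState) -> bool:
    """Complete once local storage acknowledges and the remote has received it."""
    return state.local_fsynced and state.remote_received
```

A write that the remote server has received but not yet flushed to storage
counts as complete under the semi-synchronous check and incomplete under the
synchronous one. That window is the trade-off that would need documenting.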

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.com

