From: Jan Wieck on
On 5/26/2010 1:17 PM, Heikki Linnakangas wrote:
> Could you generate the commit-order log by simply registering a commit
> hook (RegisterXactCallback(XACT_EVENT_COMMIT)) that writes such a log
> somewhere in the data directory? That would work with older versions
> too, no server changes required.
>

That would work, as it seems that the backend keeps holding on to its
locks until after calling the callbacks.

> It would not get called during recovery, but I believe that would be
> sufficient for Slony. You could always batch commits that you don't know
> when they committed as if they committed simultaneously.

Here you are mistaken. If the origin crashes but can recover not yet
flushed to xlog-commit-order transactions, then the consumer has no idea
about the order of those commits, which throws us back to the point
where we require a non cacheable global sequence to replay the
individual actions of those "now batched" transactions in an agreeable
order.

The commit order data needs to be covered by crash recovery.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on
On 26/05/10 21:43, Jan Wieck wrote:
> On 5/26/2010 1:17 PM, Heikki Linnakangas wrote:
>> It would not get called during recovery, but I believe that would be
>> sufficient for Slony. You could always batch commits that you don't
>> know when they committed as if they committed simultaneously.
>
> Here you are mistaken. If the origin crashes but can recover not yet
> flushed to xlog-commit-order transactions, then the consumer has no idea
> about the order of those commits, which throws us back to the point
> where we require a non cacheable global sequence to replay the
> individual actions of those "now batched" transactions in an agreeable
> order.
>
> The commit order data needs to be covered by crash recovery.

Perhaps I'm missing something, but I thought that Slony currently uses a
heartbeat, and all transactions committed between two beats are banged
together and committed as one in the slave so that their relative commit
order doesn't matter. Can we not do the same for commits missing from
the commit-order log?

I'm thinking that the commit-order log would contain two kinds of records:

a) Transaction with XID X committed
b) All transactions with XID < X committed

During normal operation we write the 1st kind of record at every commit.
After crash recovery (perhaps at the first commit after recovery or when
the slon daemon first polls the server, as there's no hook for
end-of-recovery), we write the 2nd kind of record.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Dimitri Fontaine on
Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> Perhaps I'm missing something, but I thought that Slony currently uses a
> heartbeat, and all transactions committed between two beats are banged
> together and committed as one in the slave so that their relative commit
> order doesn't matter.

I guess Slony does the same as pgq here: all events of all those
transactions between two given ticks are batched together in the order
of the event commits. (In fact the batches are made at the consumer
request, so possibly spreading more than 2 ticks at a time).

If you skip that event ordering (within transactions), you can't
maintain foreign keys on the slaves, among other things.

The idea of this proposal is to be able to get this commit order
directly from where the information is maintained, rather than use some
sort of user sequence for that.

So even ordering the txid and txid_snapshots with respect to WAL commit
time (LSN) won't be the whole story, for any given transaction
containing more than one event we also need to have them in order. I
know Jan didn't forget about it so it must either be in the proposal or
easily derived, too tired to recheck.

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Wed, May 26, 2010 at 4:11 PM, Dimitri Fontaine
<dfontaine(a)hi-media.com> wrote:
> Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
>> Perhaps I'm missing something, but I thought that Slony currently uses a
>> heartbeat, and all transactions committed between two beats are banged
>> together and committed as one in the slave so that their relative commit
>> order doesn't matter.
>
> I guess Slony does the same as pgq here: all events of all those
> transactions between two given ticks are batched together in the order
> of the event commits. (In fact the batches are made at the consumer
> request, so possibly spreading more than 2 ticks at a time).
>
> If you skip that event ordering (within transactions), you can't
> maintain foreign keys on the slaves, among other things.
>
> The idea of this proposal is to be able to get this commit order
> directly from where the information is maintained, rather than use some
> sort of user sequence for that.

Exactly.

> So even ordering the txid and txid_snapshots with respect to WAL commit
> time (LSN) won't be the whole story, for any given transaction
> containing more than one event we also need to have them in order. I
> know Jan didn't forget about it so it must either be in the proposal or
> easily derived, too tired to recheck.

Right, so the point is - with this proposal, he can switch to using a
LOCAL sequence number to order events within the session and then
order the sessions using the commit ordering. Right now, he has to
use a GLOBAL sequence number because there's no way to know the commit
order.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: "Kevin Grittner" on
Jan Wieck <JanWieck(a)Yahoo.com> wrote:

> Without this logic, the replication system could not combine
> multiple origin sessions into one replication session without
> risking to never find a state, in which it can commit.

My latest idea for handling this in WAL-based replication involves
WAL-logging information about the transaction through which a the
committing transaction makes it safe to view. There are a few
options here at the detail level that I'm still thinking through.
The idea would be that the xmin from read-only queries on the slaves
might be somewhere behind where you would expect based on
transactions committed. (The details involve such things as where
non-serializable transactions fall into the plan on both sides, and
whether it's worth the effort to special-case read-only transactions
on the master.)

I can't say that I'm 100% sure that some lurking detail won't shoot
this technique down for HS, but it seems good to me at a conceptual
level.

I think, however, that this fails to work for systems like Slony and
Londiste because there could be transactions writing to tables which
are not replication targets, so the snapshot adjustments wouldn't be
safe. True? (If not true, I think that adding some sort of xmin
value, depending on the answers to the above questions, to Jan's
proposed structure might support better transactional integrity,
even to the level of full serializable support, at the cost of
delaying visibility of committed data.)

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers