From: Heikki Linnakangas on
Simon Riggs wrote:
> On Tue, 2010-05-04 at 13:23 -0400, Tom Lane wrote:
>
>> * LogStandbySnapshot is merest fantasy: no guarantee that either the
>> XIDs list or the locks list will be consistent with the point in WAL
>> where it will get inserted. What's worse, locking things down enough
>> to guarantee consistency would be horrid for performance, or maybe
>> even deadlock-inducing. Could lose both ways: list might contain an
>> XID whose commit/abort went to WAL before the snapshot did, or list
>> might be missing an XID started just after snap was taken. The latter
>> case could possibly be dealt with via nextXid filtering, but that
>> doesn't fix the former case, and anyway we have both ends of the same
>> problem for locks.
>
> This was the only serious complaint on your list, so let's address it.
>
> Clearly we don't want to lock everything down, for all the reasons you
> say. That creates a gap between when data is derived and when data
> is logged to WAL.

Right. This was discussed first in August:
http://archives.postgresql.org/message-id/4A8CE561.4000302(a)enterprisedb.com.

I concur that the idea is that we deal at replay with the fact that the
snapshot lags behind. At replay, any locks/XIDs in the snapshot that
have already been committed/aborted are ignored. For any locks/XIDs
taken just after the snapshot was taken, the replay will see the other
WAL records with that information.

We need to add comments explaining all that.
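
As a rough illustration of that replay rule, here is a toy model in
Python. It is not PostgreSQL code: the record kinds ("assign", "commit",
"abort", "snapshot"), the replay() function and the "finished" set are
all hypothetical names invented for this sketch, and XID wraparound is
ignored. It only shows why a snapshot that lags behind the WAL stream can
still be applied safely.

```python
def replay(records):
    """Replay a WAL-like stream and return the XIDs believed to be running.

    `records` is a list of (kind, payload) tuples. This is a toy model:
    the record kinds and the `finished` set are assumptions made for
    illustration, not PostgreSQL structures.
    """
    running = set()   # XIDs the standby currently treats as in progress
    finished = set()  # XIDs whose commit/abort has already been replayed

    for kind, payload in records:
        if kind == "assign":
            # An ordinary WAL record revealing an XID, which may be one
            # that started just after the snapshot data was derived.
            if payload not in finished:
                running.add(payload)
        elif kind in ("commit", "abort"):
            running.discard(payload)
            finished.add(payload)
        elif kind == "snapshot":
            # The snapshot may list XIDs whose commit/abort reached WAL
            # before the snapshot record did; those entries are ignored.
            running.update(x for x in payload if x not in finished)
    return running


# XID 100 commits after the snapshot data was derived but before the
# snapshot record is inserted; XID 102 starts just after the data was derived.
stream = [
    ("commit", 100),
    ("snapshot", [100, 101]),
    ("assign", 102),
]
running_xids = replay(stream)
```

With this stream, the stale entry for XID 100 is dropped and XID 102 is
picked up from its own record, so only 101 and 102 are treated as running.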

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on
On Wed, 2010-05-05 at 09:12 +0300, Heikki Linnakangas wrote:

> I concur that the idea is that we deal at replay with the fact that the
> snapshot lags behind. At replay, any locks/XIDs in the snapshot that
> have already been committed/aborted are ignored. For any locks/XIDs
> taken just after the snapshot was taken, the replay will see the other
> WAL records with that information.
>
> We need to add comments explaining all that.

The attached comments are proposed.

Reviewing this information again to propose a fix for the other two
minor bugs pointed out by Tom shows that they are related and need
one combined fix, which would work like this:

Currently we handle the state STANDBY_INITIALIZED incorrectly. We need
to run RecordKnownAssignedXids() in this state, so that we both
extend the clog and record known xids. That means that two other callers
of RecordKnownAssignedXids() also need to call it in that state.

In ProcArrayApplyRecoveryInfo() we run KnownAssignedXidsAdd(), though
this will fail if there are existing xids in there, now that the array is
kept sorted. So we need to: run
KnownAssignedXidsRemovePreceding(latestObservedXid) to remove extraneous
xids, then extract any xids that remain and add them to the ones arriving
with the running xacts record. We then need to sort the combined array
and re-insert it into KnownAssignedXids.
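
The steps above can be sketched in Python as follows. This is only a
model of the proposed sequence, not the C implementation: the function
name merge_known_assigned() and its parameters are hypothetical, plain
integer comparison stands in for XID comparison (so wraparound is
ignored), "remove preceding" is assumed to mean dropping xids below the
cutoff, and dropping duplicates during the merge is an assumption as well.

```python
def merge_known_assigned(known_xids, running_xacts_xids, cutoff_xid):
    """Model of the combined fix: prune, merge, sort, re-insert.

    known_xids         -- xids already held in the (sorted) known-assigned array
    running_xacts_xids -- xids arriving with the running xacts record
    cutoff_xid         -- stand-in for the value passed to the
                          "remove preceding" step (assumed semantics: drop
                          every xid strictly below it)
    """
    # Step 1: remove extraneous xids that precede the cutoff.
    survivors = [xid for xid in known_xids if xid >= cutoff_xid]

    # Step 2: add the survivors to the xids from the running xacts record.
    # Using a set drops duplicates, which is an assumption of this sketch.
    combined = set(survivors) | set(running_xacts_xids)

    # Step 3: sort the combined array so it can be re-inserted in order.
    return sorted(combined)


# Existing entries 95, 100, 103; the record brings 101, 103, 105.
new_known = merge_known_assigned([95, 100, 103], [101, 103, 105], cutoff_xid=100)
```

Here 95 is pruned, 103 appears once, and the result stays sorted, which is
the property the plain add step cannot guarantee when entries already exist.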

Previously, I had imagined that the gap between the logical checkpoint
and the physical checkpoint was small. With spread checkpoints this
isn't the case any longer. So I propose adding a special WAL record that
is inserted during LogStandbySnapshot() immediately before
GetRunningTransactionLocks(), so that we minimise the time window
between deriving snapshot data and recording it in WAL.

Those changes are not especially invasive.

--
Simon Riggs www.2ndQuadrant.com