From: Heikki Linnakangas on 25 Sep 2009 04:49

Looking at the startup sequence now.

I see that you modified ExtendSUBTRANS so that it doesn't wipe out
previously set values if it's called with out-of-order xids. I guess that
works, although I think it can leave pages unzeroed if it's called with a
large enough gap between xids: if the previous xid was on e.g. page 10 and
the next one is on page 12, page 11 would not be zeroed, AFAICS. Not sure
if such big gaps in the xid sequence are possible, but it seems they are
if you have a very large max_connections setting and a lot of
subtransactions.

Nevertheless, I'd feel better if we kept ExtendSUBTRANS unmodified, and
instead arranged things so that ExtendSUBTRANS is always called in
xid order. We can call it from RecordKnownAssignedTransactionIds, which
handles the out-of-order problem for other purposes already.

I think we have a similar problem with clog. ExtendCLOG is currently not
called during recovery, so isn't it possible for the problem with
subtransaction commits that's described in the comments in StartupCLOG to
arise during hot standby? Again, I think we should call ExtendCLOG() in
RecordKnownAssignedTransactionIds. RecordKnownAssignedTransactionIds is
the hot standby version of GetNewTransactionId(), so to speak.

If we go with that, I think we'll need to call StartupSUBTRANS/CLOG
earlier in the startup sequence too, when we establish the startup
snapshot and let backends in.

Thoughts?

Other stuff:

- I removed the new DBConsistentUpToLSN() function and moved its
  functionality into XLogNeedsFlush(). Since XLogFlush() updates
  minRecoveryPoint instead of flushing WAL during recovery, it makes
  sense for XLogNeedsFlush() to correspondingly check whether
  minRecoveryPoint needs to be updated when in recovery. That's a better
  definition for the call in bufmgr.c too.

- I went ahead with the changes to RecoveryInfoLock and tracking the
  number of held AccessExclusive locks in lmgr.c instead of the proc
  array.
Can we find a better name for "loggable locks"? It means locks that need
to be WAL-logged when acquired, for hot standby, and "loggable lock"
doesn't sound right for that. "Loggable" implies that the lock *can* be
logged, but what it really means is that it *must* be logged.

Keep an eye on my git repository for the latest changes.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 25 Sep 2009 05:56

On Thu, 2009-09-24 at 13:33 +0300, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > The problem becomes a lot easier if we accept that it's OK to have a
> > lock included in the running-xacts snapshot and also appear in a
> > XLOG_RELATION_LOCK record later. The standby should handle that
> > gracefully already. If we just remove RecoveryInfoLock, that can
> > happen, but it still won't be possible for a lock to be missed out,
> > which is what we really care about.
> 
> I see the problem with that now. Without the lock, it's possible that
> the XLOG_RELATION_LOCK WAL record is written before the
> XLOG_RUNNING_XACTS record. If the lock is not included in the snapshot,
> we want the lock WAL record to be after the snapshot record.
> 
> So I guess we'll need the RecoveryInfoLock. But we don't need to hold
> it across the wait. I think it's enough to acquire it just before
> writing the WAL record in LockAcquire. That will ensure that the WAL
> record isn't written until the snapshot is completely finished.

I will think some more on that. I remember thinking there was a reason
why that wasn't enough, possibly to do with no-wait locks, which I
remember caused me to change that code.

-- 
Simon Riggs           www.2ndQuadrant.com
From: Simon Riggs on 25 Sep 2009 05:58

On Wed, 2009-09-23 at 19:07 +0300, Heikki Linnakangas wrote:
> Rather than keep the numHeldLocks counters per-proc in the proc array,
> I think it would be simpler to have a single (or one per lock
> partition) counter in shared memory in lock.c. It's just an
> optimization to make it faster to find out that there are no loggable
> AccessExclusiveLocks in the system, so it really rather belongs in the
> lock manager.

What lock would protect that value? The whole purpose is to avoid taking
the LockMgrLocks and to provide something that is accessible under the
locks already held by GetRunningTransactionData().

-- 
Simon Riggs           www.2ndQuadrant.com
From: Simon Riggs on 25 Sep 2009 06:04

On Wed, 2009-09-23 at 17:45 +0300, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> >> On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:
> >>> I note that we don't emit RunningXacts after a shutdown checkpoint.
> >>> So if recovery starts at a shutdown checkpoint, we don't let
> >>> read-only backends in until the first online checkpoint. Could we
> >>> treat a shutdown checkpoint as a snapshot with no transactions
> >>> running? Or do prepared transactions screw that up?
> >> We could, but I see no requirement for starting HS from a backup
> >> taken on a shutdown database. It's just another special case to
> >> test, and since we already have a significant number of important
> >> test cases I'd say add this later.
> >
> > There's also a related issue: if a backend holding an
> > AccessExclusiveLock crashes without writing an abort WAL record, the
> > lock is never released in the standby. We handle the expiration of
> > xids at replay of running-xacts records, but AFAICS we don't do that
> > for locks.
> 
> Ah, scratch that, I now see that we do call
> XactClearRecoveryTransactions() when we see a shutdown checkpoint,
> which clears all recovery locks. But doesn't that prematurely clear all
> locks belonging to prepared transactions as well?

Much better to read your second post(s). :-)

Yes, you have found a(nother) issue. This was the first one that gave me
pause to think of the answer. The locks currently aren't tracked as to
whether they are 2PC or not, so we would need to store that info as well,
so that we can selectively release locks later.

Question: is it possible to do a fast shutdown when we have a prepared
transaction? Would it be better to take a different approach there for
prepared transactions? It seems strange to write a shutdown checkpoint
when the system isn't yet "clean".
-- 
Simon Riggs           www.2ndQuadrant.com
From: Heikki Linnakangas on 25 Sep 2009 06:14
Simon Riggs wrote:
> On Wed, 2009-09-23 at 19:07 +0300, Heikki Linnakangas wrote:
>> Rather than keep the numHeldLocks counters per-proc in the proc array,
>> I think it would be simpler to have a single (or one per lock
>> partition) counter in shared memory in lock.c. It's just an
>> optimization to make it faster to find out that there are no loggable
>> AccessExclusiveLocks in the system, so it really rather belongs in the
>> lock manager.
> 
> What lock would protect that value? The whole purpose is to avoid
> taking the LockMgrLocks and to provide something that is accessible
> under the locks already held by GetRunningTransactionData().

The lock partition lock (so we really need one counter per partition; a
single counter would need additional locking). We're already holding that
in LockAcquire/LockRelease when we need to increment or decrement the
counter.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com