From: Bruce Momjian on
Simon Riggs wrote:
>
> OK, here is the latest version of the Hot Standby patchset. This is
> about version 30+ by now, but we should regard this as 0.2.1
> Patch against CVS HEAD (now): clean apply, compile, no known bugs.
>
> OVERVIEW

Anyone who is interested in how hot standby behaves should read the
following excellent PDF Simon produced. It goes into great detail about
the slave's read-only transactions and how the standby behaves during
continuous slave recovery:

> You can download PDF versions of the fine manual here:
> http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf

The function call docs are at:

> http://wiki.postgresql.org/images/1/10/Hot_Standby_Recovery_Functions.pdf

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From: Heikki Linnakangas on
Looking at the way cache invalidations are handled in two-phase
transactions, it would be simpler if we stored the shared cache
invalidation messages in the twophase state file header, like we store
deleted relations and subxids. That would allow them to be copied to the
COMMIT_PREPARED WAL record, so that we wouldn't need to treat twophase
commits differently from regular ones in xact_redo_commit. As the patch
stands, the new
xactGetCommittedInvalidationMessages/MakeSharedInvalidMessagesArray
mechanism duplicates the functionality of
AtPrepare_Inval/PersistInvalidationMessage - both materialize the
pending shared invalidation messages so that they can be written to
disk. I did that in my git branch.
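
For illustration, both paths boil down to something like the sketch
below (the List-based representation and the function name are invented
for the example; the real code walks its own pending-invalidation
structures rather than a List):

static SharedInvalidationMessage *
CollectPendingInvalMessages(List *pendingMsgs, int *nmsgs)
{
    SharedInvalidationMessage *msgarray;
    ListCell   *lc;
    int         i = 0;

    *nmsgs = list_length(pendingMsgs);
    if (*nmsgs == 0)
        return NULL;

    /* Flatten the pending messages into one contiguous array... */
    msgarray = (SharedInvalidationMessage *)
        palloc(*nmsgs * sizeof(SharedInvalidationMessage));
    foreach(lc, pendingMsgs)
        msgarray[i++] = *((SharedInvalidationMessage *) lfirst(lc));

    /*
     * ...which can then be appended as a single chunk to the twophase
     * state file or to the commit WAL record.
     */
    return msgarray;
}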

I wonder if we might have alignment issues with the
SharedInvalidationMessages being stored in WAL records, following the
subxids. All the data stored in that record has 4-byte alignment at the
moment, but if SharedInvalidationMessage ever needs 8-byte alignment, we
would have trouble. It's probably not worth changing the code, since it's
highly unlikely that SharedInvalidationMessage will ever need 8-byte
alignment, but perhaps a comment somewhere would be in order.
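
To spell the concern out, the redo side ends up reading the messages in
place with something like this (the offsets and variable names are
schematic, not the real xl_xact_commit layout):

    char       *p = XLogRecGetData(record);    /* start of the commit record */

    p += fixed_header_size;                    /* fixed part, a multiple of 4 bytes */
    p += nsubxacts * sizeof(TransactionId);    /* subxid array, 4 bytes per entry */

    /*
     * This cast is fine while SharedInvalidationMessage only needs
     * 4-byte alignment; if it ever needed 8-byte alignment, both the
     * writer and this reader would need MAXALIGN-style padding.
     */
    SharedInvalidationMessage *msgs = (SharedInvalidationMessage *) p;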

I note that we don't emit RunningXacts after a shutdown checkpoint. So
if recovery starts at a shutdown checkpoint, we don't let read-only
backends in until the first online checkpoint. Could we treat a shutdown
checkpoint as a snapshot with no transactions running? Or do prepared
transactions screw that up?
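
Conceptually the shutdown case would reduce to something like the sketch
below - the struct and function are invented for illustration, not the
patch's RunningXacts machinery, and prepared transactions are exactly the
part that keeps it from being this simple:

/* Invented for illustration only. */
typedef struct EmptyStandbySnapshot
{
    TransactionId   nextXid;    /* next xid as of the shutdown checkpoint */
    int             xcnt;       /* number of running xids */
    TransactionId  *xids;       /* the running xids themselves */
} EmptyStandbySnapshot;

static EmptyStandbySnapshot
SnapshotAtShutdownCheckpoint(TransactionId checkpointNextXid)
{
    EmptyStandbySnapshot snap;

    /* A clean shutdown means no ordinary transactions were in progress. */
    snap.nextXid = checkpointNextXid;
    snap.xcnt = 0;
    snap.xids = NULL;

    /*
     * Prepared transactions do survive a shutdown, though, so a real
     * implementation would still have to fold their xids and locks back
     * in before letting read-only backends connect.
     */
    return snap;
}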

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Heikki Linnakangas on
The logic in the lock manager to track the number of held
AccessExclusiveLocks (with ProcArrayIncrementNumHeldLocks and
ProcArrayDecrementNumHeldLocks) seems to be broken. I added an assertion
to ProcArrayDecrementNumHeldLocks:

--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1401,6 +1401,7 @@ ProcArrayIncrementNumHeldLocks(PGPROC *proc)
 void
 ProcArrayDecrementNumHeldLocks(PGPROC *proc)
 {
+	Assert(proc->numHeldLocks > 0);
 	proc->numHeldLocks--;
 }

This tripped the assertion:

postgres=# CREATE TABLE foo (id int4 primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Making matters worse, the primary server refuses to start up after
that, tripping the assertion again in crash recovery:

$ bin/postmaster -D data
LOG: database system was interrupted while in recovery at 2009-09-23
11:56:15 EEST
HINT: This probably means that some data is corrupted and you will have
to use the last backup for recovery.
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 0/32000070
LOG: REDO @ 0/32000070; LSN 0/320000AC: prev 0/32000020; xid 0; len 32:
Heap2 - clean: rel 1663/11562/1249; blk 32 remxid 4352
LOG: consistent recovery state reached
LOG: REDO @ 0/320000AC; LSN 0/320000CC: prev 0/32000070; xid 0; len 4:
XLOG - nextOid: 24600
LOG: REDO @ 0/320000CC; LSN 0/320000F4: prev 0/320000AC; xid 0; len 12:
Storage - file create: base/11562/16408
LOG: REDO @ 0/320000F4; LSN 0/3200011C: prev 0/320000CC; xid 4364; len
12: Relation - exclusive relation lock: xid 4364 db 11562 rel 16408
LOG: REDO @ 0/3200011C; LSN 0/320001D8: prev 0/320000F4; xid 4364; len
159: Heap - insert: rel 1663/11562/1259; tid 5/4
....
LOG: REDO @ 0/32004754; LSN 0/32004878: prev 0/320046A8; xid 4364; len
264: Transaction - commit: 2009-09-23 11:55:51.888398+03; 15 inval
msgs:catcache id38 catcache id37 catcache id38 catcache id37 catcache
id38 catcache id37 catcache id7 catcache id6 catcache id26 smgr relcache
smgr relcache smgr relcache
TRAP: FailedAssertion("!(proc->numHeldLocks > 0)", File: "procarray.c",
Line: 1404)
LOG: startup process (PID 27430) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure

I'm sure that's just a simple bug somewhere, but it highlights that we
need to be careful to avoid putting any extra work into the normal
recovery path. Otherwise bugs in hot-standby-related code can cause
crash recovery to fail.
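
In other words, redo-time bookkeeping that only matters for hot standby
should be skipped, or at least be non-fatal, during plain crash recovery.
A rough sketch of the shape I mean (the standby-tracking flag is
illustrative, not the patch's actual test):

static void
DecrementHeldLocksDuringRedo(PGPROC *proc, bool standbyTrackingActive)
{
    /* Plain crash recovery: the hot-standby lock count is irrelevant. */
    if (!standbyTrackingActive)
        return;

    if (proc->numHeldLocks <= 0)
    {
        /* Complain loudly, but don't take the startup process down. */
        elog(WARNING, "numHeldLocks underflow during redo");
        return;
    }
    proc->numHeldLocks--;
}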

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Simon Riggs on

On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:

> Looking at the way cache invalidations are handled in two-phase
> transactions, it would be simpler if we stored the shared cache
> invalidation messages in the twophase state file header, like we store
> deleted relations and subxids. That would allow them to be copied to the
> COMMIT_PREPARED WAL record, so that we wouldn't need to treat twophase
> commits differently from regular ones in xact_redo_commit. As the patch
> stands, the new
> xactGetCommittedInvalidationMessages/MakeSharedInvalidMessagesArray
> mechanism duplicates the functionality of
> AtPrepare_Inval/PersistInvalidationMessage - both materialize the
> pending shared invalidation messages so that they can be written to
> disk. I did that in my git branch.

We could, but the prepared transaction path already contains special-case
code anyway, so we aren't reducing the number of test cases required.
This looks like a possible area for refactoring, but I don't see the
need for pre-factoring; i.e. don't rework before commit, rework after.

> I wonder if we might have alignment issues with the
> SharedInvalidationMessages being stored in WAL records, following the
> subxids. All the data stored in that record has 4-byte alignment at the
> moment, but if SharedInvalidationMessage ever needs 8-byte alignment, we
> would have trouble. It's probably not worth changing the code, since it's
> highly unlikely that SharedInvalidationMessage will ever need 8-byte
> alignment, but perhaps a comment somewhere would be in order.

It's a possible source of bugs, but there are no issues there AFAICS.
The technique of putting multiple arrays in a WAL record wasn't invented
by this patch.

> I note that we don't emit RunningXacts after a shutdown checkpoint. So
> if recovery starts at a shutdown checkpoint, we don't let read-only
> backends in until the first online checkpoint. Could we treat a shutdown
> checkpoint as a snapshot with no transactions running? Or do prepared
> transactions screw that up?

We could, but I see no requirement for starting HS from a backup taken
on a shut-down database. It's just another special case to test, and
since we already have a significant number of important test cases I'd
say add this later.


That seems to cover all of your points in this post; thanks for the
comments. I'm keen to reduce complexity in areas that have caused bugs,
but for stuff that is working I'm tempted to leave it alone in such a
big patch. Anything we can do to avoid re-testing sections of code
and/or reduce the number of tests required is going to increase
stability.

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs on

On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
> it highlights that we need to be careful to avoid putting any extra
> work into the normal recovery path. Otherwise bugs in hot-standby-related
> code can cause crash recovery to fail.

Excellent point. I will put in additional protective code there.

--
Simon Riggs www.2ndQuadrant.com

