From: Bruce Momjian on 22 Sep 2009 13:06

Simon Riggs wrote:
> OK, here is the latest version of the Hot Standby patchset. This is
> about version 30+ by now, but we should regard this as 0.2.1
> Patch against CVS HEAD (now): clean apply, compile, no known bugs.
>
> OVERVIEW

Anyone who is interested in how hot standby behaves should read the
excellent PDF Simon produced. It goes into great detail about the
slave's read-only transactions and how the standby behaves during
continuous slave recovery:

> You can download a PDF version of the fine manual here:
> http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf

The function call docs are at:

> http://wiki.postgresql.org/images/1/10/Hot_Standby_Recovery_Functions.pdf

--
  Bruce Momjian  <bruce(a)momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Heikki Linnakangas on 23 Sep 2009 04:13

Looking at the way cache invalidations are handled in two-phase
transactions, it would be simpler if we stored the shared cache
invalidation messages in the twophase state file header, like we store
deleted relations and subxids. That allows them to be copied to the
COMMIT_PREPARED WAL record, so that we don't need to treat twophase
commits differently from regular ones in xact_redo_commit. As the patch
stands, the new
xactGetCommittedInvalidationMessages/MakeSharedInvalidMessagesArray
mechanism duplicates the functionality of
AtPrepare_Inval/PersistInvalidationMessage - both materialize the
pending shared invalidation messages so that they can be written to
disk. I did that in my git branch.

I wonder if we might have alignment issues with the
SharedInvalidationMessages stored in WAL records, following the
subxids. All the data stored in that record has 4-byte alignment at the
moment, but if SharedInvalidationMessage ever needs 8-byte alignment,
we would have trouble. It's probably not worth changing the code - it's
highly unlikely that SharedInvalidationMessage will ever need 8-byte
alignment - but perhaps a comment somewhere would be in order.

I note that we don't emit a RunningXacts record after a shutdown
checkpoint. So if recovery starts at a shutdown checkpoint, we don't
let read-only backends in until the first online checkpoint. Could we
treat a shutdown checkpoint as a snapshot with no transactions running?
Or do prepared transactions screw that up?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
From: Heikki Linnakangas on 23 Sep 2009 05:07

The logic in the lock manager to track the number of held
AccessExclusiveLocks (with ProcArrayIncrementNumHeldLocks and
ProcArrayDecrementNumHeldLocks) seems to be broken. I added an
assertion to ProcArrayDecrementNumHeldLocks:

--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1401,6 +1401,7 @@ ProcArrayIncrementNumHeldLocks(PGPROC *proc)
 void
 ProcArrayDecrementNumHeldLocks(PGPROC *proc)
 {
+	Assert(proc->numHeldLocks > 0);
 	proc->numHeldLocks--;
 }

This tripped the assertion:

postgres=# CREATE TABLE foo (id int4 primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

Making matters worse, the primary server refuses to start up after
that, tripping the assertion again in crash recovery:

$ bin/postmaster -D data
LOG:  database system was interrupted while in recovery at 2009-09-23 11:56:15 EEST
HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/32000070
LOG:  REDO @ 0/32000070; LSN 0/320000AC: prev 0/32000020; xid 0; len 32: Heap2 - clean: rel 1663/11562/1249; blk 32 remxid 4352
LOG:  consistent recovery state reached
LOG:  REDO @ 0/320000AC; LSN 0/320000CC: prev 0/32000070; xid 0; len 4: XLOG - nextOid: 24600
LOG:  REDO @ 0/320000CC; LSN 0/320000F4: prev 0/320000AC; xid 0; len 12: Storage - file create: base/11562/16408
LOG:  REDO @ 0/320000F4; LSN 0/3200011C: prev 0/320000CC; xid 4364; len 12: Relation - exclusive relation lock: xid 4364 db 11562 rel 16408
LOG:  REDO @ 0/3200011C; LSN 0/320001D8: prev 0/320000F4; xid 4364; len 159: Heap - insert: rel 1663/11562/1259; tid 5/4
....
LOG:  REDO @ 0/32004754; LSN 0/32004878: prev 0/320046A8; xid 4364; len 264: Transaction - commit: 2009-09-23 11:55:51.888398+03; 15 inval msgs: catcache id38 catcache id37 catcache id38 catcache id37 catcache id38 catcache id37 catcache id7 catcache id6 catcache id26 smgr relcache smgr relcache smgr relcache
TRAP: FailedAssertion("!(proc->numHeldLocks > 0)", File: "procarray.c", Line: 1404)
LOG:  startup process (PID 27430) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure

I'm sure that's just a simple bug somewhere, but it highlights that we
need to be careful to avoid putting any extra work into the normal
recovery path. Otherwise bugs in hot standby related code can cause
crash recovery to fail.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
From: Simon Riggs on 23 Sep 2009 05:33

On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:
> Looking at the way cache invalidations are handled in two-phase
> transactions, it would be simpler if we store the shared cache
> invalidation messages in the twophase state file header, like we store
> deleted relations and subxids. This allows them to be copied to the
> COMMIT_PREPARED WAL record, so that we don't need to treat twophase
> commits differently than regular ones in xact_redo_commit.
> I did that in my git branch.

We could, but the prepared transaction path already contains special
case code anyway, so we aren't reducing the number of test cases
required. This looks like a possible area for refactoring, but I don't
see the need for pre-factoring, i.e. don't rework before commit, rework
after.

> I wonder if we might have alignment issues with the
> SharedInvalidationMessages being stored in WAL records, following the
> subxids. All the data stored in that record have 4-byte alignment at the
> moment, but if SharedInvalidationMessage ever needs 8-byte alignment, we
> would have trouble.

It's a possible source of bugs, but there are no issues there AFAICS.
The technique of multiple arrays on a WAL record wasn't invented by
this patch.

> I note that we don't emit RunningXacts after a shutdown checkpoint. So
> if recovery starts at a shutdown checkpoint, we don't let read-only
> backends in until the first online checkpoint. Could we treat a shutdown
> checkpoint as a snapshot with no transactions running? Or do prepared
> transactions screw that up?

We could, but I see no requirement for starting HS from a backup taken
on a shutdown database. It's just another special case to test, and
since we already have a significant number of important test cases I'd
say add this later.

That reflects all of your points on this post, I think - thanks for the
comments. I'm keen to reduce complexity in areas that have caused bugs,
but for stuff that is working I am tempted to leave things alone on
such a big patch. Anything we can do to avoid re-testing sections of
code and/or reduce the number of tests required is going to increase
stability.

--
 Simon Riggs           www.2ndQuadrant.com
From: Simon Riggs on 23 Sep 2009 05:34
On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
> it highlights that we need to be careful to avoid putting any extra
> work into the normal recovery path. Otherwise bugs in hot standby
> related code can cause crash recovery to fail.

Excellent point. I will put in additional protective code there.

--
 Simon Riggs           www.2ndQuadrant.com