From: Jeff Janes on 21 Sep 2009 22:42

On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>
> OK, here is the latest version of the Hot Standby patchset. This is
> about version 30+ by now, but we should regard this as 0.2.1.
> Patch against CVS HEAD (now): clean apply, compile, no known bugs.
>
> OVERVIEW
>
> You can download PDF versions of the fine manual here:
> http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf

From this doc:

"In recovery, transactions will not be permitted to take any lock higher
than AccessShareLock, other than AccessExclusiveLock. In addition,
transactions may never assign a TransactionId and may never write WAL.
The LOCK TABLE command by default applies an AccessExclusiveLock. Any
LOCK TABLE command that runs on the standby and requests a specific lock
type other than AccessShareLock will be rejected."

The first sentence seems to say that clients on the standby can take
ACCESS EXCLUSIVE, while the last sentence seems to say that they cannot
do so.

I did a little experiment on a hot standby instance. I expected that
either I would be denied the lock altogether, or the lock would cause
WAL replay to be paused until either I committed or was forcibly
canceled. But neither happened: I was granted the lock, but WAL replay
continued anyway.

jjanes=# begin;
BEGIN
jjanes=# lock table pgbench_history in access exclusive mode;
LOCK TABLE
jjanes=# select count(*) from pgbench_history;
 count
--------
 519104
(1 row)

jjanes=# select count(*) from pgbench_history;
 count
--------
 527814
(1 row)

Is this the expected behavior?

Thanks,

Jeff
From: Heikki Linnakangas on 22 Sep 2009 03:11

Simon Riggs wrote:
> On Mon, 2009-09-21 at 14:01 +0100, Simon Riggs wrote:
>> On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:
>
>>> is this that we seem to be missing conflict
>>> resolution for GiST index tuples deleted by the kill_prior_tuples
>>> mechanism. Unless I'm missing something, we need similar handling
>>> there to what we have in b-tree.
>> OK, I agree with that. Straightforward change. Thanks very much.
>>
>> I marked the comment to indicate that the handling for GIST and GIN
>> indexes looked dubious to me also. I had the earlier "it is safe"
>> comments but that was before we looked at the kill prior tuples
>> issue.
>
> ISTM I looked at this too quickly.
>
> kill_prior_tuple is only ever set by these lines, after the scan
> starts:
>
>     if (!scan->xactStartedInRecovery)
>         scan->kill_prior_tuple = scan->xs_hot_dead;
>
> and those lines are in indexam.c, not within any particular am. So the
> coding, as submitted, covers all index types, current and future.

That just stops more tuples from being killed in the standby. I was
thinking that we need similar conflict resolution in GiST to what we do
with b-tree delete records, to stop killed tuples from being deleted
while they might still be needed in the standby.

But looking closer at GiST, it seems that GiST doesn't actually do
that: killed tuples are not removed at page splits, but only by VACUUM.
So that's not an issue after all.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
From: Simon Riggs on 22 Sep 2009 03:22

On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
> On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>>
>> OK, here is the latest version of the Hot Standby patchset. This is
>> about version 30+ by now, but we should regard this as 0.2.1.
>> Patch against CVS HEAD (now): clean apply, compile, no known bugs.
>>
>> OVERVIEW
>>
>> You can download PDF versions of the fine manual here:
>> http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf
>
> From this doc:
>
> "In recovery, transactions will not be permitted to take any lock
> higher than AccessShareLock, other than AccessExclusiveLock. In
> addition, transactions may never assign a TransactionId and may never
> write WAL. The LOCK TABLE command by default applies an
> AccessExclusiveLock. Any LOCK TABLE command that runs on the standby
> and requests a specific lock type other than AccessShareLock will be
> rejected."
>
> The first sentence seems to say that clients on the standby can take
> ACCESS EXCLUSIVE, while the last sentence seems to say that they
> cannot do so.

You are right to pick up that discrepancy, as Heikki did also. The root
cause of the discrepancy is a change in patch behaviour between January
and now, which I will use your post to highlight and discuss, if you
don't mind. (And yes, the docs need to be corrected.)

Initially, it seemed certain that a read-only backend should not be
able to take an AccessExclusiveLock. On further thought, there is no
particular reason to block AccessExclusiveLocks themselves; it is just
that most things you might do while holding one are banned. The lock
itself is fine. (Any challenge on that?)

AccessExclusiveLocks can be used to serialize the actions of other
backends. That is a very common use case, so my concern was that LOCK
TABLE would be a no-op unless we allowed AccessExclusiveLock, and so
the patch does allow it.

> I did a little experiment on a hot standby instance. I expected that
> either I would be denied the lock altogether, or the lock would cause
> WAL replay to be paused until either I committed or was forcibly
> canceled. But neither happened: I was granted the lock, but WAL replay
> continued anyway.
>
> jjanes=# begin;
> BEGIN
> jjanes=# lock table pgbench_history in access exclusive mode;
> LOCK TABLE
> jjanes=# select count(*) from pgbench_history;
>  count
> --------
>  519104
> (1 row)
>
> jjanes=# select count(*) from pgbench_history;
>  count
> --------
>  527814
> (1 row)
>
> Is this the expected behavior?

By me, yes. WAL replay does not require a table lock to progress; any
changes are protected with block-level locks. Replay does acquire a
table lock and cancel conflicting queries when it is about to replay
something that would cause a query to explode, such as dropping a
table, as explained in the docs. So this is not a bug.

The explanation of the above sequence of events is that your backend
acquires and holds the AccessExclusiveLock (please check from another
session in pg_locks), while WAL replay continues in the Startup
process, inserting further rows into the pgbench_history table as a
series of transactions. The second select takes a later snapshot than
the first and so sees more data, hence the larger count. (And I am
pleased to see that recovery progresses quickly even while your queries
run.)
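For example, while your transaction holds the lock, a second standby
session can confirm it with a catalog query along these lines (the join
to pg_class is just for readability, and the column list is
abbreviated):

jjanes=# SELECT l.pid, c.relname, l.mode, l.granted
jjanes-#   FROM pg_locks l JOIN pg_class c ON c.oid = l.relation
jjanes-#  WHERE c.relname = 'pgbench_history';

You should see a granted AccessExclusiveLock row for your first backend
there, even as replay keeps inserting rows.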
So: not a bug, but just one of many possible behaviours we could
enforce:

1. Allow AccessExclusiveLocks, yet have them not interrupt WAL replay.
2. Allow AccessExclusiveLocks, but have them pause WAL replay.
3. Disallow AccessExclusiveLocks (so LOCK TABLE is effectively a no-op,
   because it will not be able to serialize anything).

So the patch originally implemented (3) but now implements (1).

I would say that (2) is very undesirable because it puts WAL replay
under the control of non-superusers. That could mean LOCK TABLE
implicitly alters the high availability of the standby, and might even
be used maliciously to do that.

I'm open to views on whether we should use (1) or (3). Comments?
Implementing either is no problem and we have a straight choice. We may
even wish to review the decision again later in the light of additional
feedback.

(Jeff, you have also helped me understand that there is a bug in the
way serializable transactions are cancelled, which is easily corrected.
Thanks for that unexpected windfall, though it is otherwise unrelated
to your comments.)

--
Simon Riggs
www.2ndQuadrant.com
From: Heikki Linnakangas on 22 Sep 2009 04:04

Simon Riggs wrote:
> On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
>> jjanes=# begin;
>> BEGIN
>> jjanes=# lock table pgbench_history in access exclusive mode;
>> LOCK TABLE
>> jjanes=# select count(*) from pgbench_history;
>>  count
>> --------
>>  519104
>> (1 row)
>>
>> jjanes=# select count(*) from pgbench_history;
>>  count
>> --------
>>  527814
>> (1 row)
>>
>> Is this the expected behavior?
>
> By me, yes. WAL replay does not require a table lock to progress; any
> changes are protected with block-level locks. Replay does acquire a
> table lock and cancel conflicting queries when it is about to replay
> something that would cause a query to explode, such as dropping a
> table, as explained in the docs.

That is rather surprising. You can't get that result in a normal
server, which I think is enough of a reason to disallow it. If you do
LOCK TABLE ACCESS EXCLUSIVE, you wouldn't expect the contents to change
under your nose.

> So: not a bug, but just one of many possible behaviours we could
> enforce:
>
> 1. Allow AccessExclusiveLocks, yet have them not interrupt WAL replay.
> 2. Allow AccessExclusiveLocks, but have them pause WAL replay.
> 3. Disallow AccessExclusiveLocks (so LOCK TABLE is effectively a
>    no-op, because it will not be able to serialize anything).
>
> So the patch originally implemented (3) but now implements (1).
>
> I would say that (2) is very undesirable because it puts WAL replay
> under the control of non-superusers. That could mean LOCK TABLE
> implicitly alters the high availability of the standby, and might
> even be used maliciously to do that.

I don't see a problem with (2) as long as the locker is kicked out
after max_standby_delay, like a long-running query. That's what I would
expect. I'm fine with (3) as well, but (1) does seem rather surprising
behavior.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
From: Simon Riggs on 22 Sep 2009 04:28

On Tue, 2009-09-22 at 11:04 +0300, Heikki Linnakangas wrote:
>
>> By me, yes. WAL replay does not require a table lock to progress; any
>> changes are protected with block-level locks. Replay does acquire a
>> table lock and cancel conflicting queries when it is about to replay
>> something that would cause a query to explode, such as dropping a
>> table, as explained in the docs.
>
> That is rather surprising. You can't get that result in a normal
> server, which I think is enough of a reason to disallow it. If you do
> LOCK TABLE ACCESS EXCLUSIVE, you wouldn't expect the contents to
> change under your nose.

OK, "normality" is a reasonable argument against. So (1) is only a
partial implementation of serializing the table.

>> So: not a bug, but just one of many possible behaviours we could
>> enforce:
>>
>> 1. Allow AccessExclusiveLocks, yet have them not interrupt WAL replay.
>> 2. Allow AccessExclusiveLocks, but have them pause WAL replay.
>> 3. Disallow AccessExclusiveLocks (so LOCK TABLE is effectively a
>>    no-op, because it will not be able to serialize anything).
>>
>> So the patch originally implemented (3) but now implements (1).
>>
>> I would say that (2) is very undesirable because it puts WAL replay
>> under the control of non-superusers. That could mean LOCK TABLE
>> implicitly alters the high availability of the standby, and might
>> even be used maliciously to do that.
>
> I don't see a problem with (2) as long as the locker is kicked out
> after max_standby_delay, like a long-running query. That's what I
> would expect. I'm fine with (3) as well, but (1) does seem rather
> surprising behavior.

(2) gives other problems because it would force us to check for
conflicting locks for each heap & index WAL record, to ensure that the
lock was honoured. We could optimize that, but it's still going to
cost.

I'd rather leave things at (3) for now and wait for further feedback.
"Start strict, relax later."
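To make the user-visible difference concrete, here is a sketch of the
two behaviours from a standby session (the error texts below are
illustrative, not taken from the patch). Under (2), the lock would be
granted, replay would queue up behind it, and the locker would be
cancelled once max_standby_delay expired:

jjanes=# begin;
BEGIN
jjanes=# lock table pgbench_history in access exclusive mode;
LOCK TABLE
(... replay stalls behind the lock; max_standby_delay elapses ...)
jjanes=# select count(*) from pgbench_history;
ERROR:  canceling statement due to conflict with recovery

Under (3), the request would simply be rejected up front, and replay
would never be affected:

jjanes=# lock table pgbench_history in access exclusive mode;
ERROR:  cannot acquire lock mode AccessExclusiveLock on database objects while recovery is in progress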
--
Simon Riggs
www.2ndQuadrant.com