From: Jeff Janes
On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>
> OK, here is the latest version of the Hot Standby patchset. This is
> about version 30+ by now, but we should regard this as 0.2.1
> Patch against CVS HEAD (now): clean apply, compile, no known bugs.
>
> OVERVIEW
>
> You can download PDF versions of the fine manual is here
> http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf


From this doc:

"In recovery, transactions will not be permitted to take any lock
higher other than
AccessShareLock or AccessExclusiveLock. In addition, transactions may never
assign a TransactionId and may never write WAL. The LOCK TABLE command by
default applies an AccessExclusiveLock. Any LOCK TABLE command that runs on
the standby and requests a specific lock type other than AccessShareLock will be
rejected."

The first sentence seems to say that clients on the standby can take
ACCESS EXCLUSIVE locks, while the last sentence seems to say that they
cannot.

I did a little experiment on a hot standby instance. I expected that
either I would be denied the lock altogether, or the lock would cause
WAL replay to pause until I either committed or was forcibly canceled.
But neither happened: I was granted the lock, yet WAL replay continued
anyway.

jjanes=# begin;
BEGIN
jjanes=# lock table pgbench_history in access exclusive mode;
LOCK TABLE
jjanes=# select count(*) from pgbench_history;
 count
--------
 519104
(1 row)

jjanes=# select count(*) from pgbench_history;
 count
--------
 527814
(1 row)

Is this the expected behavior?

Thanks,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas
Simon Riggs wrote:
> On Mon, 2009-09-21 at 14:01 +0100, Simon Riggs wrote:
>> On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:
>
>>> is this that we seem to be missing conflict
>>> resolution for GiST index tuples deleted by the kill_prior_tuples
>>> mechanism. Unless I'm missing something, we need similar handling there
>>> that we have in b-tree.
>> OK, I agree with that. Straightforward change. Thanks very much.
>>
>> I marked the comment to indicate that the handling for GIST and GIN
>> indexes looked dubious to me also. I had the earlier "it is safe"
>> comments but that was before we looked at the kill prior tuples issue.
>
> ISTM I looked at this too quickly.
>
> kill_prior_tuple is only ever set by these lines, after scan starts:
>
> if (!scan->xactStartedInRecovery)
> scan->kill_prior_tuple = scan->xs_hot_dead;
>
> which is set in indexam.c, not within any particular am. So the coding,
> as submitted, covers all index types, current and future.

That just stops more tuples from being killed in the standby. I was
thinking that we need similar conflict resolution in GiST to what we
have for b-tree delete records, to stop killed tuples from being
deleted while they might still be needed in the standby.

But looking closer at GiST, it seems that GiST doesn't actually do that;
killed tuples are not removed at page splits, but only by VACUUM. So
that's not an issue after all.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs

On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
> On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> >
> > OK, here is the latest version of the Hot Standby patchset. This is
> > about version 30+ by now, but we should regard this as 0.2.1
> > Patch against CVS HEAD (now): clean apply, compile, no known bugs.
> >
> > OVERVIEW
> >
> > You can download PDF versions of the fine manual is here
> > http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf
>
>
> From this doc:
>
> "In recovery, transactions will not be permitted to take any lock
> higher other than
> AccessShareLock or AccessExclusiveLock. In addition, transactions may never
> assign a TransactionId and may never write WAL. The LOCK TABLE command by
> default applies an AccessExclusiveLock. Any LOCK TABLE command that runs on
> the standby and requests a specific lock type other than AccessShareLock will be
> rejected."
>
> The first sentence seems to say that clients on the stand-by can take
> ACCESS EXCLUSIVE, while the last sentence seems to say that they
> cannot do so.

You are right to pick up that discrepancy, as Heikki did also.

The root cause of that discrepancy is a change in the patch's behaviour
between January and now, which I will use your post to highlight and
discuss, if you don't mind. (And yes, the docs need to be corrected.)

Initially, it seemed certain that a read-only backend could not take an
AccessExclusiveLock. On further thought, there is no particular reason
to block AccessExclusiveLocks themselves; it is just that most things
you might do while holding one are banned. The lock itself is fine.
(Any challenge on that?)

AccessExclusiveLocks can be used to serialize the actions of other
backends. That is a very common use case, so my concern was that LOCK
TABLE would be a no-op unless we allowed AccessExclusiveLock, so the
patch does allow it.
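
As an illustrative sketch of that serialization use case, with two
hypothetical sessions on the standby and an invented table name:

-- Session 1 (standby)
BEGIN;
LOCK TABLE jobs IN ACCESS EXCLUSIVE MODE;

-- Session 2 (standby), issued while session 1 holds the lock
SELECT count(*) FROM jobs;   -- blocks, waiting for session 1

-- Session 1
COMMIT;                      -- session 2's SELECT now proceeds

The lock still conflicts normally between standby backends; the point
under discussion is only its interaction with WAL replay.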

> I did a little experiment on a hot standby instance. I expected that
> either I would be denied the lock altogether, or the lock would cause
> WAL replay to be paused until either I committed or was forcibly
> canceled. But neither happened, I was granted the lock but WAL replay
> continued anyway.
>
> jjanes=# begin;
> BEGIN
> jjanes=# lock table pgbench_history in access exclusive mode;
> LOCK TABLE
> jjanes=# select count(*) from pgbench_history;
> count
> --------
> 519104
> (1 row)
>
> jjanes=# select count(*) from pgbench_history;
> count
> --------
> 527814
> (1 row)
>
> Is this the expected behavior?

By me, yes. WAL replay does not require a table lock to progress; any
changes are protected with block-level locks. It does acquire a table
lock and cancel conflicting queries when it is about to replay something
that would cause a query to explode, such as dropping a table, as
explained in the docs.

So this is not a bug.

The explanation of how the above sequence of events occurs is that the
backend acquires the AccessExclusiveLock - you can confirm this from
another session by querying pg_locks. WAL replay continues via the
Startup process, inserting further rows into the pgbench_history table
as a series of transactions. The second SELECT takes a later snapshot
than the first and so sees more data, hence the larger count. (And I am
pleased to see that recovery progresses quickly even while your queries
run.)
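
For instance, something like this from a second standby session should
show the lock as held and granted (output omitted here; exact columns
vary by version):

SELECT locktype, relation::regclass AS rel, mode, granted
FROM pg_locks
WHERE mode = 'AccessExclusiveLock';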

So not a bug, but just one of many possible behaviours we could enforce.
1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
2. Allow AccessExclusiveLocks but have them pause WAL replay
3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
no-op because it will not be able to serialize anything)

So the patch originally implemented (3) but now implements (1).

I would say that (2) is very undesirable because it puts WAL replay in
the control of non-superusers. That could mean LOCK TABLE implicitly
alters the high availability of the standby, and might even be used
maliciously to do that.

I'm open to views on whether we should use (1) or (3). Comments?

Implementing either is no problem, so we have a straight choice. We may
even wish to revisit it later based on additional feedback.

(Jeff, you have also helped me understand that there is a bug in the way
serializable transactions are cancelled, which is easily corrected.
Thanks for that unexpected windfall, but it is otherwise unrelated to
your comments.)

--
Simon Riggs www.2ndQuadrant.com



From: Heikki Linnakangas
Simon Riggs wrote:
> On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
>> jjanes=# begin;
>> BEGIN
>> jjanes=# lock table pgbench_history in access exclusive mode;
>> LOCK TABLE
>> jjanes=# select count(*) from pgbench_history;
>> count
>> --------
>> 519104
>> (1 row)
>>
>> jjanes=# select count(*) from pgbench_history;
>> count
>> --------
>> 527814
>> (1 row)
>>
>> Is this the expected behavior?
>
> By me, yes. WAL replay does not require a table lock to progress. Any
> changes are protected with block-level locks. It does acquire a table
> lock and cancel conflicting queries when it is about to replay something
> that would cause a query to explode, such as dropping a table, as
> explained in docs.

That is rather surprising. You can't get that result in a normal server,
which I think is enough of a reason to disallow it. If you do LOCK TABLE
ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
nose.
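
For comparison, on a normal (primary) server the equivalent sequence
gives a stable count, because the ACCESS EXCLUSIVE lock blocks the
pgbench clients' inserts (a sketch, not an actual run):

BEGIN;
LOCK TABLE pgbench_history IN ACCESS EXCLUSIVE MODE;
SELECT count(*) FROM pgbench_history;  -- returns some count N
SELECT count(*) FROM pgbench_history;  -- same N; writers are blocked
COMMIT;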

> So not a bug, but just one of many possible behaviours we could enforce.
> 1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
> 2. Allow AccessExclusiveLocks but have them pause WAL replay
> 3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
> no-op because it will not be able to serialize anything)
>
> So the patch originally implemented (3) but now implements (1).
>
> I would say that (2) is very undesirable because it puts WAL replay in
> the control of non-superusers. That could mean LOCK TABLE implicitly
> alters the high availability of the standby, and might even be used
> maliciously to do that.

I don't see a problem with (2) as long as the locker is kicked out
after max_standby_delay, like a long-running query. That's what I would
expect. I'm fine with (3) as well, but (1) does seem rather surprising
behavior.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs

On Tue, 2009-09-22 at 11:04 +0300, Heikki Linnakangas wrote:
> >
> > By me, yes. WAL replay does not require a table lock to progress. Any
> > changes are protected with block-level locks. It does acquire a table
> > lock and cancel conflicting queries when it is about to replay something
> > that would cause a query to explode, such as dropping a table, as
> > explained in docs.
>
> That is rather surprising. You can't get that result in a normal server,
> which I think is enough of a reason to disallow it. If you do LOCK TABLE
> ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
> nose.

OK, "normality" is a reasonable argument against. So (1) is only a
partial implementation of serializing the table.

> > So not a bug, but just one of many possible behaviours we could enforce.
> > 1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
> > 2. Allow AccessExclusiveLocks but have them pause WAL replay
> > 3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
> > no-op because it will not be able to serialize anything)
> >
> > So the patch originally implemented (3) but now implements (1).
> >
> > I would say that (2) is very undesirable because it puts WAL replay in
> > the control of non-superusers. That could mean LOCK TABLE implicitly
> > alters the high availability of the standby, and might even be used
> > maliciously to do that.
>
> I don't see a problem with (2) as long as the locker is kicked out after
> max_standby_delay, like a long-running query. That's what I would
> expect. I'm fine with (3) as well, but (1) does seem rather suprising
> behavior.

(2) gives other problems because it would force us to check for
conflicting locks for each heap and index WAL record, to ensure that
the lock was honoured. We could optimize that, but it's still going to
cost.

I'd rather leave things at (3) for now and wait for further feedback.
"Start strict, relax later".

--
Simon Riggs www.2ndQuadrant.com

