From: Tom Lane on
Andres Freund <andres(a)anarazel.de> writes:
> What could the join removal path (and similar places) *possibly* do against
> such a case? Without stopping to use SnapshotNow I dont see any way :-(

But the planner, along with most of the rest of the backend, *does* use
SnapshotNow when examining the system catalogs.

I share your feeling of discomfort but so far I don't see a hole in
Simon's argument. Adding a constraint should never make a
previously-correct plan incorrect. Removing one is a very different
story, but he says he's not changing that case. (Disclaimer: I have
not read the patch.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Jul 16, 2010, at 6:01 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Andres Freund <andres(a)anarazel.de> writes:
>> What could the join removal path (and similar places) *possibly* do against
>> such a case? Without stopping to use SnapshotNow I dont see any way :-(
>
> But the planner, along with most of the rest of the backend, *does* use
> SnapshotNow when examining the system catalogs.
>
> I share your feeling of discomfort but so far I don't see a hole in
> Simon's argument. Adding a constraint should never make a
> previously-correct plan incorrect. Removing one is a very different
> story, but he says he's not changing that case. (Disclaimer: I have
> not read the patch.)

Perhaps we should start by deciding whether Andres' case is a bug in the first place, and then we can argue about whether it's a join-removal bug, a lock-weakening bug, or a preexisting bug.

....Robert
--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andres Freund on
On Saturday 17 July 2010 01:53:24 Robert Haas wrote:
> On Jul 16, 2010, at 6:01 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> > Andres Freund <andres(a)anarazel.de> writes:
> >> What could the join removal path (and similar places) *possibly* do
> >> against such a case? Without stopping to use SnapshotNow I dont see any
> >> way :-(
> >
> > But the planner, along with most of the rest of the backend, *does* use
> > SnapshotNow when examining the system catalogs.
> >
> > I share your feeling of discomfort but so far I don't see a hole in
> > Simon's argument.
Neither do I.

> > Adding a constraint should never make a
> > previously-correct plan incorrect. Removing one is a very different
> > story, but he says he's not changing that case. (Disclaimer: I have
> > not read the patch.)
>
> Perhaps we should start by deciding whether Andres' case is a bug in the
> first place, and then we can argue about whether it's a join-removal bug,
> a lock-weakening bug, or a preexisting bug.
The case where its possible to produce such a case *after* having used/locked
all participating relations is new I think.
Being able to create invalid results by doing DDL in another connection on
not-yet-used tables is at least as old as constraint exclusion. Its a bit
easier to work around, but thats it.

So I personally would not consider this patch as having a bug anymore
(thinking helps...).

Whether the general issue is a bug or a to-be-more-exhausitive-documented-
gotcha I have no idea. I know two people having hit it in production - I dont
think its a that common issue though. Starting with the fact that not that
many people use serializable.

Just to help me: The primary reasons for using SnapshotNow is speed and in
some cases correctness (referential integrity). Right? Any other reasons?

Andres


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Andres Freund <andres(a)anarazel.de> writes:
> Just to help me: The primary reasons for using SnapshotNow is speed and in
> some cases correctness (referential integrity). Right? Any other reasons?

Well, the main point for system catalog accesses is that you *must* have
an up-to-date view of the table schemas. As an example, if someone just
added an index to an existing table, it would not do for an INSERT to
fail to update that index --- no matter whether it's from a serializable
transaction or not. So the DDL-executing transaction must hold a lock
that would block any operation that had better be able to see what it
did, and once another transaction has acquired the lock that lets it go
ahead with another operation, it had better see the results of the DDL
transaction.

However that argument mostly applies to what the executor does. A plan
could still be usable despite having been made against a now-obsolete
version of the table schema.

In the case at hand, I think most constraint-adding situations would
require at least ShareLock, because they had better block execution of
INSERT/UPDATE/DELETE operations that could fail to honor the constraint
if they didn't see it in the catalogs. But AFAICS, addition of a
constraint need not block SELECT, and it need not invalidate existing
plans.

CREATE INDEX uses ShareLock because it's okay to run multiple CREATE
INDEXes in parallel (thanks to some rather dodgy coding of the catalog
updates). For other cases of constraint additions, it might not be
practical to run two constraint additions in parallel. In that case we
could use ShareRowExclusive instead, which is self-exclusive but is not
any stronger than Share from the perspective of DML commands. Since
it's not, I'm unconvinced that it's worth taking any great pains to try
to make constraint additions run in parallel.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andres Freund on
On Saturday 17 July 2010 09:55:37 Simon Riggs wrote:
> On Fri, 2010-07-16 at 23:03 +0200, Andres Freund wrote:
> > Sure its not that bad, but at least it needs to get documented imho.
> > Likely others should chime in here ;-)
>
> Don't understand you. This is a clear bug in join removal, test case
> attached, a minor rework of your original test case.
As shown below the same issue exists in other codepaths that we cant easily fix
in a stable release :-( - so I think documenting it is the only viable action
for the back-branches.

> > What could the join removal path (and similar places) *possibly* do
> > against such a case? Without stopping to use SnapshotNow I dont see
> > any way :-(
> The bug is caused by allowing join removal to work in serializable
> transactions. The fix for 9.0 is easy and clear: disallow join removal
> when planning a query as the second or subsequent query in a
> serializable transaction.
>
> A wider fix might be worth doing for 9.1, not sure.

Unfortunately the same issue exists with constraint exclusion - and we can
hardly disable that for serializable transactions...


CREATE TABLE testconstr(data int);
INSERT INTO testconstr VALUES(1),(10);

T1:
test=# explain analyze SELECT * FROM testconstr WHERE data > 5;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on testconstr (cost=0.00..40.00 rows=800 width=4) (actual
time=0.029..0.032 rows=1 loops=1)
Filter: (data > 5)
Total runtime: 0.097 ms
(3 rows)

test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN

--make sure we do have a snapshot
test=# SELECT * FROM pg_class WHERE 0 = 1

T2:
DELETE FROM testconstr WHERE data >= 5;
ALTER TABLE testconstr ADD CONSTRAINT t CHECK(data < 5);

T1:
test=# explain analyze SELECT * FROM testconstr WHERE data > 5;
QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.003 rows=0
loops=1)
One-Time Filter: false
Total runtime: 0.045 ms
(3 rows)

test=# SET constraint_exclusion = false;
SET
test=# explain analyze SELECT * FROM testconstr WHERE data > 5;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on testconstr (cost=0.00..40.00 rows=800 width=4) (actual
time=0.030..0.033 rows=1 loops=1)
Filter: (data > 5)
Total runtime: 0.099 ms
(3 rows)


Thats seems to be an issue that you realistically can hit in production...

I think the same problem exists with inheritance planning - i.e. a child table
added to a relation in T1 while T2 already holds a snapshot but hasnt used
that specific table was created will see the new child. Thats less severe but
still annoying.

Beside using an actual Snapshot in portions of the planner (i.e. stats should
continue using SnapshotNow) I dont really see a fix here.


Andres

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers