Writeable CTE patch [PgSql]

Prev: named parameters in SQL functions
Next: [HACKERS] ORDER BY vs. volatile functions

From: Tom Lane on 27 Nov 2009 14:44

I wrote:
> I think this is worth doing since it cleans up one of the grottier
> parts of executor initialization. The whole thing around
> ExecContextForcesOids was never pretty, and it's been the source of
> more than one bug if memory serves.

On further review there's a really serious stumbling block here.
Consider
INSERT INTO t1 SELECT * FROM t2 UNION ALL SELECT * FROM t3
where the three tables all have the same user columns but t2 has
OIDs and t3 not (or vice versa). Without ExecContextForcesOids
or something very much like it, both scan nodes will think they
can return physical tuples. The output of the Append node will
therefore contain some tuples with OIDs and some without. Append
itself can't fix that since it doesn't project. In many queries
this would not matter --- but if we are inserting them directly
into t1 without any further filtering, it does matter.

I can imagine various ways around this, but it's not clear that
any of them are much less grotty than the code is now. In any
case this was just a marginal code cleanup idea and it doesn't
seem worth spending so much time on right now.

I'm going to go back to plan A: drop the es_result_relation_info
changes from the patch.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 28 Nov 2009 13:59

Marko Tiikkaja <marko.tiikkaja(a)cs.helsinki.fi> writes:
> Attached is the latest version of the patch.

I looked through this patch and concluded that it still needs a fair
amount of work, so I'm bouncing it back for further work.

1. I thought we'd agreed at
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00558.php
that the patch should support WITH on DML statements, eg
with (some-query) insert into foo ...
This might not take much more than grammar additions, but it's
definitely lacking at the moment.

2. The handling of rules on DML WITH queries is far short of sufficient.
AFAICT, what it's doing is rewriting the query, then taking the first
or last element of the resulting query list as replacing the WITH
query, and adding the rest of the list after or before the main query.
This does not work at all for cases involving conditional DO INSTEAD
rules, since there could be more than one element of the resulting
query list that's responsible for delivering results depending on the
runtime outcome of the condition. I don't think it works for
unconditional DO INSTEAD either, since the rule producing output might
not be the first or last one. And in any case it fails to satisfy the
POLA in regards to the order of execution of DO ALSO queries relative
to other WITH queries or the main query.

I am not sure that it is possible to fix this without really drastic
surgery on the rule mechanisms. Or maybe we ought to rethink what
the representation of DML WITH queries is.

Perhaps it would be acceptable to just throw ERROR_FEATURE_NOT_SUPPORTED
when there are DO ALSO or conditional DO INSTEAD rules applying to the
target of a DML WITH query. I wouldn't normally think that just blowing
off such a thing meets the project's quality standards, but we all know
that the current rule mechanism is in need of a ground-up redesign anyway.
It's hard to justify putting a lot of work into making it work with DML
WITH queries when we might be throwing it all out in the future.

One thing that really does have to draw an error is that AFAIR the current
rule feature doesn't enforce that a rewritten query produce the same type
of output that the original would have. We just ship off whatever the
results are to the client, and let it sort everything out. In a DML WITH
query, though, I think we do have to insist that the rewritten query(s)
still produce the same RETURNING rowtype as before.

3. I'm pretty unimpressed with the code added to ExecutePlan. It knows
way more than it ought to about CTEs, and yet I don't think it's doing the
right things anyway --- in particular, won't it run the "leader" CTE more
than once if one CTE references another? I think it would be better if
the PlannedStmt representation just told ExecutePlan what to do, rather
than having all these heuristics inside ExecutePlan. (BTW, I also think
it would work better if you had the CommandCounterIncrement at the bottom
of the loop, after the subquery execution not before it. But I'm not sure
it's safe for ExecutePlan to be modifying the snapshot it's handed anyway.)

I wonder whether it would be practical to fix both #2 and #3 by having the
representation of DML WITH queries look more like the representation of
rule rewrite output --- that is, generate a list of top-level Queries
not one Query with DML subqueries in its CTE list. The main thing that
seems to be missing in order to allow that is for a Query to refer back to
the output of a previous Query in the list. This doesn't seem
tremendously hard at runtime --- it's just a tuplestore to keep around
--- but I'm not clear what it ought to look like in terms of the parsetree
representation.

4. As previously noted, the changes to avoid using es_result_relation_info
are broken and need to be dropped from the patch.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alex Hunsaker on 29 Nov 2009 17:38

On Sat, Nov 28, 2009 at 11:59, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> 1. I thought we'd agreed at
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00558.php
> that the patch should support WITH on DML statements, eg
> Â Â Â Â with (some-query) insert into foo ...
> This might not take much more than grammar additions, but it's
> definitely lacking at the moment.

Hrm ? A few messages down you say SELECT should be a good start

http://archives.postgresql.org/pgsql-hackers/2009-10/msg01081.php

> 2. The handling of rules on DML WITH queries is far short of sufficient.

Ick.

> Perhaps it would be acceptable to just throw ERROR_FEATURE_NOT_SUPPORTED
> when there are DO ALSO or conditional DO INSTEAD rules applying to the
> target of a DML WITH query.

+1

> 3. I'm pretty unimpressed with the code added to ExecutePlan.
> I wonder whether it would be practical to fix both #2 and #3 by having the
> representation of DML WITH queries look more like the representation of
> rule rewrite output

Interesting... This seems like the best solution ( assuming its
workable ). It also looks like it might make #1 easier as well.

However, I think the current approach does have some virtue in that I
was surprised how little the patch was. Granted that is partly due to
ExecutePlan knowing to much.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Marko Tiikkaja on 30 Nov 2009 13:43

Tom Lane wrote:
> 1. I thought we'd agreed at
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00558.php
> that the patch should support WITH on DML statements, eg
> with (some-query) insert into foo ...
> This might not take much more than grammar additions, but it's
> definitely lacking at the moment.

Ok, I added these.

> One thing that really does have to draw an error is that AFAIR the current
> rule feature doesn't enforce that a rewritten query produce the same type
> of output that the original would have. We just ship off whatever the
> results are to the client, and let it sort everything out. In a DML WITH
> query, though, I think we do have to insist that the rewritten query(s)
> still produce the same RETURNING rowtype as before.

Agreed.

> 3. I'm pretty unimpressed with the code added to ExecutePlan. It knows
> way more than it ought to about CTEs, and yet I don't think it's doing the
> right things anyway --- in particular, won't it run the "leader" CTE more
> than once if one CTE references another?

Yes. Are you suggesting something more intelligent to avoid scanning
the CTE more than once or..?

> I think it would be better if
> the PlannedStmt representation just told ExecutePlan what to do, rather
> than having all these heuristics inside ExecutePlan.

Yup, seems like a better choice.

> (BTW, I also think
> it would work better if you had the CommandCounterIncrement at the bottom
> of the loop, after the subquery execution not before it. But I'm not sure
> it's safe for ExecutePlan to be modifying the snapshot it's handed anyway.)

Agreed. I'm a bit lost here with the snapshot business; is doing this
work in ExecutePlan() out of the question or is it just that what I'm
doing is wrong?

> 4. As previously noted, the changes to avoid using es_result_relation_info
> are broken and need to be dropped from the patch.

Done. I kept the logic for result relations to allow nested ModifyTable
nodes, but I don't think it ever did the right thing with EvalPlanQual()
and nested nodes. I'll have think about that.

Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 30 Nov 2009 13:56

Marko Tiikkaja <marko.tiikkaja(a)cs.helsinki.fi> writes:
> Tom Lane wrote:
>> (BTW, I also think
>> it would work better if you had the CommandCounterIncrement at the bottom
>> of the loop, after the subquery execution not before it. But I'm not sure
>> it's safe for ExecutePlan to be modifying the snapshot it's handed anyway.)

> Agreed. I'm a bit lost here with the snapshot business; is doing this
> work in ExecutePlan() out of the question or is it just that what I'm
> doing is wrong?

I think it's not a good idea for ExecutePlan to be scribbling on the
executor's input, and the provided snapshot is definitely an input.
It might accidentally fail to fail in the present system, but it would
always be a hazard. The only thing that I'd be comfortable with is
copying the snap and modifying the copy. This might be okay from a
performance standpoint if it's done at the bottom of the loop (ie,
only when you actually have at least one writable CTE). It would be
altogether cleaner though if the CommandCounterIncrement responsibility
were in the same place it is now, ie the caller of the executor. Which
could be possible if we restructure the rewriter/planner output as a
list of Queries instead of just one. I'm not currently sure how hard
that would be, though; it might not be a practical answer.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3
Prev: named parameters in SQL functions
Next: [HACKERS] ORDER BY vs. volatile functions