Functional dependencies and GROUP BY [PgSql]

Prev: [HACKERS] Functional dependencies and GROUP BY
Next: [HACKERS] How to get permission to debug postgres?

From: Tom Lane on 8 Jun 2010 11:33

Stephen Frost <sfrost(a)snowman.net> writes:
> * Tom Lane (tgl(a)sss.pgh.pa.us) wrote:
>> Perhaps the correct fix would be to mark stored query trees as having a
>> dependency on the index, so that dropping the index/constraint would
>> force a drop of the rule too.

> Alternatively, we could rewrite the rule (not unlike what we do for
> "SELECT *") to actually add on the other implicitly grouped-by columns..
> I don't know if that's better or worse than creating a dependency,
> since if the constraint were dropped/changed, people might expect the
> rule's output to change.

Hm. The problem with that is that one of the benefits we'd like to get
from this is an efficiency win: the generated plan ought to only group
by the PK, not uselessly sort/group by everything in the row. I suppose
we could have the planner reverse-engineer its way to that, but it seems
awfully slow and clunky to add on the extra columns and then reason our
way to removing them again.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Stephen Frost on 8 Jun 2010 11:51

* Tom Lane (tgl(a)sss.pgh.pa.us) wrote:
> Hm. The problem with that is that one of the benefits we'd like to get
> from this is an efficiency win: the generated plan ought to only group
> by the PK, not uselessly sort/group by everything in the row. I suppose
> we could have the planner reverse-engineer its way to that, but it seems
> awfully slow and clunky to add on the extra columns and then reason our
> way to removing them again.

That's certainly a good point. Another issue that I realized when
thinking about this again- if someone wanted to *drop* a column that's
part of a PK (since it turned out to not be necessary, for example), and
then wanted to recreate the rule based on what was stored in the
catalog, they wouldn't be able to without modifying it, and that's
certainly be annoying too.

Guess my 2c would be for creating the dependency. I really dislike the
idea of the rule just all of a sudden breaking.

Thanks,

Stephen

From: Peter Eisentraut on 11 Jun 2010 08:02

On tis, 2010-06-08 at 10:21 -0400, Tom Lane wrote:
> The question is why bother to recognize *any* cases of this form.
> I find it really semantically ugly to have the parser effectively
> doing one deduction of this form when the main engine for that type
> of deduction is elsewhere; so unless there is a really good argument
> why we have to do this case (and NOT "it was pretty easy"), I don't
> want to do it.

Yeah, I'm not horribly attached to it. I began to structure the code to
support multiple kinds of checks, and at the end only two kinds were
reasonably doable and useful. We can remove it if no one is interested
in it, which appears to be the case.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Peter Eisentraut on 11 Jun 2010 08:07

On tis, 2010-06-08 at 10:05 -0400, Tom Lane wrote:
> Perhaps the correct fix would be to mark stored query trees as having
> a
> dependency on the index, so that dropping the index/constraint would
> force a drop of the rule too. Just pushing the check to plan time, as
> I suggested yesterday, isn't a very nice fix because it would result
> in the rule unexpectedly starting to fail at execution.

There are actually pretty explicit instructions about this in the SQL
standard:

<drop table constraint definition>

4) If QS is a <query specification> that contains an implicit or
explicit <group by clause> and that contains a column reference to a
column C in its <select list> that is not contained in an aggregated
argument of a <set function specification>, and if G is the set of
grouping columns of QS, and if the table constraint TC is needed to
conclude that G ↦ C is a known functional dependency in QS, then QS is
said to be dependent on TC.

5) If V is a view that contains a <query specification> that is
dependent on a table constraint TC, then V is said to be dependent on
TC.

So the dependency between the view/rule and the constraint/index needs
to be stored in the dependency system, and RESTRICT/CASCADE will take
effect.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Peter Eisentraut on 25 Jun 2010 16:06

On mån, 2010-06-07 at 21:33 +0300, Peter Eisentraut wrote:
> I have developed a patch that partially implements the "functional
> dependency" feature that allows some columns to be omitted from the
> GROUP BY clause if it can be shown that the columns are functionally
> dependent on the columns in the group by clause and therefore
> guaranteed to be unique per group.

Second version:

I stripped out all checks except the primary key/unique constraint
checks.

Views whose existence depends on one of those constraints get a
dependency recorded. This depends on the patch currently in the commit
fest to record not null constraints in pg_constraint, so that the
dependencies on not-null constraints can be recorded.

I haven't done any caching of index lookups yet. Some testing with
1600-column tables didn't show any effect. I'll test this a little
more.

First | Prev | Next | Last
Pages: 1 2 3 4 5 6
Prev: [HACKERS] Functional dependencies and GROUP BY
Next: [HACKERS] How to get permission to debug postgres?