Functional dependencies and GROUP BY [PgSql]


From: Greg Stark on 8 Jun 2010 10:11

On Tue, Jun 8, 2010 at 3:05 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Well, no, any cached plan will get invalidated if the index goes away.
> The big problem with this implementation is that you could create a
> *rule* (eg a view) containing a query whose validity depends on the
> existence of an index.  Dropping the index will not cause the rule
> to be invalidated.

Hm, I was incorrectly thinking of this as analogous to the cases of
plans that could be optimized based on the existence of a constraint.
For example removing columns from a sort key because they're unique.
But this is different because not just the plan but the validity of
the query itself is dependent on the constraint.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 8 Jun 2010 10:21

Peter Eisentraut <peter_e(a)gmx.net> writes:
> On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:
>> In addition, what if y is implicitly a constant? For example,
>>
>> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

> Yes, as I said, my implementation is incomplete in the sense that it
> only recognizes some functional dependencies. To recognize the sort of
> thing you show, you would need some kind of complex deduction or proof
> engine, and that doesn't seem worthwhile, at least for me, at this
> point.

The question is why bother to recognize *any* cases of this form.
I find it really semantically ugly to have the parser effectively
doing one deduction of this form when the main engine for that type
of deduction is elsewhere; so unless there is a really good argument
why we have to do this case (and NOT "it was pretty easy"), I don't
want to do it.

As far as I recall, at least 99% of the user requests for this type
of behavior, maybe 100%, would be satisfied by recognizing the
group-by-primary-key case. So I think we should do that and be happy.
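
A minimal sketch of what the group-by-primary-key rule amounts to, assuming a single-table query. The function name and its arguments are hypothetical and are not taken from the patch under discussion; it only illustrates the check, not how the parser would implement it.

```python
def ungrouped_columns_allowed(select_cols, group_by_cols, primary_key):
    """Return True if every selected column is either grouped or
    functionally dependent on the grouped columns via the primary key.

    Assumption: all columns belong to one table, and primary_key is
    the list of that table's primary key columns (empty if none).
    """
    grouped = set(group_by_cols)
    # A primary key determines every column of its table, but only
    # when all of its columns appear in GROUP BY.
    pk_covered = bool(primary_key) and set(primary_key) <= grouped
    return pk_covered or set(select_cols) <= grouped


# SELECT x, y FROM tab GROUP BY x, with x as the primary key
print(ungrouped_columns_allowed(["x", "y"], ["x"], ["x"]))   # True
# Same query, but the primary key is some other column
print(ungrouped_columns_allowed(["x", "y"], ["x"], ["id"]))  # False
```

Anything beyond this one rule, such as dependencies implied by WHERE clauses, is outside the sketch.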

regards, tom lane


From: Marko Tiikkaja on 8 Jun 2010 10:48

On 6/8/10 5:21 PM +0300, Tom Lane wrote:
> Peter Eisentraut <peter_e(a)gmx.net> writes:
>> On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:
>>> In addition, what if y is implicitly a constant? For example,
>>>
>>> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;
>
>> Yes, as I said, my implementation is incomplete in the sense that it
>> only recognizes some functional dependencies. To recognize the sort of
>> thing you show, you would need some kind of complex deduction or proof
>> engine, and that doesn't seem worthwhile, at least for me, at this
>> point.
>
> The question is why bother to recognize *any* cases of this form.
> I find it really semantically ugly to have the parser effectively
> doing one deduction of this form when the main engine for that type
> of deduction is elsewhere; so unless there is a really good argument
> why we have to do this case (and NOT "it was pretty easy"), I don't
> want to do it.
>
> As far as I recall, at least 99% of the user requests for this type
> of behavior, maybe 100%, would be satisfied by recognizing the
> group-by-primary-key case. So I think we should do that and be happy.

+1

Regards,
Marko Tiikkaja


From: Stephen Frost on 8 Jun 2010 11:16

* Tom Lane (tgl(a)sss.pgh.pa.us) wrote:
> Perhaps the correct fix would be to mark stored query trees as having a
> dependency on the index, so that dropping the index/constraint would
> force a drop of the rule too. Just pushing the check to plan time, as
> I suggested yesterday, isn't a very nice fix because it would result
> in the rule unexpectedly starting to fail at execution.

Alternatively, we could rewrite the rule (not unlike what we do for
"SELECT *") to actually add on the other implicitly grouped-by columns.
I don't know if that's better or worse than creating a dependency,
since if the constraint were dropped/changed, people might expect the
rule's output to change. Of course, as you mention, the alternative
would really be for the rule to just start failing. Still, if I wanted
to change the constraint, it'd be a lot nicer to just be able to change
it and, presuming I'm just adding a column to it or doing some other
change which wouldn't invalidate the rule, not have to drop/recreate
the rule.

Thanks,

Stephen

From: Greg Stark on 8 Jun 2010 11:22

On Tue, Jun 8, 2010 at 3:21 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> The question is why bother to recognize *any* cases of this form.
> I find it really semantically ugly to have the parser effectively
> doing one deduction of this form when the main engine for that type
> of deduction is elsewhere; so unless there is a really good argument
> why we have to do this case (and NOT "it was pretty easy"), I don't
> want to do it.

Well it does appear to be there:

4.18.11 Known functional dependencies in the result of a <where clause>
....
If AP is an equality AND-component of the <search condition> simply
contained in the <where clause> and one comparand of AP is a column
reference CR, and the other comparand of AP is a <literal>, then let
CRC be the counterpart of CR in R. {} ↦ {CRC} is a known functional
dependency in R, where {} denotes the empty set.

NOTE 43 - An SQL-implementation may also choose to recognize {} ↦
{CRC} as a known functional dependency if the other comparand is a
deterministic expression containing no column references.
....

Since Peter's not eager to implement the whole section -- which does
seem pretty baroque -- it's up to us to draw the line where we stop
coding and declare it good enough. I think we're all agreed that
grouping by a pk is clearly the most important case. It may be
important to get some other cases just so that the PK property carries
through other clauses such as joins and group bys. But ultimately the
only thing stopping us from implementing the whole thing is our
threshold of pain for writing and maintaining the extra code.
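
To make the kind of deduction in that section concrete, here is a minimal sketch of the textbook attribute-closure computation applied to Hitoshi's example query. The function names and the tuple encoding of WHERE conjuncts are hypothetical, and treating column = column as a dependency in both directions is an assumption of this sketch rather than something quoted above.

```python
def closure(attrs, fds):
    """Attribute closure: all columns determined by `attrs` under `fds`.

    `fds` is an iterable of (lhs, rhs) pairs of column-name sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def where_fds(equalities):
    """Turn WHERE equality conjuncts into functional dependencies.

    Each conjunct is (column, other), where `other` is ("lit", value)
    or ("col", name). column = literal yields {} -> {column};
    column = column yields a dependency in each direction (assumption).
    """
    fds = []
    for col, (kind, val) in equalities:
        if kind == "lit":
            fds.append((frozenset(), frozenset({col})))
        else:
            fds.append((frozenset({col}), frozenset({val})))
            fds.append((frozenset({val}), frozenset({col})))
    return fds


# SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;
fds = where_fds([("y", ("col", "a")), ("a", ("lit", 5))])
determined = closure({"x"}, fds)
print(sorted(determined))  # ['a', 'x', 'y']
```

Since y ends up in the closure of the GROUP BY columns, the query would be accepted under these rules. Covering joins, subqueries, and the rest of the section takes many more rules of this kind, which is where the maintenance cost comes from.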

--
greg

