Functional dependencies and GROUP BY [PgSql]

Prev: [HACKERS] Functional dependencies and GROUP BY
Next: [HACKERS] How to get permission to debug postgres?

From: Greg Stark on 7 Jun 2010 17:23

On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut <peter_e(a)gmx.net> wrote:
> I have developed a patch that partially implements the "functional
> dependency" feature

Nice! :)

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Hitoshi Harada on 7 Jun 2010 20:59

2010/6/8 Peter Eisentraut <peter_e(a)gmx.net>:
> I have developed a patch that partially implements the "functional
> dependency" feature that allows some columns to be omitted from the
> GROUP BY clause if it can be shown that the columns are functionally
> dependent on the columns in the group by clause and therefore guaranteed
> to be unique per group. �The full functional dependency deduction rules
> are pretty big and arcane, so I concentrated on getting a useful subset
> working. �In particular:

> Also, when a column is compared with a constant, it can appear
> ungrouped:
>
> SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

I don't see why it should be allowed. I see the insist that y must be
unique value so it is ok to be ungrouped but the point of discussion
is far from that; Semantically y is not grouping key.

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

or there should be more complicated example including JOIN cases. I
don't believe we can detect all of such cases. If the simple case is
allowed, users don't understand why the complicated case doesn't allow
sometimes. So it'll not be consistent.

Finally, it may hide unintended bugs. ORM tools may make WHERE clause
in some conditions and don't in other conditions.

Regards,

--
Hitoshi Harada

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Stephen Frost on 7 Jun 2010 21:38

* Hitoshi Harada (umi.tanuki(a)gmail.com) wrote:
> I don't see why it should be allowed. I see the insist that y must be
> unique value so it is ok to be ungrouped but the point of discussion
> is far from that; Semantically y is not grouping key.

Ignoring the fact that it's terribly useful- isn't it part of the SQL
spec?

> In addition, what if y is implicitly a constant? For example,
>
> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

Not sure I see the issue here?

> Finally, it may hide unintended bugs. ORM tools may make WHERE clause
> in some conditions and don't in other conditions.

Yeah, this one I really just done buy.. If an ORM tool doesn't write
correct SQL, then it's the ORM's fault, not ours.

Thanks,

Stephen

From: Stephen Frost on 7 Jun 2010 21:41

* Peter Eisentraut (peter_e(a)gmx.net) wrote:
> This is frequently requested by MySQL converts (and possibly others).

I'd certainly love to see it- but let's not confuse people by implying
that it would actually act the way MySQL does. It wouldn't, because
what MySQL does is alot closer to 'distinct on' and is patently insane
to boot. Again, I'd *love* to see this be done in PG, but when we
document it and tell people about it, *please* don't say it's similar in
any way to MySQL's "oh, we'll just pick a random value from the columns
that aren't group'd on" implementation.

> At this point, this patch could use some review and testing with unusual
> queries that break my implementation. ;-)

I'll give it a shot... :)

Thanks!

Stephen

From: Tom Lane on 7 Jun 2010 23:16

Peter Eisentraut <peter_e(a)gmx.net> writes:
> I have developed a patch that partially implements the "functional
> dependency" feature that allows some columns to be omitted from the
> GROUP BY clause if it can be shown that the columns are functionally
> dependent on the columns in the group by clause and therefore guaranteed
> to be unique per group.

The main objection to this is the same one I've had all along: it makes
the syntactic validity of a query dependent on what indexes exist for
the table. At minimum, that means that enforcing the check at parse
time is the Wrong Thing.

The var-compared-with-constant case seems like a big crock. Are we
really required to provide such a thing per spec?

I'm also fairly concerned about the performance of a check implemented
this way --- it's going to do a lot of work, and do it over and over
again as it traverses the query tree. At least some of that could be
alleviated after you move the check to the planner, just by virtue of
the index information already having been acquired ... but I'd still
suggest expending more than no effort on caching the results. For
instance, given "SELECT * FROM a_very_wide_table GROUP BY pk" you
shouldn't have to prove more than once that a_very_wide_table is
grouped by its PK.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

| Next | Last
Pages: 1 2 3 4 5 6
Prev: [HACKERS] Functional dependencies and GROUP BY
Next: [HACKERS] How to get permission to debug postgres?