Functional dependencies and GROUP BY [PgSql]

Prev: [HACKERS] Functional dependencies and GROUP BY
Next: [HACKERS] How to get permission to debug postgres?

From: Pavel Stehule on 8 Jun 2010 02:56

2010/6/7 Greg Stark <gsstark(a)mit.edu>:
> On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut <peter_e(a)gmx.net> wrote:
>> I have developed a patch that partially implements the "functional
>> dependency" feature
>
> Nice! :)
>

I like this idea too. It can simplify some queries and I believe - it
is very good marketing bonus for us.

Pavel
> --
> greg
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Rob Wultsch on 8 Jun 2010 05:50

On Mon, Jun 7, 2010 at 6:41 PM, Stephen Frost <sfrost(a)snowman.net> wrote:
> * Peter Eisentraut (peter_e(a)gmx.net) wrote:
>> This is frequently requested by MySQL converts (and possibly others).
>
> I'd certainly love to see it- but let's not confuse people by implying
> that it would actually act the way MySQL does. �It wouldn't, because
> what MySQL does is alot closer to 'distinct on' and is patently insane
> to boot. �Again, I'd *love* to see this be done in PG, but when we
> document it and tell people about it, *please* don't say it's similar in
> any way to MySQL's "oh, we'll just pick a random value from the columns
> that aren't group'd on" implementation.

Preface: I work as a MySQL DBA (yeah, yeah, laugh it up...).

It has been my experience that the vast majority of the time when a
MySQL users make use of the "fine feature" which allows them to use
unaggregated columns which is not present in the GROUP BY clause in an
aggregating query they have made an error because they do not
understand GROUP BY. I have found this lack of understanding to be
very wide spread across the MySQL developer and *DBA* communities. I
also would really hesitate to compare this useful feature to the *fine
feature* present in MySQL. Due to a long standing bug
(http://bugs.mysql.com/bug.php?id=8510) it really is not possible to
get MySQL to behave sanely. It is my impression that many programs of
significant size that interact with MySQL have errors because of this
issue and it would be good to not claim to have made Postgres
compatible.

That said, I imagine if this feature could make it into the Postgres
tree it would be very useful.

Would I be correct in assuming that while this feature would make
query planning more expensive, it would also often decrease the cost
of execution?

Best,

Rob Wultsch
wultsch(a)gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Peter Eisentraut on 8 Jun 2010 05:32

On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:
> > Also, when a column is compared with a constant, it can appear
> > ungrouped:
> >
> > SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;
>
> I don't see why it should be allowed. I see the insist that y must be
> unique value so it is ok to be ungrouped but the point of discussion
> is far from that; Semantically y is not grouping key.

I'm not sure what your argument is. If y is uniquely determined within
each group, then it's OK for it to be ungrouped. What other criteria do
you have in mind for determining that instead? It looks like you are
going by aesthetics. ;-)

> In addition, what if y is implicitly a constant? For example,
>
> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;
>
> or there should be more complicated example including JOIN cases. I
> don't believe we can detect all of such cases. If the simple case is
> allowed, users don't understand why the complicated case doesn't allow
> sometimes. So it'll not be consistent.

Yes, as I said, my implementation is incomplete in the sense that it
only recognizes some functional dependencies. To recognize the sort of
thing you show, you would need some kind of complex deduction or proof
engine, and that doesn't seem worthwhile, at least for me, at this
point.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Stark on 8 Jun 2010 08:40

On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter_e(a)gmx.net> writes:
>> I have developed a patch that partially implements the "functional
>> dependency" feature that allows some columns to be omitted from the
>> GROUP BY clause if it can be shown that the columns are functionally
>> dependent on the columns in the group by clause and therefore guaranteed
>> to be unique per group.
>
> The main objection to this is the same one I've had all along: it makes
> the syntactic validity of a query dependent on what indexes exist for
> the table. �At minimum, that means that enforcing the check at parse
> time is the Wrong Thing.

It also needs to ensure that the plan is invalidated if the constraint
is dropped, which I assume amounts to the same thing.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 8 Jun 2010 10:05

Greg Stark <gsstark(a)mit.edu> writes:
> On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> The main objection to this is the same one I've had all along: it makes
>> the syntactic validity of a query dependent on what indexes exist for
>> the table. �At minimum, that means that enforcing the check at parse
>> time is the Wrong Thing.

> It also needs to ensure that the plan is invalidated if the constraint
> is dropped, which I assume amounts to the same thing.

Well, no, any cached plan will get invalidated if the index goes away.
The big problem with this implementation is that you could create a
*rule* (eg a view) containing a query whose validity depends on the
existence of an index. Dropping the index will not cause the rule
to be invalidated.

Perhaps the correct fix would be to mark stored query trees as having a
dependency on the index, so that dropping the index/constraint would
force a drop of the rule too. Just pushing the check to plan time, as
I suggested yesterday, isn't a very nice fix because it would result
in the rule unexpectedly starting to fail at execution.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6
Prev: [HACKERS] Functional dependencies and GROUP BY
Next: [HACKERS] How to get permission to debug postgres?