From: Tom Lane on 15 Dec 2009 15:48 Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes: > Query-level DISTINCT shouldn't allow columns in the order by that > aren't in the select list because those columns _do not exist_ at the > point that ordering logically takes place (even though in the > implementation, they might). > This isn't the case for aggregate order by. I entirely disagree. Why should the semantics of this combination of ORDER BY and DISTINCT be different from what they are at the query top level? We made other decisions about this feature on the basis of making the two cases work alike, and I don't think you've made an adequate argument for making them act differently. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Andrew Gierth on 15 Dec 2009 16:01 >>>>> "Tom" == Tom Lane <tgl(a)sss.pgh.pa.us> writes: >> Query-level DISTINCT shouldn't allow columns in the order by that >> aren't in the select list because those columns _do not exist_ at >> the point that ordering logically takes place (even though in the >> implementation, they might). >> This isn't the case for aggregate order by. Tom> I entirely disagree. Why should the semantics of this Tom> combination of ORDER BY and DISTINCT be different from what they Tom> are at the query top level? We made other decisions about this Tom> feature on the basis of making the two cases work alike, and I Tom> don't think you've made an adequate argument for making them act Tom> differently. A case could possibly be made that the behaviour of DISTINCT at top level is wrong, or at least less useful than need be. Notice that there are cases where agg(distinct x order by x) is nondeterministic while agg(distinct x order by x,y) is deterministic. In my view that alone is a good argument for allowing it. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 15 Dec 2009 16:10 Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes: > Notice that there are cases where agg(distinct x order by x) is > nondeterministic while agg(distinct x order by x,y) is deterministic. Well, I think what you're really describing is a case where you're using the wrong sort opclass. If the aggregate can distinguish two values of x, and the sort operator can't, use another sort operator that can. If we really wanted to take the above seriously, my opinion is that we ought to introduce DISTINCT ON in aggregates. However, at that point you lose the argument of standard syntax, so it's not real clear why you shouldn't just fall back on select agg(x) from (select distinct on (x) x ... order by x,y) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Marko Tiikkaja on 19 Dec 2009 14:13 On 2009-12-15 23:10 +0200, Tom Lane wrote: > Andrew Gierth<andrew(a)tao11.riddles.org.uk> writes: >> Notice that there are cases where agg(distinct x order by x) is >> nondeterministic while agg(distinct x order by x,y) is deterministic. > > Well, I think what you're really describing is a case where you're using > the wrong sort opclass. If the aggregate can distinguish two values of > x, and the sort operator can't, use another sort operator that can. > > If we really wanted to take the above seriously, my opinion is that > we ought to introduce DISTINCT ON in aggregates. However, at that > point you lose the argument of standard syntax, so it's not real > clear why you shouldn't just fall back on > select agg(x) from (select distinct on (x) x ... order by x,y) FWIW, in my opinion the idea behind this patch is to not fall back on hacks like that. This patch already goes beyond the standard and having this seems like a useful feature in some cases. Although the DISTINCT ON syntax would have a bit more resemblance on the existing syntax, I'd still like to see agg(distinct x order by x,y). Just my $0.02. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 19 Dec 2009 14:34
Marko Tiikkaja <marko.tiikkaja(a)cs.helsinki.fi> writes: > On 2009-12-15 23:10 +0200, Tom Lane wrote: >> If we really wanted to take the above seriously, my opinion is that >> we ought to introduce DISTINCT ON in aggregates. > FWIW, in my opinion the idea behind this patch is to not fall back on > hacks like that. This patch already goes beyond the standard and having > this seems like a useful feature in some cases. Although the DISTINCT > ON syntax would have a bit more resemblance on the existing syntax, I'd > still like to see agg(distinct x order by x,y). I remain entirely unconvinced. If DISTINCT + ORDER BY work differently inside aggregates than at query level, we're going to forever be explaining the difference, fielding bug reports, etc. Even documenting the difference would be a serious PITA considering how subtle it is (AFAICS Andrew's submitted doc patch failed to address the point). I'm not against the idea of introducing DISTINCT ON here, though I think perhaps we ought to wait for a release or so and see if there's really any field demand for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |