Aggregate ORDER BY patch [PgSql]

Prev: Python 3.1 support
Next: [HACKERS] DTrace compiler warnings

From: Tom Lane on 15 Dec 2009 15:48

Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes:
> Query-level DISTINCT shouldn't allow columns in the order by that
> aren't in the select list because those columns _do not exist_ at the
> point that ordering logically takes place (even though in the
> implementation, they might).

> This isn't the case for aggregate order by.

I entirely disagree. Why should the semantics of this combination of
ORDER BY and DISTINCT be different from what they are at the query
top level? We made other decisions about this feature on the basis
of making the two cases work alike, and I don't think you've made an
adequate argument for making them act differently.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Gierth on 15 Dec 2009 16:01

>>>>> "Tom" == Tom Lane <tgl(a)sss.pgh.pa.us> writes:

>> Query-level DISTINCT shouldn't allow columns in the order by that
>> aren't in the select list because those columns _do not exist_ at
>> the point that ordering logically takes place (even though in the
>> implementation, they might).

>> This isn't the case for aggregate order by.

Tom> I entirely disagree. Why should the semantics of this
Tom> combination of ORDER BY and DISTINCT be different from what they
Tom> are at the query top level? We made other decisions about this
Tom> feature on the basis of making the two cases work alike, and I
Tom> don't think you've made an adequate argument for making them act
Tom> differently.

A case could possibly be made that the behaviour of DISTINCT at top
level is wrong, or at least less useful than need be.

Notice that there are cases where agg(distinct x order by x) is
nondeterministic while agg(distinct x order by x,y) is deterministic.
In my view that alone is a good argument for allowing it.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 15 Dec 2009 16:10

Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes:
> Notice that there are cases where agg(distinct x order by x) is
> nondeterministic while agg(distinct x order by x,y) is deterministic.

Well, I think what you're really describing is a case where you're using
the wrong sort opclass. If the aggregate can distinguish two values of
x, and the sort operator can't, use another sort operator that can.

If we really wanted to take the above seriously, my opinion is that
we ought to introduce DISTINCT ON in aggregates. However, at that
point you lose the argument of standard syntax, so it's not real
clear why you shouldn't just fall back on
select agg(x) from (select distinct on (x) x ... order by x,y)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Marko Tiikkaja on 19 Dec 2009 14:13

On 2009-12-15 23:10 +0200, Tom Lane wrote:
> Andrew Gierth<andrew(a)tao11.riddles.org.uk> writes:
>> Notice that there are cases where agg(distinct x order by x) is
>> nondeterministic while agg(distinct x order by x,y) is deterministic.
>
> Well, I think what you're really describing is a case where you're using
> the wrong sort opclass. If the aggregate can distinguish two values of
> x, and the sort operator can't, use another sort operator that can.
>
> If we really wanted to take the above seriously, my opinion is that
> we ought to introduce DISTINCT ON in aggregates. However, at that
> point you lose the argument of standard syntax, so it's not real
> clear why you shouldn't just fall back on
> select agg(x) from (select distinct on (x) x ... order by x,y)

FWIW, in my opinion the idea behind this patch is to not fall back on
hacks like that. This patch already goes beyond the standard and having
this seems like a useful feature in some cases. Although the DISTINCT
ON syntax would have a bit more resemblance on the existing syntax, I'd
still like to see agg(distinct x order by x,y).

Just my $0.02.

Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 19 Dec 2009 14:34

Marko Tiikkaja <marko.tiikkaja(a)cs.helsinki.fi> writes:
> On 2009-12-15 23:10 +0200, Tom Lane wrote:
>> If we really wanted to take the above seriously, my opinion is that
>> we ought to introduce DISTINCT ON in aggregates.

> FWIW, in my opinion the idea behind this patch is to not fall back on
> hacks like that. This patch already goes beyond the standard and having
> this seems like a useful feature in some cases. Although the DISTINCT
> ON syntax would have a bit more resemblance on the existing syntax, I'd
> still like to see agg(distinct x order by x,y).

I remain entirely unconvinced. If DISTINCT + ORDER BY work differently
inside aggregates than at query level, we're going to forever be
explaining the difference, fielding bug reports, etc. Even documenting
the difference would be a serious PITA considering how subtle it is
(AFAICS Andrew's submitted doc patch failed to address the point).

I'm not against the idea of introducing DISTINCT ON here, though I think
perhaps we ought to wait for a release or so and see if there's really
any field demand for it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9
Prev: Python 3.1 support
Next: [HACKERS] DTrace compiler warnings