Aggregate ORDER BY patch [PgSql]

Prev: Python 3.1 support
Next: [HACKERS] DTrace compiler warnings

From: Tom Lane on 14 Dec 2009 13:06

Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes:
> Updated version of the aggregate order by patch.

I'm starting to look at this now. I find it rather bizarre to merge
both the actual arguments of an aggregate and the optional ORDER BY
expressions into a single targetlist. It doesn't seem like that would
be an especially convenient representation to work with, and I would
also expect there to be a nonzero performance hit from the extra
TargetEntry expression nodes, even when the feature is not in use.
Why didn't you use separate lists?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Gierth on 14 Dec 2009 13:45

>>>>> "Tom" == Tom Lane <tgl(a)sss.pgh.pa.us> writes:

>> Updated version of the aggregate order by patch.

Tom> I'm starting to look at this now. I find it rather bizarre to
Tom> merge both the actual arguments of an aggregate and the optional
Tom> ORDER BY expressions into a single targetlist. It doesn't seem
Tom> like that would be an especially convenient representation to
Tom> work with,

It's extremely convenient, since you need the arguments and the
ordering expressions together in a slot in order to feed them to
tuplesort (in the general case where there is more than one
expression); you need a tupledesc to pass to tuplesort; and there are
existing functions to construct all of these things from the tlist.
Also, you want to merge the argument expressions and ordering
expressions where possible, and this is exactly what the existing
transformSortClause stuff expects.

Tom> and I would also expect there to be a nonzero performance hit
Tom> from the extra TargetEntry expression nodes, even when the
Tom> feature is not in use.

I tested for that and couldn't reliably detect any (certainly <2%
on count(i) on generated data, and under the circumstances I was
testing that's not necessarily outside the error margin).

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 14 Dec 2009 15:08

Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl(a)sss.pgh.pa.us> writes:
> Tom> and I would also expect there to be a nonzero performance hit
> Tom> from the extra TargetEntry expression nodes, even when the
> Tom> feature is not in use.

> I tested for that and couldn't reliably detect any (certainly <2%
> on count(i) on generated data, and under the circumstances I was
> testing that's not necessarily outside the error margin).

Hm. Now that I look closer I see we already have a hack to avoid
executing an extra expression node when a targetlist item is evaluated,
so the extra cost should indeed be insignificant. There would be
nonzero associated costs during planning and executor startup, but we
seldom bother optimizing for that. So nevermind...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 15 Dec 2009 13:07

Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes:
> Updated version of the aggregate order by patch.

Applied with some editorialization. The main change I made was to get
rid of all the ad-hoc DISTINCT handling in parse_agg.c and use
transformDistinctClause() instead. This exposed what I believe to
be a bug in the submitted patch: it accepted cases like

agg(DISTINCT x ORDER BY x,y)

We do not allow that in ordinary query-level DISTINCT because it's
ambiguous --- there might be multiple y values for any particular value
of x, so the ordering is uncertain. The only way it's not uncertain is
if the sort by x fully determines the order, in which case listing y is
merely useless. I think this is something that was changed not too
long ago, so maybe you were trying to emulate the old behavior, but
in any case it's better to not have extra code that doesn't behave
quite like the normal case.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Gierth on 15 Dec 2009 15:31

>>>>> "Tom" == Tom Lane <tgl(a)sss.pgh.pa.us> writes:

Tom> Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes:
>> Updated version of the aggregate order by patch.

Tom> Applied with some editorialization. The main change I made was
Tom> to get rid of all the ad-hoc DISTINCT handling in parse_agg.c
Tom> and use transformDistinctClause() instead.

I'll review that; I avoided that code intentionally because the
semantics of query-level DISTINCT are different enough.

Tom> This exposed what I believe to be a bug in the submitted patch:
Tom> it accepted cases like

Tom> agg(DISTINCT x ORDER BY x,y)

This is not a bug, it was done intentionally (as you might have
guessed from the fact that there was a regression test for it). The
additional ORDER BY column in this case is always safe (since DISTINCT
adopts the equality operator from the sort, it's not possible for
additional sort columns to break the DISTINCT). I allowed the case
since there was therefore no good reason to forbid it.

There is at least one case where this makes a visible difference in
query output: if the aggregate can distinguish values of x which are
considered equal by the sort operator used, then the value of y
affects which value of x is seen. It is probably relatively easy to
generate examples of this using the box type and array_agg.

Tom> We do not allow that in ordinary query-level DISTINCT

Note that ordinary query-level DISTINCT has the reverse semantics; the
DISTINCT operation is (per spec) logically prior to the order by, the
fact that they are planned in the reverse order is an implementation
detail.

Query-level DISTINCT shouldn't allow columns in the order by that
aren't in the select list because those columns _do not exist_ at the
point that ordering logically takes place (even though in the
implementation, they might).

This isn't the case for aggregate order by.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9
Prev: Python 3.1 support
Next: [HACKERS] DTrace compiler warnings