Aggregate ORDER BY patch [PgSql]

Prev: Python 3.1 support
Next: [HACKERS] DTrace compiler warnings

From: Hitoshi Harada on 15 Nov 2009 20:40

2009/11/16 Andrew Gierth <andrew(a)tao11.riddles.org.uk>:
>>>>>> "Hitoshi" == Hitoshi Harada <umi.tanuki(a)gmail.com> writes:
>
> Hitoshi> Questions here:
> Hitoshi> - agglevelsup?
> What case exactly would you consider an error? When an order by
> expression references a lower (more deeply nested) query level than
> any of the actual arguments?

It's only that I felt not intuitive. To me, arguments are regarded as
aggregate's "member" while ORDER BY clause expressions didn't hit me.
If it's only me, no problem. Maybe additional document in
#syntax-aggregates will do.

> Hitoshi> - order by 1?
>
> Hitoshi> Normal ORDER BY clause accepts constant integer as
> Hitoshi> TargetEntry's resno. The patch seems not to support it.
> Hitoshi> Shouldn't it be the same as normal ORDER BY?
>
> Specifically documented. The SQL spec doesn't allow ordinal positions
> in ORDER BY any more (those are a holdover from SQL92) and we don't
> support them in, for example, window ORDER BY clauses.

Clear.

> Hitoshi> Coding, almost all sane. Since its syntax and semantics are
> Hitoshi> similar to existing DISTINCT and ORDER BY features, parsing
> Hitoshi> and transformation code are derived from those area. The
> Hitoshi> executor has few issues:
>
> Hitoshi> - #include in nodeAgg.c
> Hitoshi> executor/tuptable.h is added in the patch but required really?
> Hitoshi> I removed that line but still build without any warnings.
>
> The code is making explicit use of various Slot calls declared in
> tuptable.h. The only reason why it builds without error when you
> remove that is that utils/tuplesort.h happens to include tuptable.h
> indirectly.
>
> The code is making explicit use of various Slot calls declared in
> tuptable.h. The only reason why it builds without error when you
> remove that is that utils/tuplesort.h happens to include tuptable.h
> indirectly.

My C skill is not so good to determine if that #include is needed.
Simply we'd not needed it, it seems to me that it isn't needed still.

> Hitoshi> - process_ordered_aggregate_(single|multi)
> Hitoshi> It seems that the patch left process_sorted_aggregate()
> Hitoshi> function as process_ordered_aggregate_single() and added
> Hitoshi> process_ordered_aggregate_multi() for more than one input
> Hitoshi> arguments (actually target entries) case. Why have those
> Hitoshi> two? Could we combine them? Or I couldn't find convincing
> Hitoshi> reason in comments.
>
> Performance.
>
> tuplesort_getdatum etc. seems to be substantially faster than
> tuplesort_gettupleslot especially for the case where you're sorting a
> pass-by-value datum such as an integer (since the datum is then stored
> only in the sort tuple header and doesn't require a separate space
> allocation for itself). Using a slot in all cases would have slowed
> down some common cases like count(distinct id) by a measurable amount.
>
> Cases like array_agg(x order by x) benefit from the faster code path
> too.
>
> The memory management between the two cases is sufficiently different
> that combining them into one function while still maintaining the
> slot vs. datum distinction would be ugly and probably error-prone.
> The relatively minor duplication of logic seemed much clearer to me.

I see the reason. But if we allow this, shouldn't we apply the same
logic to other sort depend parts? Or maybe refactor tuplesort in the
future?

>
> Hitoshi> - SortGroupClause.implicit
> Hitoshi> "implicit" member was added in SortGroupClause. I didn't
> Hitoshi> find clear reason to add this. Could you show a case to
> Hitoshi> clarify this?
>
> Without that flag or something like it, when you do
>
> create view foo as select count(distinct x) from table;
>
> and then display the view definition, you would get back the query as
> "select count(distinct x order by x) from table" which would be
> confusing and unnecessarily backwards- and forwards-incompatible.
>
> So the code sets "implicit" for any SortGroupClause that is added for
> some internal reason rather than being present in the original query,
> and the deparse code in ruleutils skips such clauses.

I don't have much experiences in VIEW systems, but isn't it enough to
let "order by x" omitted? "select count(distinct x order by x) from
table" means the same as "select count(distinct x) from table"
currently. ruleutils can handle it if distinct expressions are equal
to order by expressions.

Regards,

--
Hitoshi Harada

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 15 Nov 2009 20:52

Hitoshi Harada <umi.tanuki(a)gmail.com> writes:
> 2009/11/16 Andrew Gierth <andrew(a)tao11.riddles.org.uk>:
> "Hitoshi" == Hitoshi Harada <umi.tanuki(a)gmail.com> writes:
>>> �Hitoshi> �- SortGroupClause.implicit
>>> �Hitoshi> "implicit" member was added in SortGroupClause. I didn't
>>> �Hitoshi> find clear reason to add this. Could you show a case to
>>> �Hitoshi> clarify this?
>>
>> So the code sets "implicit" for any SortGroupClause that is added for
>> some internal reason rather than being present in the original query,
>> and the deparse code in ruleutils skips such clauses.

> I don't have much experiences in VIEW systems, but isn't it enough to
> let "order by x" omitted?

I agree with Hitoshi that this seems to be a hack to deal with the
consequences of a bad design decision. We just recently cleaned up
an ancient mistake of the same sort (having the parser add clauses
to ORDER BY/DISTINCT that the user didn't write) and I don't care
to repeat that error in aggregates.

If it's necessary to decorate the aggregate with additional clauses in
order to make the executor work, the proper place to do that is the
planner. The reason this division of labor is worth preserving is
that if there are alternative implementations that might reasonably be
chosen, the planner is the place to make that choice. Nailing down
sort order in the parser is pre-judging something the planner might
wish to change. (As an example, the reason we had to fix the ORDER
BY/DISTINCT mistake was that it was constraining the planner's choices
about substituting hash aggregation for sort/unique.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Gierth on 15 Nov 2009 21:29

>>>>> "Hitoshi" == Hitoshi Harada <umi.tanuki(a)gmail.com> writes:

>> What case exactly would you consider an error? When an order by
>> expression references a lower (more deeply nested) query level
>> than any of the actual arguments?

Hitoshi> It's only that I felt not intuitive. To me, arguments are
Hitoshi> regarded as aggregate's "member" while ORDER BY clause
Hitoshi> expressions didn't hit me. If it's only me, no
Hitoshi> problem. Maybe additional document in #syntax-aggregates
Hitoshi> will do.

How about:

... But an exception occurs if the aggregate's arguments
(including any <literal>ORDER BY</> clause) contain only
outer-level variables: the aggregate then belongs to the nearest
such outer level, ...

>> Without that flag or something like it, when you do
>>
>> create view foo as select count(distinct x) from table;
>>
>> and then display the view definition, you would get back the query as
>> "select count(distinct x order by x) from table" which would be
>> confusing and unnecessarily backwards- and forwards-incompatible.
>>
>> So the code sets "implicit" for any SortGroupClause that is added for
>> some internal reason rather than being present in the original query,
>> and the deparse code in ruleutils skips such clauses.

Hitoshi> I don't have much experiences in VIEW systems, but isn't it
Hitoshi> enough to let "order by x" omitted? "select count(distinct x
Hitoshi> order by x) from table" means the same as "select
Hitoshi> count(distinct x) from table" currently. ruleutils can
Hitoshi> handle it if distinct expressions are equal to order by
Hitoshi> expressions.

That doesn't work in more complex cases. For example, the user might
specify aggfunc(distinct x,y order by x) (not caring about the relative
order of y) but the code will still turn that internally into
aggfunc(distinct x,y order by x,y). It's necessary to be able to recover
what the user originally entered, which means needing to be able to
distinguish both of those cases from aggfunc(distinct x,y).

Other ways this could have been done (but which I rejected) were:

1) separate lists of SortGroupClauses for "orderings the user
explicitly required" and "ordering we're going to use"

2) single list of SortGroupClauses, but a count of how many of the
entries are original, rather than added

3) deferring the addition of extra ordering elements required for
distinctness until planning time

I didn't consider (3) because I wasn't really touching the planner
for this patch, and the information needed was already available in
parse analysis as part of the error checking.

--
Andrew.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Gierth on 15 Nov 2009 22:25

>>>>> "Tom" == Tom Lane <tgl(a)sss.pgh.pa.us> writes:

Tom> I agree with Hitoshi that this seems to be a hack to deal with the
[snip]

So copying the way that SELECT DISTINCT works would be the way to go?
i.e. keeping separate lists in the parse node for sorting and distinct?

What about error handling? If the user specifies agg(distinct x) where
x is not sortable, do we leave it to the planner to detect that (which
means not reporting the error position?)

--
Andrew.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 15 Nov 2009 22:31

Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes:
> What about error handling? If the user specifies agg(distinct x) where
> x is not sortable, do we leave it to the planner to detect that (which
> means not reporting the error position?)

Well, at the moment there's only going to be a sort-based
implementation, so I don't object to throwing an error for that
as soon as possible. OTOH I wouldn't recommend expending a lot
of code to do it there. I would hope that most of the parser's
work for this can be shared with the existing support for query-level
ORDER BY/DISTINCT. If that means that we don't complain immediately
about cases where there is hash but not sort support, that seems all
right to me, because there are very few such datatypes anyway.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9
Prev: Python 3.1 support
Next: [HACKERS] DTrace compiler warnings