From: Greg Stark on 13 Nov 2009 14:01 On Fri, Nov 13, 2009 at 5:09 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote: > Quite. This is another instance of the thing I complained of before, > that the SQL committee likes to define the behavior of specific > aggregates instead of inducing a generic aggregate-behavior definition. I think this makes sense from the point of view of the spec authors. They're trying to standardize the behaviour of the functions that their existing implementations provide without creating extra demands on themselves to implement new features. Even if some of them do implement more general solutions the path of least resistance to getting their syntax standardized will be the one which imposes the least cost on the other members of the committee. > So we're on our own to extract one, and this proposal seems pretty > reasonable to me: it's useful and it's consistent with the query-level > behavior of DISTINCT and ORDER BY. ++ -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Hitoshi Harada on 15 Nov 2009 06:42 Here's my first review. The patch was in context diff format and applied cleanly with a little 3 hunks in parse_expr.c. make succeeded without any warnings and make check passed all 121 tests. It implements as advertised, following SQL spec with extension of both DISTINCT clause and ORDER BY clause are available in any aggregate functions including user defined ones. It supports VIEWs by adding code in ruleutils.c. Questions here: - agglevelsup? We have aggregate capability that all arguments from upper level query in downer level aggregate makes aggregate call itself to upper level call, as a constant value in downer level. What if ORDER BY clause has downer level Vars? For example: regression=# select (select count(t1.four order by unique1) from tenk1 limit 1) from tenk1 t1 where unique1 < 10; ?column? ---------- 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 (10 rows) regression=# select (select count(t1.four order by t1.unique1) from tenk1 limit 1) from tenk1 t1 where unique1 < 10; ?column? ---------- 10 (1 row) Is it sane? The result is consistent but surprised me a little. No need to raise an error? - order by 1? Normal ORDER BY clause accepts constant integer as TargetEntry's resno. The patch seems not to support it. regression=# select array_agg(four order by 1) from tenk1 where unique1 < 10; array_agg ----------------------- {0,2,1,2,1,0,1,3,3,0} (1 row) Shouldn't it be the same as normal ORDER BY? Performance doesn't seem slowing down, though I don't have quantitative test result. Coding, almost all sane. Since its syntax and semantics are similar to existing DISTINCT and ORDER BY features, parsing and transformation code are derived from those area. The executor has few issues: - #include in nodeAgg.c executor/tuptable.h is added in the patch but required really? I removed that line but still build without any warnings. - process_ordered_aggregate_(single|multi) It seems that the patch left process_sorted_aggregate() function as process_ordered_aggregate_single() and added process_ordered_aggregate_multi() for more than one input arguments (actually target entries) case. Why have those two? Could we combine them? Or I couldn't find convincing reason in comments. And ParseFuncOrColumn() in parse_func.c now gets more complicated. Since feature / semantics are similar, I bet we may share some code to transform DISTINCT and ORDER BY with traditional code in parse_clause.c, though I'm not sure nor don't have clear idea. Especially, code around here save_next_resno = pstate->p_next_resno; pstate->p_next_resno = attno + 1; cheats pstate to transform clauses and I felt a bit fear. - SortGroupClause.implicit "implicit" member was added in SortGroupClause. I didn't find clear reason to add this. Could you show a case to clarify this? That's it for now. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Andrew Gierth on 15 Nov 2009 15:04 >>>>> "Hitoshi" == Hitoshi Harada <umi.tanuki(a)gmail.com> writes: Hitoshi> Questions here: Hitoshi> - agglevelsup? Hitoshi> We have aggregate capability that all arguments from upper Hitoshi> level query in downer level aggregate makes aggregate call Hitoshi> itself to upper level call, as a constant value in downer Hitoshi> level. What if ORDER BY clause has downer level Vars? For determining what query level the aggregate belongs to, the expressions in the ORDER BY clause are counted along with the actual argument expressions. Hitoshi> Is it sane? The result is consistent but surprised me a Hitoshi> little. No need to raise an error? What case exactly would you consider an error? When an order by expression references a lower (more deeply nested) query level than any of the actual arguments? Hitoshi> - order by 1? Hitoshi> Normal ORDER BY clause accepts constant integer as Hitoshi> TargetEntry's resno. The patch seems not to support it. Hitoshi> Shouldn't it be the same as normal ORDER BY? Specifically documented. The SQL spec doesn't allow ordinal positions in ORDER BY any more (those are a holdover from SQL92) and we don't support them in, for example, window ORDER BY clauses. Hitoshi> Performance doesn't seem slowing down, though I don't have Hitoshi> quantitative test result. The performance is intended to be no worse than DISTINCT already was, though it's also no better. Hitoshi> Coding, almost all sane. Since its syntax and semantics are Hitoshi> similar to existing DISTINCT and ORDER BY features, parsing Hitoshi> and transformation code are derived from those area. The Hitoshi> executor has few issues: Hitoshi> - #include in nodeAgg.c Hitoshi> executor/tuptable.h is added in the patch but required really? Hitoshi> I removed that line but still build without any warnings. The code is making explicit use of various Slot calls declared in tuptable.h. The only reason why it builds without error when you remove that is that utils/tuplesort.h happens to include tuptable.h indirectly. Hitoshi> - process_ordered_aggregate_(single|multi) Hitoshi> It seems that the patch left process_sorted_aggregate() Hitoshi> function as process_ordered_aggregate_single() and added Hitoshi> process_ordered_aggregate_multi() for more than one input Hitoshi> arguments (actually target entries) case. Why have those Hitoshi> two? Could we combine them? Or I couldn't find convincing Hitoshi> reason in comments. Performance. tuplesort_getdatum etc. seems to be substantially faster than tuplesort_gettupleslot especially for the case where you're sorting a pass-by-value datum such as an integer (since the datum is then stored only in the sort tuple header and doesn't require a separate space allocation for itself). Using a slot in all cases would have slowed down some common cases like count(distinct id) by a measurable amount. Cases like array_agg(x order by x) benefit from the faster code path too. The memory management between the two cases is sufficiently different that combining them into one function while still maintaining the slot vs. datum distinction would be ugly and probably error-prone. The relatively minor duplication of logic seemed much clearer to me. Hitoshi> And ParseFuncOrColumn() in parse_func.c now gets more Hitoshi> complicated. I thought very hard about breaking some of that out into a separate function, but it wasn't initially clear which parts might have needed access to the original raw parsetree. I'm open to opinions on this. Hitoshi> Since feature / semantics are similar, I bet we may share Hitoshi> some code to transform DISTINCT and ORDER BY with Hitoshi> traditional code in parse_clause.c, though I'm not sure nor Hitoshi> don't have clear idea. Especially, code around here Hitoshi> save_next_resno = pstate->p_next_resno; Hitoshi> pstate->p_next_resno = attno + 1; Hitoshi> cheats pstate to transform clauses and I felt a bit fear. The code that transforms RETURNING clauses does something similar with p_next_resno. Almost all the work of transforming the ORDER BY clause is actually done via the existing transformSortClause (which is the reason why p_next_resno needs to be saved and restored), the additional logic is only for the DISTINCT case, to validate the correspondance between DISTINCT args and ORDER BY args and to generate implicit ordering clauses (which provide comparison function info to the executor) when needed. Hitoshi> - SortGroupClause.implicit Hitoshi> "implicit" member was added in SortGroupClause. I didn't Hitoshi> find clear reason to add this. Could you show a case to Hitoshi> clarify this? Without that flag or something like it, when you do create view foo as select count(distinct x) from table; and then display the view definition, you would get back the query as "select count(distinct x order by x) from table" which would be confusing and unnecessarily backwards- and forwards-incompatible. So the code sets "implicit" for any SortGroupClause that is added for some internal reason rather than being present in the original query, and the deparse code in ruleutils skips such clauses. -- Andrew. -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Andrew Gierth on 15 Nov 2009 18:23 >>>>> "Andrew" == Andrew Gierth <andrew(a)tao11.riddles.org.uk> writes: Andrew> Performance. Andrew> tuplesort_getdatum etc. seems to be substantially faster than Andrew> tuplesort_gettupleslot especially for the case where you're Andrew> sorting a pass-by-value datum such as an integer (since the Andrew> datum is then stored only in the sort tuple header and Andrew> doesn't require a separate space allocation for Andrew> itself). Using a slot in all cases would have slowed down Andrew> some common cases like count(distinct id) by a measurable Andrew> amount. Andrew> Cases like array_agg(x order by x) benefit from the faster Andrew> code path too. Andrew> The memory management between the two cases is sufficiently Andrew> different that combining them into one function while still Andrew> maintaining the slot vs. datum distinction would be ugly and Andrew> probably error-prone. The relatively minor duplication of Andrew> logic seemed much clearer to me. Just to quantify this, using a production-quality build (optimized and without assertions), it turns out that the fast code path (process_ordered_aggregate_single) is faster by 300% for pass-by-value types, and by approximately 20% for short values of pass-by-reference types, as compared to disabling that code path and forcing even the one-arg case to use the slot interface. So using the slot interface for everything would have constituted a 300% slowdown over the older code for count(distinct id), obviously undesirable. As it stands, I can't detect any performance regression over the previous code. This means that agg(x order by y) is rather noticably slower than agg(x order by x), but this is pretty much unavoidable given how the sorting code works. Future performance enhancements (which I have no particular plans to tackle) would involve having the planner consult the desired aggregate orderings and estimating the cost of sorting as opposed to the cost of producing a plan with the input already ordered. Also combining the sort step for aggregates that share a single ordering. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Stark on 15 Nov 2009 19:17
On Sun, Nov 15, 2009 at 11:23 PM, Andrew Gierth <andrew(a)tao11.riddles.org.uk> wrote: > Future performance enhancements (which I have no particular plans to > tackle) would involve having the planner consult the desired aggregate > orderings and estimating the cost of sorting as opposed to the cost of > producing a plan with the input already ordered. Also combining the > sort step for aggregates that share a single ordering. Those both seem like pretty important things. Do you have an idea how to go about doing them? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |