From: Pavel Stehule on 5 Jul 2010 01:51 Hello I am planning to start to implement median function. I wrote some array based implementation - it is fast, but I hope, so can be much faster. The basic question is method of implementation. It can be implemented via a) custum aggregate functions or b) executor node. Adventage of @a variant is simplicity - we need to teach aggregates to ensure ordered input and ensure to use index only (maybe add flag ORDERED INPUT [DESC|ASC]). Now PostgreSQL doesn't use a index for scan to orderd aggregate - it can be a problem for large datasets. I found some missing info in EXPLAIN about ordered aggregates - there are showed nothing about sort pavel(a)postgres:5432=# explain analyze verbose select array_agg(a order by a) from omega; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1643.00..1643.01 rows=1 width=4) (actual time=555.091..555.092 rows=1 loops=1) Output: array_agg(a ORDER BY a) -> Seq Scan on public.omega (cost=0.00..1393.00 rows=100000 width=4) (actual time=0.050..177.547 rows=100000 loops=1) Output: a Total runtime: 555.839 ms (5 rows) Probably we have to access a tuple store inside sfunc - when the data size is out of work memory. And for effective evaluating it needs patch https://commitfest.postgresql.org/action/patch_view?id=292 variant @b is more complex - but allows more possibilities - idea: median is one from aggregate executor nodes (theoretically it can call some custom final function in future - but I don't think about it now). It has a few advantages: a) we don't need to modify current aggregates b) for datasets smaller than working_mem can be used a quickselect algorithms - www-stat.stanford.edu/~ryantibs/papers/median.pdf c) for larger datasets we can use integrated external sort with direct reading - we don't need to stack result to array I prefer a variant b. It offers a more possibilities - and there are less chance to break some existing. comments are welcome Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
|
Pages: 1 Prev: [HACKERS] Always truncate segments before unlink Next: Better life |