From: Robert Haas on
On Thu, Mar 25, 2010 at 5:17 PM, David Fetter <david(a)fetter.org> wrote:
> On Wed, Mar 24, 2010 at 06:31:59PM +0100, A. Kretschmer wrote:
>> Hello @all,
>>
>> I know I can do:
>>
>> select * from (select ... row_number() over (...) ...) foo where
>> row_number < N
>>
>> to limit the rows per group, but the inner select has to retrieve
>> the whole set of records, and in the outer select most of them are
>> discarded.
>
> That sounds like the optimizer's falling down on the job.  Would this
> be difficult to fix?

I may not be the best person to offer an opinion on this topic, but it
sounds tricky to me. I think it would need some kind of extremely
specific special-case logic. The planner would have to recognize
row_number() < n, row_number() <= n, and row_number() = n as special
cases indicating that n-1, n, and n records respectively should be
expected to be fetched from the partition. And you might also worry
about n > row_number(), and n >= row_number().
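
For concreteness, here is a rough sketch of the query shape under
discussion. The table scores(grp, val) and the alias rn are made up for
illustration and do not appear in the original question. The trailing
comments list the equivalent spellings of the same filter that any
special-case logic would presumably have to recognize.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE scores (grp text, val int);

-- Top 3 rows per group. The window function result has to be
-- aliased in a subquery and filtered in the outer query.
SELECT grp, val
FROM (
    SELECT grp, val,
           row_number() OVER (PARTITION BY grp ORDER BY val DESC) AS rn
    FROM scores
) AS foo
WHERE rn <= 3;

-- Equivalent spellings of the same filter:
--   WHERE rn < 4
--   WHERE 3 >= rn
--   WHERE 4 > rn
```

In each case at most 3 rows per partition survive the outer filter, yet
the subquery still numbers every row of every partition.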

It might be worth doing because I suspect that is actually going to be
a fairly common type of query, but some thought needs to be given to
how to do it without resorting to abject kludgery.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Hitoshi Harada on
2010/3/26 David Fetter <david(a)fetter.org>:
> On Wed, Mar 24, 2010 at 06:31:59PM +0100, A. Kretschmer wrote:
>> Hello @all,
>>
>> I know I can do:
>>
>> select * from (select ... row_number() over (...) ...) foo where
>> row_number < N
>>
>> to limit the rows per group, but the inner select has to retrieve
>> the whole set of records, and in the outer select most of them are
>> discarded.
>
> That sounds like the optimizer's falling down on the job.  Would this
> be difficult to fix?

I believe this isn't a job for window functions. In fact, an "over( ...
LIMIT n)" syntax or an optimizer hack would break down when a query has
multiple window definitions.

To take the top N items of each group (I agree this is quite a common
job), I'd suggest a syntax that extends DISTINCT ON.

SELECT DISTINCT n ON(key1, key2) ...

where "n" means the top "n" items in each "key1, key2" group. The current
DISTINCT ON() syntax would then be equivalent to DISTINCT 1 ON().
That would be fairly easy to implement, and you wouldn't be bothered by
issues such as multiple window definitions. The downside is that it can
express only row_number logic, and sometimes you may want rank,
dense_rank, etc. instead.
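
As a sketch of the equivalence described above, the following compares
the existing DISTINCT ON with a row_number() query for the n = 1 case.
The table scores(grp, val) and the alias rn are hypothetical. The
"DISTINCT 3 ON" line appears only as a comment because it is the
proposed syntax, not something PostgreSQL accepts.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE scores (grp text, val int);

-- Existing syntax: one row per group (the highest val).
SELECT DISTINCT ON (grp) grp, val
FROM scores
ORDER BY grp, val DESC;

-- The same result via row_number(); raising the constant gives top n.
SELECT grp, val
FROM (
    SELECT grp, val,
           row_number() OVER (PARTITION BY grp ORDER BY val DESC) AS rn
    FROM scores
) AS ranked
WHERE rn <= 1
ORDER BY grp;

-- Proposed form (NOT valid syntax today, shown only as the idea):
--   SELECT DISTINCT 3 ON (grp) grp, val FROM scores ORDER BY grp, val DESC;
```

Replacing row_number() with rank() in the second query changes how ties
are handled: tied rows share a rank, so a filter such as rn <= 3 can
return more than 3 rows per group, which the proposed syntax could not
express.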

Regards,

--
Hitoshi Harada
