Fetching MAX rows with one index scan. [DB2]

Prev: SELECT INTO with ORDER BY and FETCH FIRST ONLY
Next: how to view a vertical table in horizontal way

From: Michel Esber on 18 Jan 2010 14:09

1) select sample, field1 from (
select sample, field1, row_number() over (order by sample desc)
as rn
from data ) where rn = 1;

Hi Lennart, this won´t work as each sample field ( a timestamp) can
have multiple rows. If there is a way to assign the same RN number to
EACH of the unique timestamps, that would be sweet.

> 2)
> select * from data
> order by sample desc
> fetch first 1 rows only;

That will also not work. Each timestamp may have multiple rows ...

Thanks, -M

From: Michel Esber on 18 Jan 2010 14:19

On 18 jan, 17:01, jefftyzzer <jefftyz...(a)sbcglobal.net> wrote:
> On Jan 18, 9:59 am, Michel Esber <smes...(a)gmail.com> wrote:
>
>
>
>
>
> > Hello,
>
> > Db2 UDB v9.5 Fp3.
>
> > I need to return the set of rows that have the highest timestamp. This
> > is very easy:
>
> > drop table DATA;
>
> > create table DATA (sample timestamp, field1 integer);
>
> > insert into DATA values ('2010-01-18-14.00.00.000000', 1),
> > ('2010-01-18-14.00.00.000000',2), ('2010-01-18-14.00.00.000000',3),
> > ('2010-01-18-14.01.00.000000',4), ('2010-01-18-14.01.00.000000',5),
> > ('2010-01-18-14.02.00.000000',6), ('2010-01-18-14.02.00.000000',7);
>
> > Easy query:
>
> > select * from DATA D,
> > (select MAX(SAMPLE) MAX_SAMPLE from DATA) as M
> > where D.SAMPLE = M.MAX_SAMPLE;
>
> > My source table has hundreds of Millions of rows and the access plan
> > shows two index scans.
> > Is there any magic trick to achieve the same result with only one
> > index scan?
>
> > Thanks in advance, -Michel
>
> I'm assuming a FETCH FIRST 1 ROW-type query won't work because you
> said you need to return the SET of rows. I've not tested this query,
> but perhaps it might give you what you need:
>
> SELECT
> A,
> B,
> C,
> SAMPLE,
> MAX(SAMPLE) OVER() M_SAMPLE
> FROM
> DATA
> WHERE
> SAMPLE = M_SAMPLE;

Thanks Jeff. That syntax is incorrect, but I got your idea.. MAX over
() is nice, but with the result of this query I need to scan the DATA
table again, filtering only samples that match M_SAMPLE.

Any thoughts?

-M

From: jefftyzzer on 18 Jan 2010 14:24

On Jan 18, 11:19 am, Michel Esber <smes...(a)gmail.com> wrote:
> On 18 jan, 17:01, jefftyzzer <jefftyz...(a)sbcglobal.net> wrote:
>
>
>
> > On Jan 18, 9:59 am, Michel Esber <smes...(a)gmail.com> wrote:
>
> > > Hello,
>
> > > Db2 UDB v9.5 Fp3.
>
> > > I need to return the set of rows that have the highest timestamp. This
> > > is very easy:
>
> > > drop table DATA;
>
> > > create table DATA (sample timestamp, field1 integer);
>
> > > insert into DATA values ('2010-01-18-14.00.00.000000', 1),
> > > ('2010-01-18-14.00.00.000000',2), ('2010-01-18-14.00.00.000000',3),
> > > ('2010-01-18-14.01.00.000000',4), ('2010-01-18-14.01.00.000000',5),
> > > ('2010-01-18-14.02.00.000000',6), ('2010-01-18-14.02.00.000000',7);
>
> > > Easy query:
>
> > > select * from DATA D,
> > > (select MAX(SAMPLE) MAX_SAMPLE from DATA) as M
> > > where D.SAMPLE = M.MAX_SAMPLE;
>
> > > My source table has hundreds of Millions of rows and the access plan
> > > shows two index scans.
> > > Is there any magic trick to achieve the same result with only one
> > > index scan?
>
> > > Thanks in advance, -Michel
>
> > I'm assuming a FETCH FIRST 1 ROW-type query won't work because you
> > said you need to return the SET of rows. I've not tested this query,
> > but perhaps it might give you what you need:
>
> > SELECT
> > A,
> > B,
> > C,
> > SAMPLE,
> > MAX(SAMPLE) OVER() M_SAMPLE
> > FROM
> > DATA
> > WHERE
> > SAMPLE = M_SAMPLE;
>
> Thanks Jeff. That syntax is incorrect, but I got your idea.. MAX over
> () is nice, but with the result of this query I need to scan the DATA
> table again, filtering only samples that match M_SAMPLE.
>
> Any thoughts?
>
> -M

Yes--I sort of jumped the gun on my reply (a bad habit of mine). Of
course you can't refer to the M_SAMPLE column unless/until you do
something like:

SELECT
*
FROM
(
SELECT
A,
B,
C,
SAMPLE,
MAX(SAMPLE) OVER() M_SAMPLE
FROM
DATA
) T
WHERE
SAMPLE = M_SAMPLE;

Any chance that will only require a single pass?

--Jeff

From: Tonkuma on 18 Jan 2010 15:13

Try RANK or DENSE_RANK, like this:

------------------------------ Commands Entered
------------------------------
WITH
DATA AS (
SELECT timestamp(sample) AS sample
, field1
FROM (values
('2010-01-18-14.00.00.000000',1)
, ('2010-01-18-14.00.00.000000',2)
, ('2010-01-18-14.00.00.000000',3)
, ('2010-01-18-14.01.00.000000',4)
, ('2010-01-18-14.01.00.000000',5)
, ('2010-01-18-14.02.00.000000',6)
, ('2010-01-18-14.02.00.000000',7)
) s(sample , field1)
)
SELECT sample , field1
FROM (SELECT d.*
, RANK() OVER(ORDER BY sample DESC) rnk
FROM data d
) s
WHERE rnk = 1;
------------------------------------------------------------------------------

SAMPLE FIELD1
-------------------------- -----------
2010-01-18-14.02.00.000000 6
2010-01-18-14.02.00.000000 7

2 record(s) selected.

From: Lennart on 18 Jan 2010 15:53

On 18 Jan, 20:09, Michel Esber <smes...(a)gmail.com> wrote:
> 1) select sample, field1 from (
> select sample, field1, row_number() over (order by sample desc)
> as rn
> from data ) where rn = 1;
>
> Hi Lennart, this won´t work as each sample field ( a timestamp) can
> have multiple rows. If there is a way to assign the same RN number to
> EACH of the unique timestamps, that would be sweet.
>

Use rank() instead of row_number()

/Lennart

First | Prev | Next | Last
Pages: 1 2 3
Prev: SELECT INTO with ORDER BY and FETCH FIRST ONLY
Next: how to view a vertical table in horizontal way