Fetching MAX rows with one index scan. [DB2]

Prev: SELECT INTO with ORDER BY and FETCH FIRST ONLY
Next: how to view a vertical table in horizontal way

From: Michel Esber on 18 Jan 2010 12:59

Hello,

Db2 UDB v9.5 Fp3.

I need to return the set of rows that have the highest timestamp. This
is very easy:

drop table DATA;

create table DATA (sample timestamp, field1 integer);

insert into DATA values ('2010-01-18-14.00.00.000000', 1),
('2010-01-18-14.00.00.000000',2), ('2010-01-18-14.00.00.000000',3),
('2010-01-18-14.01.00.000000',4), ('2010-01-18-14.01.00.000000',5),
('2010-01-18-14.02.00.000000',6), ('2010-01-18-14.02.00.000000',7);

Easy query:

select * from DATA D,
(select MAX(SAMPLE) MAX_SAMPLE from DATA) as M
where D.SAMPLE = M.MAX_SAMPLE;

My source table has hundreds of Millions of rows and the access plan
shows two index scans.
Is there any magic trick to achieve the same result with only one
index scan?

Thanks in advance, -Michel

From: jefftyzzer on 18 Jan 2010 13:45

On Jan 18, 9:59 am, Michel Esber <smes...(a)gmail.com> wrote:
> Hello,
>
> Db2 UDB v9.5 Fp3.
>
> I need to return the set of rows that have the highest timestamp. This
> is very easy:
>
> drop table DATA;
>
> create table DATA (sample timestamp, field1 integer);
>
> insert into DATA values ('2010-01-18-14.00.00.000000', 1),
> ('2010-01-18-14.00.00.000000',2), ('2010-01-18-14.00.00.000000',3),
> ('2010-01-18-14.01.00.000000',4), ('2010-01-18-14.01.00.000000',5),
> ('2010-01-18-14.02.00.000000',6), ('2010-01-18-14.02.00.000000',7);
>
> Easy query:
>
> select * from DATA D,
> (select MAX(SAMPLE) MAX_SAMPLE from DATA) as M
> where D.SAMPLE = M.MAX_SAMPLE;
>
> My source table has hundreds of Millions of rows and the access plan
> shows two index scans.
> Is there any magic trick to achieve the same result with only one
> index scan?
>
> Thanks in advance, -Michel

SELECT * FROM DATA ORDER BY SAMPLE DESC FETCH FIRST 1 ROW ONLY;

Ideally your index is on (or leads with) the SAMPLE column and is
DESCending or is set to ALLOW REVERSE SCANS

--Jeff

From: Lennart on 18 Jan 2010 13:54

On 18 Jan, 18:59, Michel Esber <smes...(a)gmail.com> wrote:
> Hello,
>
> Db2 UDB v9.5 Fp3.
>
> I need to return the set of rows that have the highest timestamp. This
> is very easy:
>
> drop table DATA;
>
> create table DATA (sample timestamp, field1 integer);
>
> insert into DATA values ('2010-01-18-14.00.00.000000', 1),
> ('2010-01-18-14.00.00.000000',2), ('2010-01-18-14.00.00.000000',3),
> ('2010-01-18-14.01.00.000000',4), ('2010-01-18-14.01.00.000000',5),
> ('2010-01-18-14.02.00.000000',6), ('2010-01-18-14.02.00.000000',7);
>
> Easy query:
>
> select * from DATA D,
> (select MAX(SAMPLE) MAX_SAMPLE from DATA) as M
> where D.SAMPLE = M.MAX_SAMPLE;
>
> My source table has hundreds of Millions of rows and the access plan
> shows two index scans.
> Is there any magic trick to achieve the same result with only one
> index scan?
>
> Thanks in advance, -Michel

Two suggestions:

1)
select sample, field1 from (
select sample, field1, row_number() over (order by sample desc) as
rn
from data
) where rn = 1;

2)
select * from data
order by sample desc
fetch first 1 rows only;

From: jefftyzzer on 18 Jan 2010 13:55

On Jan 18, 9:59 am, Michel Esber <smes...(a)gmail.com> wrote:
> Hello,
>
> Db2 UDB v9.5 Fp3.
>
> I need to return the set of rows that have the highest timestamp. This
> is very easy:
>
> drop table DATA;
>
> create table DATA (sample timestamp, field1 integer);
>
> insert into DATA values ('2010-01-18-14.00.00.000000', 1),
> ('2010-01-18-14.00.00.000000',2), ('2010-01-18-14.00.00.000000',3),
> ('2010-01-18-14.01.00.000000',4), ('2010-01-18-14.01.00.000000',5),
> ('2010-01-18-14.02.00.000000',6), ('2010-01-18-14.02.00.000000',7);
>
> Easy query:
>
> select * from DATA D,
> (select MAX(SAMPLE) MAX_SAMPLE from DATA) as M
> where D.SAMPLE = M.MAX_SAMPLE;
>
> My source table has hundreds of Millions of rows and the access plan
> shows two index scans.
> Is there any magic trick to achieve the same result with only one
> index scan?
>
> Thanks in advance, -Michel

This is untested, but if you at least know that your result set will
never be more than n rows (say 100), perhaps you could do this:

SELECT * FROM DATA ORDER BY SAMPLE DESC FETCH FIRST 100 ROWS ONLY;

Ideally your index is on (or leads with) the SAMPLE column and is
DESCending or is set to ALLOW REVERSE SCANS

--Jeff

From: jefftyzzer on 18 Jan 2010 14:01

On Jan 18, 9:59 am, Michel Esber <smes...(a)gmail.com> wrote:
> Hello,
>
> Db2 UDB v9.5 Fp3.
>
> I need to return the set of rows that have the highest timestamp. This
> is very easy:
>
> drop table DATA;
>
> create table DATA (sample timestamp, field1 integer);
>
> insert into DATA values ('2010-01-18-14.00.00.000000', 1),
> ('2010-01-18-14.00.00.000000',2), ('2010-01-18-14.00.00.000000',3),
> ('2010-01-18-14.01.00.000000',4), ('2010-01-18-14.01.00.000000',5),
> ('2010-01-18-14.02.00.000000',6), ('2010-01-18-14.02.00.000000',7);
>
> Easy query:
>
> select * from DATA D,
> (select MAX(SAMPLE) MAX_SAMPLE from DATA) as M
> where D.SAMPLE = M.MAX_SAMPLE;
>
> My source table has hundreds of Millions of rows and the access plan
> shows two index scans.
> Is there any magic trick to achieve the same result with only one
> index scan?
>
> Thanks in advance, -Michel

I'm assuming a FETCH FIRST 1 ROW-type query won't work because you
said you need to return the SET of rows. I've not tested this query,
but perhaps it might give you what you need:

SELECT
A,
B,
C,
SAMPLE,
MAX(SAMPLE) OVER() M_SAMPLE
FROM
DATA
WHERE
SAMPLE = M_SAMPLE;

--Jeff

| Next | Last
Pages: 1 2 3
Prev: SELECT INTO with ORDER BY and FETCH FIRST ONLY
Next: how to view a vertical table in horizontal way