A thought on Index Organized Tables [PgSql]

Prev: [COMMITTERS] pgsql: Oops, don't forget to rewind the directory before scanning it to
Next: Time travel on the buildfarm

From: Gokulakannan Somasundaram on 24 Feb 2010 14:13

> It might be slightly easier given the assumption that you only want to
> scan leaf tuples.
>
> But there's an additional problem I didn't think of before. Currently
> we optimize index scans by copying all relevant tuples to local memory
> so we don't need to hold an index lock for an extended time or spend a
> lot of time relocking and rechecking the index for changes. That
> wouldn't be possible if we needed to get visibility info from the page
> since we would need up-to-date information.
>
>
> We should solve this issue in the same way, of how we proceed with the
index only quals, in current index-only scans.

Gokul.

From: Gokulakannan Somasundaram on 24 Feb 2010 15:04

>
>> That doesn't work because when you split an index page any sequential
>>> scan in progress will either see the same tuples twice or will miss
>>> some tuples depending on where the new page is allocated. Vacuum has a
>>> clever trick for solving this but it doesn't work for arbitrarily many
>>> concurrent scans.
>>>
>>> Consider how the range scans are working today, while the page split
>> happens.
>>
>> The Seq scan should follow the right sibling to do the seq scan.
>>
>> Gokul.
>>
>>
> Actually thinking about what you suggested for a while, i think it should
> be possible, because the Oracle Fast Full Index scan essentially scans the
> index like that. I will try to think a way of doing that with Lehman and
> Yao...
>
> Gokul.
>

OK I think, i can think of a solution to achieve fast full index scan like
oracle.
a) Issue ids to every block that gets created inside the index. we are
already doing that
b) Now before the fast full index scan starts, note down the max id that got
issued.
c) Now do a scan similar to Full Table Scan till that max id. Now, while we
are scanning the blocks, note down the right siblings in a list, if the
right sibling block id is greater than the max id that got issued. These are
the ones, which have got split after the scan started
d) Now after we reach that max block, try to range scan on missing links
till we hit the end / get a right sibling less than the max block id noted.

Once we do this for all the missing links, we have scanned through the
entire chain. So this will definitely be faster than range scan.

Thanks to you, i have thought about an important issue and also a solution
for it.

Thanks,
Gokul.

From: Greg Stark on 24 Feb 2010 15:25

On Wed, Feb 24, 2010 at 8:04 PM, Gokulakannan Somasundaram
<gokul007(a)gmail.com> wrote:
> OK I think, i can think of a solution to achieve fast full index scan like
> oracle.
> a) Issue ids to every block that gets created inside the index. we are
> already doing that
> b) Now before the fast full index scan starts, note down the max id that got
> issued.
> c) Now do a scan similar to Full Table Scan till that max id. Now, while we
> are scanning the blocks, note down the right siblings in a list, if the
> right sibling block id is greater than the max id that got issued. These are
> the ones, which have got split after the scan started
> d) Now after we reach that max block, try to range scan on missing links
> till we hit the end / get a right sibling less than the max block id noted.
>
> Once we do this for all the missing links, we have scanned through the
> entire chain. So this will definitely be faster than range scan.
>
> Thanks to you, i have thought about an important issue and also a solution
> for it.

I haven't thought about whether this is sufficient but if it is then
an initial useful thing to add would be to use it for queries where we
have a qual that can be checked using the index key even though we
can't do a range scan. For example if you have a btree index on
<a,b,c> and you have a WHERE clause like "WHERE c=0"

That would be a much smaller change than IOT but it would still be a
pretty big project. Usually the hardest part is actually putting the
logic in the planner to determine whether it's advantageous. I would
suggest waiting until after 9.0 is out the door to make sure you have
the attention of Heikki or Tom or someone else who can spend the time
to check that it will actually work before putting lots of work coding
it.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on 24 Feb 2010 15:32

Gokulakannan Somasundaram wrote:
> OK I think, i can think of a solution to achieve fast full index scan like
> oracle.
> a) Issue ids to every block that gets created inside the index. we are
> already doing that
> b) Now before the fast full index scan starts, note down the max id that got
> issued.
> c) Now do a scan similar to Full Table Scan till that max id. Now, while we
> are scanning the blocks, note down the right siblings in a list, if the
> right sibling block id is greater than the max id that got issued. These are
> the ones, which have got split after the scan started
> d) Now after we reach that max block, try to range scan on missing links
> till we hit the end / get a right sibling less than the max block id noted.
>
> Once we do this for all the missing links, we have scanned through the
> entire chain. So this will definitely be faster than range scan.

You also need to avoid scanning the same tuple twice. That's not a
problem for VACUUM, but it is for full index scans.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Gokulakannan Somasundaram on 24 Feb 2010 16:44

> You also need to avoid scanning the same tuple twice. That's not a
> problem for VACUUM, but it is for full index scans.
>
> Heikki,
Actually the logic, which i have explained earlier is to avoid
scanning tuples twice.

Gokul.

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Prev: [COMMITTERS] pgsql: Oops, don't forget to rewind the directory before scanning it to
Next: Time travel on the buildfarm