A thought on Index Organized Tables [PgSql]

From: Robert Haas on 24 Feb 2010 11:18

On Wed, Feb 24, 2010 at 11:05 AM, Gokulakannan Somasundaram
<gokul007(a)gmail.com> wrote:
>> I think you're barking up the wrong tree. AFAIUI, the need for the
>> visibility map has not very much to do with whether the table has
>> indices, and everything to do with avoiding unnecessary VACUUMs. In
>> any event, you've not shown that the visibility map HAS any overhead,
>> so talking about skipping it seems entirely premature. Keep in mind
>> that the visibility map is quite small.
>
> OK! I am not saying to remove the visibility map, in case I was misunderstood. All
> I am saying here is to remove the index-only-scan processing of the visibility
> map. If it is being used only for vacuums, you need not make it crash safe,
> and no WAL comes into the picture.

So basically you want to have index-only scans, but you want them to
be really slow?

>> The point of the visibility map as far as index-only scans are
>> concerned is that if all the needed column values can be extracted
>> from the index, we still need to read the heap page to check tuple
>> visibility - unless, of course, we already know from the visibility
>> map that all the tuples on that heap page are guaranteed to be visible
>> to all transactions. On a read-only or read-mostly table, this will
>> reduce the cost of checking tuple visibility by several orders of
>> magnitude.
>>
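As a rough illustration of the check described in the quoted paragraph, here is a minimal Python sketch. It is a toy model, not PostgreSQL code: `ToyTable`, `tuple_is_visible`, and `index_only_scan` are hypothetical names, and the heap is reduced to a counter of simulated page reads.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a heap plus its visibility map."""
    # page number -> True if every tuple on that page is visible to all transactions
    all_visible: dict = field(default_factory=dict)
    heap_reads: int = 0  # counts simulated heap page fetches

    def tuple_is_visible(self, page: int) -> bool:
        """Simulate fetching the heap page to check one tuple's visibility."""
        self.heap_reads += 1
        return True  # toy assumption: the tuple turns out to be visible


def index_only_scan(table: ToyTable, index_entries: list) -> list:
    """Return indexed values, reading the heap only when the map bit is unset."""
    results = []
    for page, value in index_entries:
        # Short-circuit: if the bit is set, the heap page is never touched.
        if table.all_visible.get(page, False) or table.tuple_is_visible(page):
            results.append(value)
    return results
```

On a read-mostly table most bits would be set, so most index entries in this model are answered without a heap read.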
> I understand that. As I suggested above, if you have no indexes for a table,
> why do you need to spend the extra effort in making it crash safe for that
> table? Hope I am clear.

Tables without indices don't need to be crash safe? News to me.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: "Kevin Grittner" on 24 Feb 2010 11:28

Gokulakannan Somasundaram <gokul007(a)gmail.com> wrote:

>> With an IOT I don't understand how you get out of index
>> corruption without data loss. That's a showstopper for practical
>> use, I think.
>
> For simplicity, say we are storing all the non-leaf pages of the
> index in a separate file, then the leaf pages are nothing but the
> table. So if we can replicate the table, then we can replicate the
> non-leaf pages (say by some modified version of reindex).

So you are essentially proposing that rather than moving the heap
data into the leaf tuples of the index in the index file, you will
move the leaf index data into the heap tuples? The pages in such an
IOT heap file would still need to look a lot like index pages, yes?

I'm not saying it's a bad idea, but I'm curious what benefits you
see to taking that approach.

-Kevin

From: Tom Lane on 24 Feb 2010 11:39

"Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
> So you are essentially proposing that rather than moving the heap
> data into the leaf tuples of the index in the index file, you will
> move the leaf index data into the heap tuples? The pages in such an
> IOT heap file would still need to look a lot like index pages, yes?

> I'm not saying it's a bad idea, but I'm curious what benefits you
> see to taking that approach.

Isn't that just a variant on Heikki's "grouped index tuples" idea?

regards, tom lane

From: Heikki Linnakangas on 24 Feb 2010 11:48

Gokulakannan Somasundaram wrote:
>> If you have a scenario where the visibility map incurs a measurable
>> overhead, let's hear it. I didn't see any in the tests I performed, but
>> it's certainly possible that if the circumstances are just right it
>> makes a difference.
>>
>> Heikki,
> The obvious one I could observe is that it would increase WAL
> contention. Am I missing something?

Yes. The visibility map doesn't need any new WAL records to be written.

We probably will need to add some WAL logging to close the holes with
crash recovery, required for relying on it for index-only-scans, but
AFAICS only for VACUUM and probably only one WAL record for a whole
bunch of heap pages, so it should be pretty insignificant.

> All I am suggesting is to reduce the
> unnecessary work required in those tables where the visibility map is not
> required. For example, in data warehouses, people might even have tables
> without any indexes. Why do we ask them to incur the overhead of the visibility
> map?

To make it possible to do partial VACUUMs. That's why the visibility map
was put into 8.4.
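
To make the partial-VACUUM point concrete, here is a small Python sketch of the idea: pages whose visibility-map bit is set are skipped, and only the rest are scanned. This is a toy model under stated assumptions, not the actual VACUUM logic; `partial_vacuum` and `clean_page` are hypothetical names, and setting the bit after every cleaned page is a simplification (the bit should only be set when all remaining tuples on the page are visible to everyone).

```python
def partial_vacuum(all_visible: list, clean_page) -> list:
    """Scan only pages whose visibility-map bit is unset; return the pages scanned."""
    scanned = []
    for page, bit in enumerate(all_visible):
        if bit:
            continue  # page is known all-visible, so there is nothing to clean
        clean_page(page)          # hypothetical callback that removes dead tuples
        all_visible[page] = True  # toy simplification: treat the page as all-visible now
        scanned.append(page)
    return scanned
```

With a map of `[True, False, True, False]`, only pages 1 and 3 are visited, and a second run over the same map visits nothing, which matters even for a table with no indexes at all.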

Let me repeat myself: if you think the overhead of a visibility map is
noticeable or meaningful in any scenario, the onus is on you to show
what that scenario is. I am not aware of such a scenario, which doesn't
mean that it doesn't exist, of course, but hand-waving is not helpful.

> Also, since you have made the visibility maps without any page
> level locking, have you considered whether it would ensure the correct
> order of inserts into the WAL? I have looked at some random threads, but I
> couldn't get the complete design of the visibility map to be used for index-only
> scans.

I'm not sure what you mean with "without any page level locking".
Whenever a visibility map page is read or modified, a lock is taken on
the buffer.

I believe the current visibility map is free of race conditions, even if
it was used for index-only-scans, if that's what you mean. The critical
part is when a bit is cleared in the visibility map. It is done just
after inserting/deleting the heap tuple, which is OK because in the
window between modifying the heap page and clearing the bit in the
visibility map, no other backend could see the actions of the modifying
transaction yet anyway. The index updates have not been made yet, so the
information in the indexes is still valid for the other transaction's
snapshot.
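
The ordering argument above can be sketched in Python as a toy model. Everything here is hypothetical (`ToyRelation`, its fields, and the `trace` log are illustration only, and buffer locking is omitted); it only shows that, with this order of steps, the bit is already cleared by the time an index entry for the new tuple exists.

```python
class ToyRelation:
    """Hypothetical single-backend model of heap, visibility map, and index."""

    def __init__(self):
        self.heap = {0: ["old"]}      # page number -> tuples on that page
        self.all_visible = {0: True}  # page number -> visibility-map bit
        self.index = [("old", 0)]     # (value, page) entries
        self.trace = []               # records the order of the steps

    def insert(self, page: int, value: str) -> bool:
        """Insert a tuple; return the map bit as seen when the index entry is added."""
        self.heap.setdefault(page, []).append(value)  # 1. modify the heap page
        self.trace.append("heap")
        self.all_visible[page] = False                # 2. clear the bit just afterwards
        self.trace.append("vm_clear")
        bit_at_index_time = self.all_visible[page]
        self.index.append((value, page))              # 3. only now update the index
        self.trace.append("index")
        return bit_at_index_time
```

In this model an index-only scan that finds the new entry can never also find the page still marked all-visible, so it would fall back to the heap for the visibility check.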

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: "Kevin Grittner" on 24 Feb 2010 12:04

Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> Isn't that just a variant on Heikki's "grouped index tuples" idea?

With apologies to Heikki for having forgotten that effort, yes.

With the "simplifying" technique of keeping the leaf level in a
separate file, it becomes hard to distinguish from Heikki's Grouped
Index Tuples approach when you include the "maintain cluster order"
patch. That really looks like where anyone should work from for any
IOT effort. It appears to have been largely completed years ago.

For those who missed or forgot it, this is the latest I could find:

http://community.enterprisedb.com/git/

Heikki, any thoughts on what it would take, besides cleaning up bit
rot?

-Kevin
