A thought on Index Organized Tables [PgSql]


From: Takahiro Itagaki on 22 Feb 2010 23:54

Gokulakannan Somasundaram <gokul007(a)gmail.com> wrote:

> a) IOT has both table and index in one structure. So no duplication of data
> b) With visibility maps, we have three structures a) Table b) Index c)
> Visibility map. So the disk footprint of the same data will be higher in
> postgres ( 2x + size of the visibility map).
> c) More than that, inserts and updates will incur two extra random i/os
> every time. - one for updating the table and one for updating the visibility
> map.

I think IOT is a good match for overwrite storage systems, but postgres
is a non-overwrite storage system. If we update rows in an IOT, we need
much more initial free space per page than with index-only scans, where
we can avoid key updates with HOT.

Instead, how about excluding columns in primary keys from table data?
We could not drop those primary keys and could not seqscan the tables, but
there would be no duplication of data, only small overheads (index tuple
headers and ctid links), and it would work well with HOT and index-only
scans. If we don't have any non-key columns, that behaves just like an IOT.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 23 Feb 2010 00:12

Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp> writes:
> Instead, how about excluding columns in primary keys from table data?

How will you implement "select * from mytable" ? Or even
"select * from mytable where non_primary_key = something" ?
If you can't do either of those without great expense, I think
a savings on primary-key lookups is not going to be adequate
recompense.

regards, tom lane


From: Takahiro Itagaki on 23 Feb 2010 01:09

Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp> writes:
> > Instead, how about excluding columns in primary keys from table data?
>
> How will you implement "select * from mytable" ? Or even
> "select * from mytable where non_primary_key = something" ?

Use index full scans. We can do that even now with enable_seqscan = off.
Of course, it requires an additional step to merge index keys and heap tuples.
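
A rough sketch of what that merge step might look like, using a toy in-memory model rather than actual PostgreSQL structures. The names `heap`, `pk_index`, and `full_scan` are hypothetical, and the layout (heap holding only non-key columns, index holding key plus ctid) is an assumption taken from the proposal above:

```python
# Toy model of a "PK-excluded" table. All names are hypothetical;
# this is not PostgreSQL code.
from typing import Dict, Iterator, List, Tuple

Ctid = Tuple[int, int]  # (block number, offset within block)

# Heap: stores only the non-key columns, addressed by ctid.
heap: Dict[Ctid, tuple] = {
    (0, 1): ("alice", 30),
    (0, 2): ("bob", 25),
    (1, 1): ("carol", 41),
}

# Primary-key index: entries of (key, ctid), kept in key order.
pk_index: List[Tuple[int, Ctid]] = [
    (101, (0, 2)),
    (205, (1, 1)),
    (350, (0, 1)),
]

def full_scan(index: List[Tuple[int, Ctid]],
              table: Dict[Ctid, tuple]) -> Iterator[tuple]:
    """Emulate "select * from mytable" by walking the whole index."""
    for key, ctid in index:
        non_key_cols = table[ctid]   # one heap fetch per index entry
        yield (key, *non_key_cols)   # merge key columns with heap columns

rows = list(full_scan(pk_index, heap))
```

In this model every row costs one index entry visit plus one heap fetch, and the heap fetches follow key order rather than physical order.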

Also, we're using the same technique for TOASTed values. The cost can be
compared with "select * from mytable where toasted_value = something", no?

> If you can't do either of those without great expense, I think
> a savings on primary-key lookups is not going to be adequate
> recompense.

I don't think it will be the default, but it would be a reasonable trade-off
for users who want IOTs, unless they often scan the whole table.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Tom Lane on 23 Feb 2010 01:23

Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp> writes:
> Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp> writes:
>>> Instead, how about excluding columns in primary keys from table data?
>>
>> How will you implement "select * from mytable" ? Or even
>> "select * from mytable where non_primary_key = something" ?

> Also, we're using the same technique for TOASTed values. The cost can be
> compared with "select * from mytable where toasted_value = something", no?

No, because toast pointers point in the direction you need to traverse.
AFAICS, this proposal involves scanning the whole index to find the
matching entry, because the available pointers link from the wrong end,
that is from the index to the table.
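
The difference in pointer direction can be illustrated with a small sketch. This is a toy model under assumed names (`toast_table`, `detoast`, `find_key_for` are all hypothetical), not how PostgreSQL actually stores either structure:

```python
# Toy comparison of pointer direction. All names are hypothetical.
from typing import Dict, List, Tuple

Ctid = Tuple[int, int]

# TOAST-like case: the heap tuple itself holds a pointer to the
# out-of-line value, so the pointer leads where we need to go.
toast_table: Dict[int, str] = {9001: "long text value"}
heap_with_toast: Dict[Ctid, dict] = {(0, 1): {"id": 1, "doc_ptr": 9001}}

def detoast(tup: dict) -> str:
    return toast_table[tup["doc_ptr"]]   # one direct lookup

# PK-excluded case: only the index points at the heap. Starting from a
# heap tuple, the key can be found only by searching index entries.
pk_index: List[Tuple[int, Ctid]] = [
    (101, (0, 2)),
    (205, (1, 1)),
    (350, (0, 1)),
]

def find_key_for(ctid: Ctid, index: List[Tuple[int, Ctid]]) -> Tuple[int, int]:
    """Return (key, number of index entries examined)."""
    examined = 0
    for key, target in index:
        examined += 1
        if target == ctid:
            return key, examined
    raise KeyError(ctid)

value = detoast(heap_with_toast[(0, 1)])
key, examined = find_key_for((0, 1), pk_index)
```

Here the TOAST-style lookup is a single step, while recovering the key for heap tuple `(0, 1)` examines every index entry, because that entry happens to be last in key order.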

There are also some fairly fatal problems associated with commands like
ALTER TABLE DROP PRIMARY KEY, but I see no need to worry about that
because you haven't even made a case that there's a net performance
gain possible here.

regards, tom lane


From: Takahiro Itagaki on 23 Feb 2010 01:44

Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> > Also, we're using the same technique for TOASTed values. The cost can be
> > compared with "select * from mytable where toasted_value = something", no?
>
> No, because toast pointers point in the direction you need to traverse.
> AFAICS, this proposal involves scanning the whole index to find the
> matching entry, because the available pointers link from the wrong end,
> that is from the index to the table.

Ah, I see there are differences if we have secondary indexes.
I had assumed that the toast case requires scanning the whole *table* to
find the matching entry, and so should be compared with a whole-*index* scan,
but it is a different story if we have secondary indexes.

We can have indexes on toasted values, and find the related tuples
directly with CTIDs, but scans on secondary indexes on PK-excluded
tables require not only heap tuples but also primary key values.

The secondary index issue should also be considered for the original
IOT proposal, which has the same issue. Using PK-values instead of CTIDs
might require many changes in index access methods and/or the executor.
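
To make the PK-values-instead-of-CTIDs idea concrete, here is a minimal sketch under assumed structures. The names `sec_by_name` and `lookup_by_name` are hypothetical, and a sorted list searched with `bisect` stands in for a B-tree descent:

```python
# Toy model: a secondary index that stores primary-key values rather
# than ctids. All names are hypothetical; this is not PostgreSQL code.
import bisect
from typing import Dict, List, Tuple

Ctid = Tuple[int, int]

# Heap holds only the non-key columns (name, age).
heap: Dict[Ctid, tuple] = {
    (0, 1): ("alice", 30),
    (0, 2): ("bob", 25),
    (1, 1): ("carol", 41),
}

# Primary-key index as two parallel sorted arrays.
pk_keys: List[int] = [101, 205, 350]
pk_ctids: List[Ctid] = [(0, 2), (1, 1), (0, 1)]

# Secondary index on name, pointing at PK values instead of ctids.
sec_by_name: Dict[str, List[int]] = {
    "alice": [350],
    "bob": [101],
    "carol": [205],
}

def lookup_by_name(name: str) -> List[tuple]:
    rows = []
    for pk in sec_by_name.get(name, []):
        i = bisect.bisect_left(pk_keys, pk)   # second index descent
        if i == len(pk_keys) or pk_keys[i] != pk:
            continue                          # stale secondary entry
        rows.append((pk, *heap[pk_ctids[i]])) # key is already in hand
    return rows

result = lookup_by_name("bob")
```

The key value comes for free from the secondary index entry, at the price of a second index descent per match before the heap tuple can be fetched.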

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

