A thought on Index Organized Tables [PgSql]

Prev: [COMMITTERS] pgsql: Oops, don't forget to rewind the directory before scanning it to
Next: Time travel on the buildfarm

From: Gokulakannan Somasundaram on 25 Feb 2010 15:09

> 1) transaction information in index
>
> This seems like a lot of bloat in indexes. It also means breaking
> a lot of other optimizations such as being able to read the tuples
> directly from the heap page without locking. I'm not sure how much
> those are worth though. But adding 24 bytes to every index entry seems
> pretty unlikely to be a win anyways.
>
>
Greg,
I think, somewhere things have been misunderstood. we only need 8
bytes more per index entry. I thought Postgres has a 8 byte transaction id,
but it is only 4 bytes, so we only need to save the insertion and deletion
xids. So 8 bytes more per tuple.

Gokul.

From: Tom Lane on 25 Feb 2010 16:02

Gokulakannan Somasundaram <gokul007(a)gmail.com> writes:
> I think, somewhere things have been misunderstood. we only need 8
> bytes more per index entry. I thought Postgres has a 8 byte transaction id,
> but it is only 4 bytes, so we only need to save the insertion and deletion
> xids. So 8 bytes more per tuple.

What makes you think you can get away without cmin/cmax?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Stark on 25 Feb 2010 16:08

On Thu, Feb 25, 2010 at 8:09 PM, Gokulakannan Somasundaram
<gokul007(a)gmail.com> wrote:
> I think, somewhere things have been misunderstood. we only need 8
> bytes more per index entry. I thought Postgres has a 8 byte transaction id,
> but it is only 4 bytes, so we only need to save the insertion and deletion
> xids. So 8 bytes more per tuple.
>

Well in the heap we need

4 bytes: xmin
4 bytes: xmax
4 bytes: cid
6 bytes: ctid
6 bytes: various info bits including natts

In indexes we currently get away with a reduced header which has few
of the 6 bytes of info bits. However the only reason we can do is
because we impose arbitrary limitations that work for indexes but
wouldn't be reasonable for tables. Such as a lower maximum number of
columns, inability to add new columns or drop columns later, etc.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 25 Feb 2010 16:25

Greg Stark <gsstark(a)mit.edu> writes:
> In indexes we currently get away with a reduced header which has few
> of the 6 bytes of info bits. However the only reason we can do is
> because we impose arbitrary limitations that work for indexes but
> wouldn't be reasonable for tables. Such as a lower maximum number of
> columns, inability to add new columns or drop columns later, etc.

Wait a second, which idea are we currently talking about? No heap at
all, or just the ability to check visibility without visiting the heap?

If it's a genuine IOT (ie no separate heap), then you are not going to
be able to get away without a full heap tuple header. We've sweated
blood to get that struct down to where it is; there's no way to make it
smaller without giving up some really fundamental things, for example
the ability to do UPDATE :-(

If you just want to avoid a heap visit for visibility checks, I think
you'd only need to add xmin/xmax/cmin plus the hint bits for same.
This is going to end up costing 16 bytes in practice --- you might
think you could squeeze into 12 but on 64-bit machines (MAXALIGN 8)
you'll save nothing. So that's effectively a doubling of index size
for common cases such as a single int4 or int8 index column. The other
problem is the extra write load created by needing to update the index's
copies of the hint bits; not to mention extra writes to freeze the xids
when they get old enough.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Gokulakannan Somasundaram on 25 Feb 2010 17:12

> Wait a second, which idea are we currently talking about? No heap at
> all, or just the ability to check visibility without visiting the heap?
>

I was talking about the indexes with snapshot

>
> If it's a genuine IOT (ie no separate heap), then you are not going to
> be able to get away without a full heap tuple header. We've sweated
> blood to get that struct down to where it is; there's no way to make it
> smaller without giving up some really fundamental things, for example
> the ability to do UPDATE :-(
>

Of course, as i said, the leaf pages will have HeapTuples in IOT. As a
Postgres user, definitely i am thankful for what has been done.

> If you just want to avoid a heap visit for visibility checks, I think
> you'd only need to add xmin/xmax/cmin plus the hint bits for same.
> This is going to end up costing 16 bytes in practice --- you might
> think you could squeeze into 12 but on 64-bit machines (MAXALIGN 8)
> you'll save nothing. So that's effectively a doubling of index size
> for common cases such as a single int4 or int8 index column.

Yes but currently we are storing the size of index in IndexTuple, which is
also stored in ItemId. If we can somehow make it use that info, then we have
13 bits of flag for free and we can reduce it to 8 bytes of extra info. But
we need you to sweat some more blood for that :). But again, unless we
resolve the volatile functions issue, there is no use in worrying about
this.

> The other
> problem is the extra write load created by needing to update the index's
> copies of the hint bits; not to mention extra writes to freeze the xids
> when they get old enough.
>
But Tom, i remember that the vacuum was faster when index had visibility
info, since we need not touch the table. But maybe i am wrong. Atleast i
remember that was the case, when the
relation had only thick indexes.
Oh..Yeah... visibility map might have changed the equation.

Thanks,
Gokul

First | Prev | Next | Last
Pages: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Prev: [COMMITTERS] pgsql: Oops, don't forget to rewind the directory before scanning it to
Next: Time travel on the buildfarm