A thought on Index Organized Tables [PgSql]

Prev: [COMMITTERS] pgsql: Oops, don't forget to rewind the directory before scanning it to
Next: Time travel on the buildfarm

From: Karl Schnaitter on 25 Feb 2010 18:30

> The other
>> problem is the extra write load created by needing to update the index's
>> copies of the hint bits; not to mention extra writes to freeze the xids
>> when they get old enough.
>>
> But Tom, i remember that the vacuum was faster when index had visibility
> info, since we need not touch the table. But maybe i am wrong.
>

I disagree with that, Gokul -- if the ordering operators are volatile or
just incorrect, during DELETE, you could set xmax in the wrong IndexTuple.
Then there will be another IndexTuple that says it's visible, but it points
to a non-visible heap tuple. I think you should follow the pointers to the
heap before you decide to let an index tuple remain in the index during
vacuum. This would ensure that all references from an index to a heap tuple
are removed before vacuuming the heap tuple. I would be worried about what
might break if this invariant doesn't hold.

Tom is right about all the extra overhead involved with keeping visibility
info in the index. But it can be a good trade-off in some cases.

Karl

From: Greg Stark on 25 Feb 2010 18:45

On Thu, Feb 25, 2010 at 9:25 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> We've sweated
> blood to get that struct down to where it is; there's no way to make it
> smaller without giving up some really fundamental things, for example
> the ability to do UPDATE :-(

Oh, this is a tangent but I think there are some more gains there, at
least now that we've eliminated vacuum full. The more we save the more
complex the code and data structure becomes so there may be a point
where it's not worthwhile any more. And of course if we do try to do
any of these then it wouldn't be part of IOT it would be a general
improvement which would help tables as well.

For future reference, here are some items that have come up in the past:

1) We've talked about having a "common xmin" in the page header and
then a bit indicating that the xmin is missing from the tuple header
because it matches the value in the page header. This would save a lot
of space in the common case where data was all loaded in a single
transaction and all the tuples have the same xmin.

2) Now that we don't have vacuum full the command-id is kind of a
waste. We could replace it with some kind of local memory data
structure which is capable of spilling to disk. When the transaction
commits it can be thrown away and no other session needs to be able to
see it. This could have an impact on future work on parallel query but
I think our phantom-command-id already has such issues anyways.

3) xmax and ctid are unavoidable since we never know when a tuple
might be deleted or updated in the future. But if we allowed the user
to mark a table "insert-only" then it could be left out and any
operation which tries to delete, update, or select for update a row in
the table would throw an error.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 25 Feb 2010 18:53

Greg Stark <gsstark(a)mit.edu> writes:
> 2) Now that we don't have vacuum full the command-id is kind of a
> waste.

Not really.

> We could replace it with some kind of local memory data
> structure which is capable of spilling to disk.

The performance costs of that would probably outweigh any space savings.

> I think our phantom-command-id already has such issues anyways.

It can, but it's relatively uncommon to update a large number of tuples
more than once in a transaction. What you're suggesting would move that
bottleneck into mainstream cases. And it would be a bigger bottleneck
since you would have no lookup key available within the tuple header.
You'd have to use ctid as the lookup key which means no ability to use
one table entry for multiple rows, not to mention what do you do before
the tuple has a ctid assigned?

> 3) xmax and ctid are unavoidable since we never know when a tuple
> might be deleted or updated in the future. But if we allowed the user
> to mark a table "insert-only" then it could be left out and any
> operation which tries to delete, update, or select for update a row in
> the table would throw an error.

Anything with "this field is optional" is going to be a complete
disaster for mapping C structs over tuple headers...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Karl Schnaitter on 25 Feb 2010 18:57

If it's of any interest, I can say something about the hint bits in the
index tuple header. In my implementation, my decision was to use only one
hint bit. It went into the unused 13th bit of the IndexTuple header. When
the hint bit is set, it means that

(xmin is committed OR xmin = InvalidTransactionId)
AND (xmax is committed OR xmax = InvalidTransactionId)

Then there are 12 bytes for xmin/xmax/cid. I did sweat something over this
decision... but maybe it was a wasted effort if the 12 bytes end up
occupying 16 bytes anyway.

Karl

On Thu, Feb 25, 2010 at 1:25 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> Greg Stark <gsstark(a)mit.edu> writes:
> > In indexes we currently get away with a reduced header which has few
> > of the 6 bytes of info bits. However the only reason we can do is
> > because we impose arbitrary limitations that work for indexes but
> > wouldn't be reasonable for tables. Such as a lower maximum number of
> > columns, inability to add new columns or drop columns later, etc.
>
> Wait a second, which idea are we currently talking about? No heap at
> all, or just the ability to check visibility without visiting the heap?
>
> If it's a genuine IOT (ie no separate heap), then you are not going to
> be able to get away without a full heap tuple header. We've sweated
> blood to get that struct down to where it is; there's no way to make it
> smaller without giving up some really fundamental things, for example
> the ability to do UPDATE :-(
>
> If you just want to avoid a heap visit for visibility checks, I think
> you'd only need to add xmin/xmax/cmin plus the hint bits for same.
> This is going to end up costing 16 bytes in practice --- you might
> think you could squeeze into 12 but on 64-bit machines (MAXALIGN 8)
> you'll save nothing. So that's effectively a doubling of index size
> for common cases such as a single int4 or int8 index column. The other
> problem is the extra write load created by needing to update the index's
> copies of the hint bits; not to mention extra writes to freeze the xids
> when they get old enough.
>
> regards, tom lane
>

From: Gokulakannan Somasundaram on 25 Feb 2010 18:59

>
> I disagree with that, Gokul -- if the ordering operators are volatile or
> just incorrect, during DELETE, you could set xmax in the wrong IndexTuple.
> Then there will be another IndexTuple that says it's visible, but it points
> to a non-visible heap tuple. I think you should follow the pointers to the
> heap before you decide to let an index tuple remain in the index during
> vacuum. This would ensure that all references from an index to a heap tuple
> are removed before vacuuming the heap tuple. I would be worried about what
> might break if this invariant doesn't hold.
>

Well, Karl, if we have to support function based indexes/IOT, one thing is
for sure. We can't support them for volatile functions / broken data types.
Everyone agrees with that. But the question is how we identify something is
not a volatile function. Only way currently is to let the user make the
decision( Or we should consult some mathematician ). So we need not consult
the heaptuple.

Gokul.

First | Prev | Next | Last
Pages: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Prev: [COMMITTERS] pgsql: Oops, don't forget to rewind the directory before scanning it to
Next: Time travel on the buildfarm