From: Tom Lane on
Gokulakannan Somasundaram <gokul007(a)gmail.com> writes:
> But Tom, can you please explain to me why that broken ordering example
> doesn't affect the current index scans?

It does. The point is that the system is set up to limit the bad
consequences. You might (will) get wrong query answers, but the
heap data won't get corrupted.

regards, tom lane

From: Gokulakannan Somasundaram on
> It does. The point is that the system is set up to limit the bad
> consequences. You might (will) get wrong query answers, but the
> heap data won't get corrupted.
>
>
Again Tom, if there is an update based on an index scan, then it takes
the tupleid and updates the wrong heap data, right?
The only difference between a normal index and a thick index is that
the thick index reaches back to the same index tuple to update the
snapshot. How will that corrupt the heap data? Did you mean to say that
it corrupts the index data?

Thanks,
Gokul.
From: Greg Stark on
On Fri, Feb 26, 2010 at 6:30 PM, Gokulakannan Somasundaram
<gokul007(a)gmail.com> wrote:
> http://archives.postgresql.org/pgsql-hackers/2008-03/msg00682.php
> I think the buy-in became difficult because of the code quality.
>

Er, yeah. That's something we need to work on a bit. You should
probably expect your first few attempts to just be completely wrong.
Tom did give a very brief hint about what was wrong with the patch, but
it wasn't a point-by-point howto either.

It looks like your patch was unnecessarily complex.
slot_deform_tuple/heap_deform_tuple should handle missing columns
automatically already so they shouldn't need any modification.

All you need to do is check in heap_form_tuple whether there's a block
of nulls at the end and trim them off. If you can do this in a
CPU-efficient way it would be valuable, because this is a very critical
path in the code.
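
Roughly, the check I have in mind would look something like this
(untested sketch; the helper name and exact placement are hypothetical,
not actual PostgreSQL code):

#include <stdbool.h>

/*
 * Hypothetical helper: walk the isnull array backwards and return the
 * attribute count with the trailing block of nulls trimmed off.
 * heap_form_tuple could build the tuple with the reduced count and let
 * the deforming code fill in the missing columns as nulls, which it
 * already knows how to do.
 */
static int
trim_trailing_nulls(int natts, const bool *isnull)
{
    while (natts > 0 && isnull[natts - 1])
        natts--;
    return natts;
}

For a row with no trailing nulls that's a single array lookup, so the
cost on the common path should be close to nothing.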

Tom's concerns about benchmarking are interesting but I'm not sure
there's much we can do. We're talking about spending CPU time for
space gains, which is usually worthwhile. I guess the best we can hope
for is that on any macro benchmark there's no measurable performance
penalty even with a lot of nulls at the end of a very narrow row, or
that in a microbenchmark there's a negligible penalty, perhaps under
10% for trimming 100+ trailing null columns.
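
As a rough idea of the worst case, a standalone toy loop like the one
below (not a real benchmark of heap_form_tuple, just the trimming step
in isolation) would give a feel for the cost of scanning past 100+
trailing nulls:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define NATTS      128        /* a narrow row padded with trailing null columns */
#define ITERATIONS 10000000

int
main(void)
{
    bool        isnull[NATTS];
    long        total = 0;
    double      elapsed;
    clock_t     start;
    int         i;

    isnull[0] = false;                  /* one real attribute ... */
    for (i = 1; i < NATTS; i++)
        isnull[i] = true;               /* ... and 127 trailing nulls */

    start = clock();
    for (i = 0; i < ITERATIONS; i++)
    {
        int         natts = NATTS;

        while (natts > 0 && isnull[natts - 1])
            natts--;
        total += natts;                 /* keep the loop from being optimized away */
    }
    elapsed = (clock() - start) / (double) CLOCKS_PER_SEC;

    printf("average natts after trimming: %.1f, elapsed: %.3f s\n",
           total / (double) ITERATIONS, elapsed);
    return 0;
}

Whether that translates into anything measurable inside heap_form_tuple
is of course what the real benchmark would have to show.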

--
greg

From: Gokulakannan Somasundaram on
> It does. The point is that the system is set up to limit the bad
> consequences. You might (will) get wrong query answers, but the
> heap data won't get corrupted.

Tom,
if this is our goal - "can return wrong query answers, but should not
corrupt the heap data" - and if we make thick indexes capable of that,
can I consider that a thumbs up from your side? As you may already
know, this will only happen when there is a volatile function based
index.

Heikki,
Please let me know if you feel otherwise.

Thanks,
Gokul.
From: Tom Lane on
Gokulakannan Somasundaram <gokul007(a)gmail.com> writes:
>> It does. The point is that the system is set up to limit the bad
>> consequences. You might (will) get wrong query answers, but the
>> heap data won't get corrupted.
>>
> Again Tom, if there is an update based on an index scan, then it takes
> the tupleid and updates the wrong heap data, right?

No, what generally happens is that it fails to find a matching index entry at
all, because the search algorithm concludes there can be no match based
on the limited set of comparisons it's done. Transitivity failures lead
to searching the wrong subset of the index.
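
As a toy standalone illustration (not the actual btree code): a binary
search over a correctly sorted array can miss a value that really is
there, once the comparison function used at search time disagrees with
the order the data was stored under.

#include <stdio.h>

static int
broken_cmp(int a, int b)
{
    /*
     * Pretend the value 7 sorts below everything else at search time --
     * the analogue of an ordering operator (or a volatile function used
     * in an index) that gives a different answer than it did when the
     * index was built.
     */
    if (a == 7 && b != 7)
        return -1;
    if (b == 7 && a != 7)
        return 1;
    return (a > b) - (a < b);
}

static int
search(const int *arr, int n, int key)
{
    int         lo = 0,
                hi = n - 1;

    while (lo <= hi)
    {
        int         mid = (lo + hi) / 2;
        int         c = broken_cmp(key, arr[mid]);

        if (c == 0)
            return mid;
        if (c < 0)
            hi = mid - 1;       /* descends into the wrong subset */
        else
            lo = mid + 1;
    }
    return -1;                  /* concludes there can be no match */
}

int
main(void)
{
    /* sorted under the ordinary integer order; 7 really is in there */
    int         arr[] = {1, 3, 7, 9, 11, 13, 15};

    printf("search for 7 finds index %d\n", search(arr, 7, 7));
    return 0;
}

The search only compares the key against the entries on its descent
path, so one inconsistent answer is enough to send it into the wrong
subtree, after which it simply reports that there is no match.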

The case you're thinking about could arise if VACUUM failed to clean out
an index entry; after some unrelated tuple is inserted at the
just-cleared TID, searches finding that index entry would mistakenly
process the new tuple. This is why we insist on VACUUM not assuming
very much about the consistency of the index.

It's also a problem for thick indexes, because if you try to do a normal
index search for the index tuple to update its copy of the tuple
xmin/xmax data, you might fail to find it --- but that doesn't mean it's
not there.

regards, tom lane
