A thought on Index Organized Tables [PgSql]


From: Heikki Linnakangas on 24 Feb 2010 05:05

Gokulakannan Somasundaram wrote:
> While we accept that the visibility map is good for read-only applications,
> why can't we make it optional? At least if there is a way for a person to
> drop the visibility map for a table (if it gets created by default), the
> application need not incur the overhead for those tables, when it knows it
> is update intensive / runs batch jobs.

If you have a scenario where the visibility map incurs a measurable
overhead, let's hear it. I didn't see any in the tests I performed, but
it's certainly possible that if the circumstances are just right it
makes a difference.

> Again not to deviate from my initial question, can we make a decision
> regarding unstable/mutable functions / broken data types ?

*Sigh*. Yes. You need to deal with them.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Gokulakannan Somasundaram on 24 Feb 2010 07:09

Forgot to include the group..

On Wed, Feb 24, 2010 at 5:38 PM, Gokulakannan Somasundaram <
gokul007(a)gmail.com> wrote:

>> I am not familiar with this term "broken data types", and I just looked for
>> it in the source code and couldn't find it.
>>
>> What exactly are you referring to?
>>
>> cheers
>>
>> andrew
>>
>
> Sorry, I missed this. If we create a function A that uses functions
> like time(), date() and random(), then A won't give the same output
> even if we give it the same input. So if a person has created a data
> type that uses these functions, it can't be used as the primary key of
> an index-organized table, because I need to reach the same tuple by
> applying the function to the supplied values. But since the function is
> mutable, we can't reach the same tuple.
>
> If we decide to support only data types containing immutable functions,
> there might still be people who have created functions of this kind and
> marked them as immutable (while they are in fact mutable). Those functions
> will result in index corruption or failed operations. Only if we resolve
> this issue can we have data structures like IOT.
>
> Hope I was clear.
>
> Thanks,
> Gokul.
>
>
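
The lookup problem Gokul describes can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyIOT`, `stable_key` and `unstable_key` are hypothetical names, the "table" is a pair of sorted Python lists, and a hidden call counter stands in for time() or random() so that the behaviour is repeatable. It is not how PostgreSQL stores anything.

```python
import bisect
import itertools


def stable_key(value):
    # Immutable: the same input always produces the same key.
    return value * 2


_calls = itertools.count()


def unstable_key(value):
    # Mutable: the result depends on hidden state (a call counter here,
    # standing in for time() or random()), so it changes between calls.
    return value + next(_calls)


class ToyIOT:
    """Rows kept in key order; the key is recomputed on every lookup."""

    def __init__(self, key_func):
        self.key_func = key_func
        self.keys = []
        self.rows = []

    def insert(self, value, row):
        key = self.key_func(value)
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.rows.insert(pos, row)

    def lookup(self, value):
        key = self.key_func(value)
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.rows[pos]
        return None  # the row exists, but its key can no longer be reproduced


good = ToyIOT(stable_key)
good.insert(10, "row-a")
print(good.lookup(10))

bad = ToyIOT(unstable_key)
bad.insert(10, "row-a")
print(bad.lookup(10))
```

With the stable function the row is found again; with the unstable one the key computed at lookup time differs from the key computed at insert time, so the row is stored but unreachable.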

From: Gokulakannan Somasundaram on 24 Feb 2010 09:41

>
>
> If you have a scenario where the visibility map incurs a measurable
> overhead, let's hear it. I didn't see any in the tests I performed, but
> it's certainly possible that if the circumstances are just right it
> makes a difference.
>
Heikki,
The obvious one I could observe is that it would increase WAL
contention. Am I missing something? All I am suggesting is to reduce the
unnecessary work required on those tables where the visibility map is not
required. For example, in data warehouses, people might even have tables
without any indexes. Why do we ask them to incur the overhead of the
visibility map?
Also, since you have implemented the visibility map without any page-level
locking, have you considered whether that still ensures the correct
order of inserts into the WAL? I have looked at some random threads, but I
couldn't find the complete design for using the visibility map for
index-only scans.

Thanks,
Gokul.

From: Robert Haas on 24 Feb 2010 10:01

On Wed, Feb 24, 2010 at 9:41 AM, Gokulakannan Somasundaram
<gokul007(a)gmail.com> wrote:
>>
>> If you have a scenario where the visibility map incurs a measurable
>> overhead, let's hear it. I didn't see any in the tests I performed, but
>> it's certainly possible that if the circumstances are just right it
>> makes a difference.
>>
> Heikki,
> The obvious one I could observe is that it would increase WAL
> contention. Am I missing something? All I am suggesting is to reduce the
> unnecessary work required on those tables where the visibility map is not
> required. For example, in data warehouses, people might even have tables
> without any indexes. Why do we ask them to incur the overhead of the
> visibility map?

I think you're barking up the wrong tree. AFAIUI, the need for the
visibility map has very little to do with whether the table has
indices, and everything to do with avoiding unnecessary VACUUMs. In
any event, you've not shown that the visibility map HAS any overhead,
so talking about skipping it seems entirely premature. Keep in mind
that the visibility map is quite small.

The point of the visibility map as far as index-only scans are
concerned is that if all the needed column values can be extracted
from the index, we still need to read the heap page to check tuple
visibility - unless, of course, we already know from the visibility
map that all the tuples on that heap page are guaranteed to be visible
to all transactions. On a read-only or read-mostly table, this will
reduce the cost of checking tuple visibility by several orders of
magnitude.

....Robert
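
The check Robert describes can be sketched roughly as follows. This is a toy model, not PostgreSQL's implementation: `ToyTable` and its methods are hypothetical names, visibility is reduced to a single boolean per tuple instead of real snapshot rules, and the "map" is a plain list with one flag per heap page.

```python
class ToyTable:
    """Toy heap with one 'all visible' flag per page."""

    def __init__(self, pages):
        # pages: list of dicts mapping tuple_id -> True if visible to everyone
        self.pages = pages
        self.all_visible = [False] * len(pages)
        self.heap_reads = 0

    def vacuum(self):
        # Set the flag only for pages where every tuple is visible to all.
        for page_no, page in enumerate(self.pages):
            self.all_visible[page_no] = all(page.values())

    def update(self, page_no, tuple_id):
        # Changing a page invalidates its flag.
        self.pages[page_no][tuple_id] = False
        self.all_visible[page_no] = False

    def index_only_fetch(self, value, page_no, tuple_id):
        # The index entry already carries the column value.
        if self.all_visible[page_no]:
            return value      # no heap access needed
        self.heap_reads += 1  # fall back to the heap to check visibility
        return value if self.pages[page_no][tuple_id] else None


table = ToyTable([{0: True, 1: True}, {0: True, 1: False}])
table.vacuum()
print(table.index_only_fetch("a", 0, 1), table.heap_reads)
print(table.index_only_fetch("b", 1, 1), table.heap_reads)
```

On a read-mostly table nearly every page keeps its flag set, so in this model almost no fetch touches the heap; a page with recent changes pays the heap read until it is vacuumed again.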


From: Tom Lane on 24 Feb 2010 10:12

Karl Schnaitter <karlsch(a)gmail.com> writes:
> On Wed, Feb 24, 2010 at 12:53 AM, Gokulakannan Somasundaram <
> gokul007(a)gmail.com> wrote:
>> Again not to deviate from my initial question, can we make a decision
>> regarding unstable/mutable functions / broken data types ?
>>
> I second this question. A year or two ago, Gokul and I both proposed a
> feature that put visibility metadata into the index tuples and supported
> index-only scans, and the idea was dismissed because a user might choose
> incorrect ordering operators. I tried to ask for a clear explanation of the
> issue, but never got it.

The fundamental point IMHO is that indexes are more complex and much
more fragile than heaps. This is obviously true theoretically and we
have years of experience that proves it to be true in the field as well.
Broken comparison functions are just one of the possible hazards; there
are many others.

Now with standard indexes you can always recover from any problem via
REINDEX; no matter how badly the index is messed up, your data is still
there and not damaged. (Well, maybe it will fail a unique constraint
check or something, but at least it's still there.)

With an IOT I don't understand how you get out of index corruption
without data loss. That's a showstopper for practical use, I think.

regards, tom lane
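
Tom's recovery argument can be illustrated with a small sketch. The assumptions are loud ones: `reindex` here is a hypothetical helper that rebuilds a sorted list of (key, heap position) pairs, not the REINDEX command itself, and "corruption" is simulated by overwriting the structure with garbage.

```python
def reindex(heap, key_func):
    """Rebuild a toy index from scratch: sorted (key, heap_position) pairs."""
    return sorted((key_func(row), pos) for pos, row in enumerate(heap))


# Standard layout: rows live in the heap, the index only points at them.
heap = ["cherry", "apple", "banana"]
index = reindex(heap, str.lower)

index = [("garbage", 7)]          # simulate arbitrary index corruption
index = reindex(heap, str.lower)  # the heap is intact, so a rebuild recovers
print(index)

# Toy IOT layout: the rows exist only inside the ordered structure.
iot = sorted(heap)
iot = ["garbage"]                 # the same corruption now destroys the rows;
                                  # there is no separate heap to rebuild from
print(iot)
```

In the first layout the index is derived data and can always be regenerated; in the second the ordered structure is the only copy of the rows.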

