Prev: [COMMITTERS] pgsql: Oops, don't forget to rewind the directory before scanning it to
Next: Time travel on the buildfarm
From: "Kevin Grittner" on 24 Feb 2010 10:46 Tom Lane <tgl(a)sss.pgh.pa.us> wrote: > The fundamental point IMHO is that indexes are more complex and > much more fragile than heaps. This is obviously true > theoretically and we have years of experience that proves it to be > true in the field as well. Broken comparison functions are just > one of the possible hazards; there are many others. > > Now with standard indexes you can always recover from any problem > via REINDEX; no matter how badly the index is messed up, your data > is still there and not damaged. (Well, maybe it will fail a > unique constraint check or something, but at least it's still > there.) > > With an IOT I don't understand how you get out of index corruption > without data loss. That's a showstopper for practical use, I > think. Having used the IOT implementation ("clustered indexes") in SQL Server and then Sybase ASE starting with SQL Server 1.0, I can relate my experiences on that. In about 18 years with over 100 databases we had maybe five or ten times that such damage made it difficult to recover data -- typically the result of hardware problems. This implementation had a double linked list of pointers through the leaf level pages, so normally a query which generated a full table scan would follow these and work. When said pointers were damaged we would query through the index tree to see what we could reach. There were usually other indexes on tables, which would give us other paths to the data in these leaf pages. It was sometimes necessary to subdivide a range in which we were getting an error to find the "edges" of the damaged area. There were sometimes small areas we could not reach, for which we had to look to backups or source documents. There's clearly no database technology which guarantees you will never have to do that in the face of a hardware failure. To the extent that such a technique reduces the redundant storage of values, it clearly affects recovery options. All in all, I suspect that it would be underrating the talent pool available for PostgreSQL development to say we can't get to a feature which SQL server had in version 1.0 and maintains through their conversion to MVCC. Where it fits in the scheme of priorities and cost/benefit is certainly a valid question. It does provide significant benefits for some use cases, but it's certainly not trivial to implement. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Stark on 24 Feb 2010 10:52 On Wed, Feb 24, 2010 at 3:12 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote: > With an IOT I don't understand how you get out of index corruption > without data loss. That's a showstopper for practical use, I think. That doesn't seem insurmountable to me. You could always allow a REINDEX to scan the index sequentially, ignoring any index structure, just using the tuples it finds. However it seems to me this discussion has several only barely related issues being covered. 1) transaction information in index This seems like a lot of bloat in indexes. It also means breaking a lot of other optimizations such as being able to read the tuples directly from the heap page without locking. I'm not sure how much those are worth though. But adding 24 bytes to every index entry seems pretty unlikely to be a win anyways. 2) Index organized tables This seems like a non-starter to me. We would lose the option of doing sequential scans and the ability to have any other indexes on the table. That would be comparable to Oracle circa 1985. We can do better with stuff like Heikki's "grouped index tuple" and the visibility map which don't interfere with other features as much. 3) Depending on refinding keys in the index for basic operatoin Currently if your index procedure/operator is ill-behaved then your index searches might fail to return matching keys. But vacuum will work correctly and you will never have an index pointer pointing to a dead tuple or a tuple different from the one that was originally inserted. Things like "retail vacuum" were proposed in the past but were rejected because the consequences of an incorrect index procedure become much worse. You could get dangling index pointers pointing to nonexistent tuples or even pointing to new tuples that have been inserted into the same slot later. I don't think these three are actually related. Afaict neither IOT nor visibility information in indexes depend on refinding keys in the index. But it's possible I'm missing something. Even so they're still three separate issues. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Gokulakannan Somasundaram on 24 Feb 2010 11:05 > > > I think you're a barking up the wrong tree. AFAIUI, the need for the > visibility map has not very much to do with whether the table has > indices, and everything to do with avoiding unnecessary VACUUMs. In > any event, you've not shown that the visibility map HAS any overhead, > so talking about skipping it seems entirely premature. Keep in mind > that the visibility map is quite small. > OK! i am not saying to remove the visibility map, if i am misunderstood. All i am saying here is to remove the index only scan processing of visibility map. If it is being used only for vacuums, you need not make it crash safe and no WAL comes into picture. > > The point of the visibility map as far as index-only scans are > concerned is that if all the needed column values can be extracted > from the index, we still need to read the heap page to check tuple > visibility - unless, of course, we already know from the visibility > map that all the tuples on that heap page are guaranteed to be visible > to all transactions. On a read-only or read-mostly table, this will > reduce the cost of checking tuple visibility by several orders of > magnitude. > > I understand that. As i suggested above, if you have no indexes for a table, why do you need to spend the extra effort in making it crash safe for that table? Hope i am clear. Thanks, Gokul.
From: Gokulakannan Somasundaram on 24 Feb 2010 11:12 > > But adding 24 bytes to every index entry seems > pretty unlikely to be a win anyways. > We actually wanted to make it optional. Not every index will be like that. More than that we can make it into 16 bytes. Only commands within the same transaction will not be able to do a index only scan. > This seems like a non-starter to me. We would lose the option of > doing sequential scans and the ability to have any other indexes on > the table. That would be comparable to Oracle circa 1985. We can do > better with stuff like Heikki's "grouped index tuple" and the > visibility map which don't interfere with other features as much. > Sequential scans can be done on IOTs, just scan through the leaf pages. I think you are talking about IOTs with overflow regions. As i said already, this serves a different set of options to the DB Designer. > > > I don't think these three are actually related. Afaict neither IOT nor > visibility information in indexes depend on refinding keys in the > index. But it's possible I'm missing something. Even so they're still > three separate issues. > > If we have visibility information in a heap, we need to goto the same index tuple, whenever there is a update/delete. Now if there is a broken function, it won't let us reach the index from the heap tuple . Hope you are able to get it. Thanks, Gokul.
From: Gokulakannan Somasundaram on 24 Feb 2010 11:16
> > > With an IOT I don't understand how you get out of index corruption > without data loss. That's a showstopper for practical use, I think. > For simplicity, say we are storing all the non-leaf pages of the index in a seperate file, then the leaf pages are nothing but the table. So if we can replicate the table, then we can replicate the non-leaf pages (say by some modified version of reindex). Thanks, Gokul. |