From: Heikki Linnakangas
On 14/06/10 06:08, Robert Haas wrote:
> visibilitymap.c begins with a long and useful comment - but this part
> seems to have a bit of split personality disorder.
>
> * Currently, the visibility map is not 100% correct all the time.
> * During updates, the bit in the visibility map is cleared after releasing
> * the lock on the heap page. During the window between releasing the lock
> * and clearing the bit in the visibility map, the bit in the visibility map
> * is set, but the new insertion or deletion is not yet visible to other
> * backends.
> *
> * That might actually be OK for the index scans, though. The newly inserted
> * tuple wouldn't have an index pointer yet, so all tuples reachable from an
> * index would still be visible to all other backends, and deletions wouldn't
> * be visible to other backends yet. (But HOT breaks that argument, no?)
>
> I believe that the answer to the parenthesized question here is "yes"
> (in which case we might want to just delete this paragraph).

A HOT update can only update non-indexed columns, so I think we're still
OK with HOT. To an index-only scan, it doesn't matter which tuple in a
HOT update chain you consider live, because they must all have the
same value in the indexed columns. Subtle..
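
To make the HOT argument concrete, here is a minimal toy sketch in C. It is
not PostgreSQL code: ToyTuple, toy_hot_update and
toy_chain_matches_index_key are hypothetical names invented for this
illustration, and the "chain" is just a linked list. It only shows that if
an update is restricted to non-indexed columns, every version in the chain
agrees with the key stored in the index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy tuple (hypothetical, not a PostgreSQL struct): one indexed column,
 * one non-indexed column, and a link to the next version in the chain. */
typedef struct ToyTuple {
    int indexed_col;        /* value that also appears in the index */
    int other_col;          /* not covered by any index */
    struct ToyTuple *next;  /* next version in the HOT chain, or NULL */
} ToyTuple;

/* Model of a HOT update: only the non-indexed column may change, so the
 * new version inherits indexed_col unchanged from the old version. */
static void toy_hot_update(ToyTuple *old_version, ToyTuple *new_version,
                           int new_other)
{
    assert(old_version != NULL && new_version != NULL);
    new_version->indexed_col = old_version->indexed_col;
    new_version->other_col = new_other;
    new_version->next = NULL;
    old_version->next = new_version;
}

/* Returns true if every version reachable from root carries index_key in
 * its indexed column, i.e. it does not matter which version is "live". */
static bool toy_chain_matches_index_key(const ToyTuple *root, int index_key)
{
    for (const ToyTuple *t = root; t != NULL; t = t->next)
        if (t->indexed_col != index_key)
            return false;
    return true;
}
```

Starting from a tuple with indexed_col = 42 and applying toy_hot_update
any number of times, toy_chain_matches_index_key(root, 42) stays true,
which is the property an index-only scan would rely on.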

> * There's another hole in the way the PD_ALL_VISIBLE flag is set. When
> * vacuum observes that all tuples are visible to all, it sets the flag on
> * the heap page, and also sets the bit in the visibility map. If we then
> * crash, and only the visibility map page was flushed to disk, we'll have
> * a bit set in the visibility map, but the corresponding flag on the heap
> * page is not set. If the heap page is then updated, the updater won't
> * know to clear the bit in the visibility map. (Isn't that prevented by
> * the LSN interlock?)
>
> I *think* that the answer to this parenthesized question is "no".
> When we vacuum a page, we set the LSN on both the heap page and the
> visibility map page. Therefore, neither of them can get written to
> disk until the WAL record is flushed, but they could get flushed in
> either order. So the visibility map page could get flushed before the
> heap page, as the non-parenthesized portion of the comment indicates.

Right.
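
The ordering problem can be sketched with a toy model, assuming, as in the
description quoted above, that both pages carry the LSN of the same WAL
record. None of this is real PostgreSQL code: ToyPage, ToyCluster and the
toy_* functions are hypothetical names made up for the illustration. The
LSN check stops either page from reaching disk before the WAL record does,
but it says nothing about which page is written first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy page (hypothetical): in-memory state plus what the disk copy holds. */
typedef struct ToyPage {
    uint64_t lsn;           /* LSN of the last WAL record touching the page */
    bool     flag;          /* PD_ALL_VISIBLE (heap) or the map bit (VM) */
    bool     on_disk_flag;  /* value of the flag in the on-disk copy */
} ToyPage;

typedef struct ToyCluster {
    uint64_t wal_flushed_upto;  /* WAL is durable up to this LSN */
    ToyPage  heap;
    ToyPage  vm;
} ToyCluster;

/* Vacuum sets the flag on both pages and stamps both with the same LSN. */
static void toy_vacuum_mark_all_visible(ToyCluster *c, uint64_t record_lsn)
{
    c->heap.flag = true;
    c->heap.lsn = record_lsn;
    c->vm.flag = true;
    c->vm.lsn = record_lsn;
}

/* WAL-before-data rule: a page may be written only once the WAL has been
 * flushed up to the page's LSN. Returns false if the write is refused. */
static bool toy_write_page(ToyCluster *c, ToyPage *p)
{
    if (p->lsn > c->wal_flushed_upto)
        return false;
    p->on_disk_flag = p->flag;
    return true;
}

/* Crash without replay of this change: memory is lost, disk state remains. */
static void toy_crash(ToyCluster *c)
{
    c->heap.flag = c->heap.on_disk_flag;
    c->vm.flag = c->vm.on_disk_flag;
}
```

In this model, calling toy_vacuum_mark_all_visible, advancing
wal_flushed_upto past the record, writing only the vm page and then calling
toy_crash leaves vm.flag set while heap.flag is clear, which is the hole
the comment describes.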

> However, at least in theory, it seems like we could fix this up during
> redo.

Setting a bit in the visibility map is currently not WAL-logged, but yes,
once we add WAL-logging, that's straightforward to fix.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
