From: Florian Pflug on
On May 21, 2010, at 23:57 , Josh Berkus wrote:
> From a discussion at dinner at pgcon, I wanted to send this to the list for people to poke holes in it:
>
> Problem: currently, if your database has a large amount of "cold" data, such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer needs to touch it thanks to the visibility map. However, every freeze_age transactions, very old pages need to be sucked into memory and rewritten just in order to freeze those pages. This can have a huge impact on system performance, and seems unjustified because the pages are not actually being used.
>
> Suggested resolution: we would add a 4-byte field to the *page* header which would track the XID wraparound count. Any page whose wraparound count was not equal to the current one would be considered to have all frozen tuples. This would remove the necessity to read and write old pages just to freeze them, a humongous gain for databases with long data retention horizons, let alone data warehouses.

If I understand this correctly, VACUUM usually only frees old tuples, but never increases the oldest xid in the pg_class record. Once that value becomes older than freeze_age, VACUUM needs to scan the whole relation to freeze old tuples. That results in most of the pages being marked dirty and subsequently being written out, causing an IO storm. If, OTOH, the wraparound count was stored in the page header, VACUUM would still need to read those pages, but wouldn't need to write them out.

Alternatively, VACUUM could freeze a few pages on each run, even if the xids are below freeze_age. It could pick those pages randomly, or maybe even prefer pages whose tuples have older xmin/xmas values. That would spread the load out more evenly, much like we try to spread checkpoints out over the whole checkpoint interval.

best regards,
Florian Pflugi


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Josh Berkus <josh(a)agliodbs.com> writes:
> From a discussion at dinner at pgcon, I wanted to send this to the list
> for people to poke holes in it:

Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak. Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG. So in your example of an insert-only
table that's probably never read again, there's still a minimum of one
update visit required on every old page. Now that's still better than
two update visits ... but we could manage that already, just by tweaking
vacuum's heuristics about when to freeze vs when to set hint bits.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Josh Berkus on

> Somebody (I think Joe or Heikki) poked a big hole in this last night at
> the Royal Oak. Although the scheme would get rid of the need to replace
> old XIDs with FrozenXid, it does not get rid of the need to set hint
> bits before you can truncate CLOG. So in your example of an insert-only
> table that's probably never read again, there's still a minimum of one
> update visit required on every old page. Now that's still better than
> two update visits ... but we could manage that already, just by tweaking
> vacuum's heuristics about when to freeze vs when to set hint bits.

Yeah, someone pointed that out to me too and suggested that a freeze map
was the better solution. I still think there's something we can do with
pages on the visibility map but I'll have to think about it some more.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Josh Berkus <josh(a)agliodbs.com> writes:
>> Somebody (I think Joe or Heikki) poked a big hole in this last night at
>> the Royal Oak. Although the scheme would get rid of the need to replace
>> old XIDs with FrozenXid, it does not get rid of the need to set hint
>> bits before you can truncate CLOG. So in your example of an insert-only
>> table that's probably never read again, there's still a minimum of one
>> update visit required on every old page. Now that's still better than
>> two update visits ... but we could manage that already, just by tweaking
>> vacuum's heuristics about when to freeze vs when to set hint bits.

> Yeah, someone pointed that out to me too and suggested that a freeze map
> was the better solution. I still think there's something we can do with
> pages on the visibility map but I'll have to think about it some more.

It occurred to me on the flight home that maybe we could salvage
something from this if there were some mechanism that caused hint bits
to get set before the page got written out from shared buffers the first
time. This assumes that you have enough slack in shared-buffer space
that the transactions that touched a particular page all commit or abort
before the page first gets flushed to disk.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jan Wieck on
On 5/22/2010 9:16 PM, Tom Lane wrote:
> Josh Berkus <josh(a)agliodbs.com> writes:
>>> Somebody (I think Joe or Heikki) poked a big hole in this last night at
>>> the Royal Oak. Although the scheme would get rid of the need to replace
>>> old XIDs with FrozenXid, it does not get rid of the need to set hint
>>> bits before you can truncate CLOG. So in your example of an insert-only
>>> table that's probably never read again, there's still a minimum of one
>>> update visit required on every old page. Now that's still better than
>>> two update visits ... but we could manage that already, just by tweaking
>>> vacuum's heuristics about when to freeze vs when to set hint bits.
>
>> Yeah, someone pointed that out to me too and suggested that a freeze map
>> was the better solution. I still think there's something we can do with
>> pages on the visibility map but I'll have to think about it some more.
>
> It occurred to me on the flight home that maybe we could salvage
> something from this if there were some mechanism that caused hint bits
> to get set before the page got written out from shared buffers the first
> time. This assumes that you have enough slack in shared-buffer space
> that the transactions that touched a particular page all commit or abort
> before the page first gets flushed to disk.

At least the background writer should have a few spare cycles to look
over a "to be flushed" page before writing it.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers