Idea for getting rid of VACUUM FREEZE on cold pages [PgSql]

From: Tom Lane on 8 Jun 2010 18:35

Simon Riggs <simon(a)2ndQuadrant.com> writes:
> On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:
>> OK, yes, I see what you're getting at now. There are two possible
>> ways to freeze the tuples and keep the xmin: we can either rely on
>> the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
>> additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
>> not sure which way is better.

> Doing it at tuple level is more flexible and allows more aggressive
> freezing. It also works better with existing tuple visibility code.

I agree, relying on a page-level bit (or field) is unpleasant in a
number of ways.

But none of this accomplishes a damn thing towards the original goal,
which was to avoid an extra disk write associated with freezing (not
to mention an extra write for setting the transaction-committed hint
bit). Setting a bit is no cheaper from that standpoint than changing
the xmin field.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Stark on 10 Jun 2010 17:58

On Thu, Jun 3, 2010 at 11:41 AM, Greg Stark <gsstark(a)mit.edu> wrote:
> I think to make it work you need to store a whole 64-bit reference
> transaction id consisting of both a cycle counter and a transaction
> id. The invariant for the page is that every xid on the page can be
> compared to that reference transaction id using normal transactionid
> semantics. Actually I think the easiest way to do that is to set it to
> the oldest xid on the page. The first thing to do before comparing any
> transaction id on the page with a real transaction id would be to
> figure out whether the reference xid is comparable to the live xid,
> which if it's the oldest xid on the page implies they'll all be
> comparable.
>
> The way to maintain that invariant would be that any xid insertion on
> the page must advance the reference xid if it's not comparable to the
> newly inserted xid. It has to be advanced to the oldest xid that's
> still comparable with the newly inserted xid. Any xids on the page
> that are older than the new reference xid have to be frozen or removed.
> I'm not sure how to do that without keeping clog forever though.
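A minimal sketch of the quoted scheme, assuming the page stores a 64-bit reference value (cycle counter in the high 32 bits, xid in the low 32 bits) equal to the oldest xid on the page. The names `xid_precedes`, `make_ref`, and `widen` are hypothetical and not PostgreSQL functions, and special reserved xids are ignored for simplicity.

```python
XID_MASK = 0xFFFFFFFF  # xids stored in tuples are 32 bits wide


def xid_precedes(a: int, b: int) -> bool:
    """True if 32-bit xid a is logically older than b (modulo-2^32 comparison)."""
    # The signed 32-bit difference a - b is negative exactly when the
    # masked difference has its top bit set.
    return ((a - b) & XID_MASK) >= 0x80000000


def make_ref(cycle: int, xid: int) -> int:
    """Pack a cycle counter and a 32-bit xid into one 64-bit reference value."""
    return (cycle << 32) | (xid & XID_MASK)


def widen(ref64: int, xid32: int) -> int:
    """Recover the 64-bit value of a page xid, given the page's reference xid.

    Assumes the invariant from the text: ref64 is the oldest xid on the page,
    so every page xid lies less than 2^31 transactions after it.
    """
    offset = (xid32 - (ref64 & XID_MASK)) & XID_MASK
    return ref64 + offset
```

Once every xid on the page can be widened this way, it can be compared against a live 64-bit xid with an ordinary integer comparison, even if the 32-bit counter has wrapped since the page was written.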

So the more I think about this the more I think it's unavoidable that
we would need to retain clog forever.

I think the goal here is to be able to load data into the database and
then never write the data ever again. Even if you visit the page years
later after the transaction ids have wrapped around several times. In
that case there's no avoiding that you'll need to know whether that
transaction committed or aborted.

Now we could make a bet that most transactions commit and therefore we
could keep a list of aborted transactions only which we might be able
to keep "forever" in very little space if very few transactions abort.
Presumably we would only use this form once the transaction was about
to be truncated out of clog. I'm not too happy with the assumption
that there aren't many aborts though. Someone could come along with a
use case where they have lots of aborts and run into strange
limitations and performance characteristics.
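To make the bet concrete, here is a rough sketch of the aborted-only list, assuming 64-bit xids and modelling clog as a plain dictionary. `truncate_clog` and `xid_committed` are hypothetical names, and any transaction that is not marked committed at truncation time is treated as aborted.

```python
def truncate_clog(clog: dict, new_horizon: int, aborted: set) -> None:
    """Drop clog entries older than new_horizon, remembering only the aborts."""
    for xid in [x for x in clog if x < new_horizon]:
        if clog[xid] != "committed":
            aborted.add(xid)  # the only state kept "forever"
        del clog[xid]


def xid_committed(xid: int, clog: dict, aborted: set) -> bool:
    """Resolve an xid's fate, falling back to the aborted list past the horizon."""
    if xid in clog:
        return clog[xid] == "committed"
    # Truncated out of clog: committed unless it was recorded as aborted.
    return xid not in aborted
```

The space cost is proportional to the number of aborts, which is exactly why a workload with many aborts would behave badly under this scheme.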

Alternatively we could do something like keeping a list of tables
touched by any transaction. Then vacuum could look for any
non-committed transactions old enough to be in danger of aging out of
clog and ensure those tables are frozen. But any tables which have
never been touched by any such old transaction could be left alone.
When we read in the page, we'll be able to recognize the old
transactions as committed if they're beyond the end of the clog
horizon.
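A small sketch of that bookkeeping, under the assumption that we track, per transaction, the set of tables it touched and its final status. The function name, the dictionaries, and the `danger_xid` threshold are all hypothetical.

```python
def tables_needing_freeze(touched: dict, status: dict, danger_xid: int) -> set:
    """Return tables touched by non-committed xids older than danger_xid.

    touched maps xid -> set of table names; status maps xid -> state string.
    """
    need = set()
    for xid, tables in touched.items():
        if xid < danger_xid and status[xid] != "committed":
            need |= tables  # these tables must be frozen before clog is truncated
    return need
```

Tables that never appear in the result can be left alone, since every old xid on their pages can be presumed committed.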

I don't really like that idea either because it leaves performance
really quite unpredictable. I could have a large table that goes
unvacuumed for a long time -- then when I come along with some tiny
query where I hit C-c and cause an abort I suddenly set a trap which
causes a huge vacuum freeze to fire off.

--
greg

From: Jan Wieck on 10 Jun 2010 15:04

Seems I underestimated the importance of forensic breadcrumbs.

On 6/9/2010 12:09 PM, Tom Lane wrote:
> I do like the idea of using a status bit rather than FrozenXid to mark a
> frozen tuple, because that eliminates the conflict between wanting to
> freeze aggressively for performance reasons and wanting to preserve Xids
> for forensic reasons. But it doesn't seem to do much for Josh's
> original problem.

Depends. Specifically on transaction profiles and how long the blocks
linger around before being written. If you can set the all-visible bit
by the time the page is written the first time, which bit, including the
is-frozen one, cannot be set at that time too?

Maybe some analysis on the typical behavior of such a system is in order.
Especially the case Josh was mentioning seems to be a typical single
insert logging style application, with little else going on on that
particular database. I can't reveal specifics about that particular
case, but think of something like taking frequent sensor readings, that
need to be kept for years for forensics in case there is a product
recall some day.

And even if some cases still required another page write because those
frozen bits cannot be set on first write, this seems to be a win-win. We
would get rid of the FrozenXid completely and shift to a bit, so we can
effectively have a min_freeze_age of zero while keeping the xids forever.
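A minimal sketch of the bit-instead-of-FrozenXid idea, assuming a hypothetical flag in the tuple's infomask. The flag value and the names `HeapTuple`, `freeze`, and `xmin_visible` are illustrative only and do not reflect PostgreSQL's actual layout or API.

```python
from dataclasses import dataclass

XMIN_FROZEN = 0x1  # hypothetical flag value, chosen for illustration only


@dataclass
class HeapTuple:
    xmin: int
    infomask: int = 0


def freeze(tup: HeapTuple) -> None:
    """Mark the tuple frozen without overwriting xmin."""
    tup.infomask |= XMIN_FROZEN


def xmin_visible(tup: HeapTuple, check_xid) -> bool:
    """Frozen tuples are visible to everyone; otherwise defer to check_xid."""
    if tup.infomask & XMIN_FROZEN:
        return True  # no clog lookup and no xid comparison needed
    return check_xid(tup.xmin)
```

Because `freeze` leaves `xmin` untouched, freezing as early as possible no longer conflicts with keeping the original xid for forensics. It does still dirty the page, so the extra write remains whenever the bit cannot be set before the first write.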

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From: Tom Lane on 12 Jun 2010 12:54

Bruce Momjian <bruce(a)momjian.us> writes:
> I think we might need two bits, one committed and all visible, and
> another aborted and all visible.

Huh? The latter means "vacuumable".

regards, tom lane
