page corruption on 8.3+ that makes it to standby [PgSql]


From: Jeff Davis on 28 Jul 2010 15:08

On Wed, 2010-07-28 at 14:50 -0400, Robert Haas wrote:
> It seems like if log_newpage() were to set the LSN/TLI before calling
> XLogInsert() - or optionally not - then it wouldn't be necessary to
> set them also in heap_xlog_newpage(); the memcpy operation would by
> definition have copied the right information onto the page. That
> seems like it would be a cleaner design, but back-patching a change to
> the interpretation of WAL records that might already be on someone's
> disk seems dicey at best.

How do you set the LSN before XLogInsert()?

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 28 Jul 2010 15:09

On Wed, Jul 28, 2010 at 3:08 PM, Jeff Davis <pgsql(a)j-davis.com> wrote:
> On Wed, 2010-07-28 at 14:50 -0400, Robert Haas wrote:
>> It seems like if log_newpage() were to set the LSN/TLI before calling
>> XLogInsert() - or optionally not - then it wouldn't be necessary to
>> set them also in heap_xlog_newpage(); the memcpy operation would by
>> definition have copied the right information onto the page. That
>> seems like it would be a cleaner design, but back-patching a change to
>> the interpretation of WAL records that might already be on someone's
>> disk seems dicey at best.
>
> How do you set the LSN before XLogInsert()?

Details, details...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Tom Lane on 28 Jul 2010 15:16

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Wed, Jul 28, 2010 at 2:21 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I've caught up on the thread now, and I think that fix2 (skip logging
>> the page) is extremely dangerous and has little if anything in its
>> favor.

> Why do you think that? They will be different only in terms of
> whether the uninitialized bytes are before or after the nominal EOF,
> and we know we have to be indifferent to that case anyway.

(1) You're assuming that the page will be zeroes on the slave without
having forced it to be so. A really obvious case where this fails
is where we're doing crash-and-restart on the master: a later action
could have modified the page away from the all-zero state. (In
principle that's OK but I think this might break torn-page protection.)

(2) On filesystems that support holes, the page will not have storage,
whereas it (probably) does on the master. This could lead to a
divergence in behavior later, i.e. slave runs out of disk space at a
different point than the master.

(3) The position of the nominal EOF can drive choices about which page
to put new tuples in, specifically that's where RelationGetBufferForTuple
will go if FSM has no information. This could result in unexpected
divergence in behavior after the slave goes live compared to what the
master would have done. Maybe that's OK but it seems better to avoid
it if we can, especially when you think about crash-and-restart on the
master as opposed to a separate slave.

Now as I said earlier, these are all tiny corners of a corner case, and
they *probably* shouldn't matter. But I see no good reason to expose
ourselves to the possibility that there's some cases where they do
matter. Especially when your argument for fix2 is a purely aesthetic
judgment that I don't agree with anyway.
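
Point (3) above can be illustrated with a minimal sketch. This is not PostgreSQL source: `toy_choose_target_block`, `ToyBlockNumber`, and `TOY_INVALID_BLOCK` are hypothetical names, and the sketch assumes the fallback when the FSM has no information is the last block before the nominal EOF. Under that assumption, two copies of a relation that differ only in where the nominal EOF sits would pick different target pages.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real PostgreSQL types. */
typedef uint32_t ToyBlockNumber;
#define TOY_INVALID_BLOCK ((ToyBlockNumber) 0xFFFFFFFF)

/*
 * Pick a page for a new tuple.  If the free space map offered a block,
 * use it.  Otherwise fall back to the last block before the nominal EOF
 * (assumed behavior for this sketch).  An empty relation yields
 * TOY_INVALID_BLOCK, meaning the caller would have to extend the file.
 */
static ToyBlockNumber
toy_choose_target_block(ToyBlockNumber fsm_hint, ToyBlockNumber nblocks)
{
    if (fsm_hint != TOY_INVALID_BLOCK)
        return fsm_hint;
    if (nblocks > 0)
        return nblocks - 1;
    return TOY_INVALID_BLOCK;
}

/* Usage: same relation, but the master's nominal EOF is one block later. */
static void
toy_eof_divergence_example(void)
{
    ToyBlockNumber on_master = toy_choose_target_block(TOY_INVALID_BLOCK, 11);
    ToyBlockNumber on_slave  = toy_choose_target_block(TOY_INVALID_BLOCK, 10);

    assert(on_master != on_slave);   /* the two sides choose different pages */
}
```

With FSM information present the EOF position plays no role in the choice, which is why the divergence only shows up in the "FSM has no information" case.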

>> I think it is appropriate to be setting the LSN/TLI in the case of a
>> page that's been constructed by the caller as part of the WAL-logged
>> action, but doing so in copy_relation_data seems rather questionable.
>> We certainly didn't change the source page so changing its LSN seems
>> rather wrong --- wouldn't it be better to just copy the source pages
>> with their original LSNs?

> It seems like if log_newpage() were to set the LSN/TLI before calling
> XLogInsert() - or optionally not - then it wouldn't be necessary to
> set them also in heap_xlog_newpage(); the memcpy operation would by
> definition have copied the right information onto the page.

Not possible because it is only after you've done XLogInsert that you
know what LSN was assigned to the WAL record.
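
The ordering constraint can be shown with a toy model. This is a sketch under stated assumptions, not the actual log_newpage() or heap_xlog_newpage() code: every `toy_`/`Toy` name is hypothetical, the "WAL" is just a counter, and the LSN is taken to be the position just past the inserted record. It shows that the page image copied into the record necessarily carries the old LSN, so the redo side has to stamp the LSN again after its memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64            /* hypothetical; real pages are much larger */

typedef uint64_t ToyLSN;            /* stand-in for a WAL position */

typedef struct ToyWal {
    ToyLSN insert_pos;              /* where the next record will start */
} ToyWal;

typedef struct ToyPage {
    ToyLSN lsn;                     /* stand-in for the page header LSN */
    char   data[TOY_PAGE_SIZE];
} ToyPage;

/*
 * Append a record of `len` bytes and return the LSN assigned to it.
 * The result depends on everything inserted earlier, so a caller
 * cannot know it before making the call.
 */
static ToyLSN
toy_xlog_insert(ToyWal *wal, size_t len)
{
    assert(wal != NULL);
    wal->insert_pos += len;
    return wal->insert_pos;
}

/* Logging side: the image goes into the record first, the LSN comes back after. */
static ToyLSN
toy_log_newpage(ToyWal *wal, ToyPage *page, ToyPage *image_in_wal)
{
    ToyLSN recptr;

    memcpy(image_in_wal, page, sizeof(ToyPage));    /* image still has the old LSN */
    recptr = toy_xlog_insert(wal, sizeof(ToyPage));
    page->lsn = recptr;                             /* only now can the page be stamped */
    return recptr;
}

/* Redo side: restore the image, then set the LSN, since the image lacks it. */
static void
toy_redo_newpage(ToyPage *dst, const ToyPage *image_in_wal, ToyLSN recptr)
{
    memcpy(dst, image_in_wal, sizeof(ToyPage));
    dst->lsn = recptr;
}
```

In this model the only way for the logged image to contain its own LSN would be to predict the insert position before inserting, which is exactly what the caller cannot do.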

regards, tom lane

From: Tom Lane on 28 Jul 2010 15:37

I wrote:
>>> I think it is appropriate to be setting the LSN/TLI in the case of a
>>> page that's been constructed by the caller as part of the WAL-logged
>>> action, but doing so in copy_relation_data seems rather questionable.

BTW, I thought of an argument that explains why that's sane: it marks
the copied page as having been recently WAL-logged. If we do some
action on the copied relation shortly after completing the
copy_relation_data transaction, we will see that its LSN is later than
the last checkpoint and know that we don't need to emit a full-page WAL
image for it, which is correct because in case of crash+restart the
HEAP_NEWPAGE record will provide the full-page image. If we left the
source relation's page's LSN in there, we would frequently make the
wrong decision and emit an unnecessary extra full-page image.
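
The full-page-image argument above can be sketched as a single comparison. This is an illustration only: `toy_needs_full_page_image` and `ToyLSN` are hypothetical names, the LSN values are made up, and the sketch assumes the rule is "emit a full-page image when the page's LSN is not later than the last checkpoint's redo position".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLSN;    /* stand-in for a WAL position */

/*
 * A page whose LSN is at or before the last checkpoint's redo position
 * has no full image in the WAL since that checkpoint, so the next
 * WAL-logged change to it must carry one (assumed rule for this sketch).
 */
static bool
toy_needs_full_page_image(ToyLSN page_lsn, ToyLSN checkpoint_redo)
{
    return page_lsn <= checkpoint_redo;
}

/* Usage with made-up positions. */
static void
toy_copied_page_example(void)
{
    ToyLSN checkpoint_redo = 5000;  /* last checkpoint */
    ToyLSN source_page_lsn = 1200;  /* LSN the source page had before the copy */
    ToyLSN newpage_rec_lsn = 5400;  /* LSN of the HEAP_NEWPAGE record */

    /* Stamped with the HEAP_NEWPAGE LSN: no extra image needed. */
    assert(!toy_needs_full_page_image(newpage_rec_lsn, checkpoint_redo));

    /* Left with the source page's LSN: an unnecessary image would be emitted. */
    assert(toy_needs_full_page_image(source_page_lsn, checkpoint_redo));
}
```

Both outcomes are safe for recovery; the difference is only that keeping the source LSN would cost a redundant full-page image in the WAL.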

So never mind that distraction. I'm back to thinking that fix1 is
the way to go.

regards, tom lane

From: Robert Haas on 28 Jul 2010 15:49

On Wed, Jul 28, 2010 at 3:16 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> (1) You're assuming that the page will be zeroes on the slave without
> having forced it to be so. A really obvious case where this fails
> is where we're doing crash-and-restart on the master: a later action
> could have modified the page away from the all-zero state. (In
> principle that's OK but I think this might break torn-page protection.)

Hmm, yeah, that does seem like it has the potential to be bad. I
think this is sufficient reason to go with fix #1.

> (2) On filesystems that support holes, the page will not have storage,
> whereas it (probably) does on the master. This could lead to a
> divergence in behavior later, i.e. slave runs out of disk space at a
> different point than the master.

I can't get excited about this one.

> (3) The position of the nominal EOF can drive choices about which page
> to put new tuples in, specifically that's where RelationGetBufferForTuple
> will go if FSM has no information. This could result in unexpected
> divergence in behavior after the slave goes live compared to what the
> master would have done. Maybe that's OK but it seems better to avoid
> it if we can, especially when you think about crash-and-restart on the
> master as opposed to a separate slave.

You're still going to have that in the "normal" (not altering the
tablespace) extension case, which is presumably far more common.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
