page corruption on 8.3+ that makes it to standby [PgSql]

Prev: ALTER TABLE ... DISABLE TRIGGER vs. AccessExclusiveLock
Next: [HACKERS] Toward a column reorder solution

From: Jeff Davis on 28 Jul 2010 13:18

On Wed, 2010-07-28 at 12:36 -0400, Tom Lane wrote:
> Jeff Davis <pgsql(a)j-davis.com> writes:
> > However, when Simon said "We definitely shouldn't do anything that
> > leaves standby different to primary." you said "obviously". Fix2 can
> > leave a difference between the two, because zeroed pages at the end of
> > the heap file on the primary will not be sent to the standby (the
> > standby will only create the zeroed pages if a higher block number is
> > sent; which won't be the case if the zeroed pages are at the end).
>
> > As we discussed before, that looks inconsequential, but I just want to
> > make sure that it's understood.
>
> I understand it, and I don't like it one bit. I haven't caught up on
> this thread yet, but I think the only acceptable solution is one that
> leaves the slave in the *same* state as the master. Not a state that
> we hope will behave equivalently. I can think of too many corner cases
> where it might not. (In fact, having a zeroed page in a relation is
> already a corner case in itself, so the amount of testing you'd get for
> such behaviors is epsilon squared. You don't want to take that bet.)
>

Ok, sounds like my original fix (fix1) is the way to go then. Log zero
pages, but don't set LSN/TLI if it's a zero page (in log_newpage or
heap_xlog_newpage).

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 28 Jul 2010 13:18

On Wed, Jul 28, 2010 at 12:36 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Jeff Davis <pgsql(a)j-davis.com> writes:
>> However, when Simon said "We definitely shouldn't do anything that
>> leaves standby different to primary." you said "obviously". Fix2 can
>> leave a difference between the two, because zeroed pages at the end of
>> the heap file on the primary will not be sent to the standby (the
>> standby will only create the zeroed pages if a higher block number is
>> sent; which won't be the case if the zeroed pages are at the end).
>
>> As we discussed before, that looks inconsequential, but I just want to
>> make sure that it's understood.
>
> I understand it, and I don't like it one bit. �I haven't caught up on
> this thread yet, but I think the only acceptable solution is one that
> leaves the slave in the *same* state as the master. �Not a state that
> we hope will behave equivalently. �I can think of too many corner cases
> where it might not. �(In fact, having a zeroed page in a relation is
> already a corner case in itself, so the amount of testing you'd get for
> such behaviors is epsilon squared. �You don't want to take that bet.)

I might be missing something here, but I don't see how you're going to
manage that. In Jeff's original example, he crashes the database
after extending the relation but before initializing and writing the
new page. I believe that at that point no XLOG has been written yet,
so the relation has been extended but there is no WAL to be sent to
the standby. So now you have the exact situation you're concerned
about - the relation has been extended on the master but not on the
standby. As far as I can see, this is an unavoidable consequence of
the fact that we don't XLOG the act of extending the relation.
Worrying about it only in the specific context of ALTER TABLE .. SET
TABLESPACE seems backwards; if there are any bugs there, we're in for
it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 28 Jul 2010 13:28

On Wed, 2010-07-28 at 13:18 -0400, Robert Haas wrote:
> In Jeff's original example, he crashes the database
> after extending the relation but before initializing and writing the
> new page. I believe that at that point no XLOG has been written yet,
> so the relation has been extended but there is no WAL to be sent to
> the standby. So now you have the exact situation you're concerned
> about - the relation has been extended on the master but not on the
> standby. As far as I can see, this is an unavoidable consequence of
> the fact that we don't XLOG the act of extending the relation.
> Worrying about it only in the specific context of ALTER TABLE .. SET
> TABLESPACE seems backwards; if there are any bugs there, we're in for
> it.

That's a very good point. Now I'm leaning more toward your fix.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 28 Jul 2010 14:21

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Wed, Jul 28, 2010 at 12:36 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I understand it, and I don't like it one bit. �I haven't caught up on
>> this thread yet, but I think the only acceptable solution is one that
>> leaves the slave in the *same* state as the master.

> I might be missing something here, but I don't see how you're going to
> manage that. In Jeff's original example, he crashes the database
> after extending the relation but before initializing and writing the
> new page. I believe that at that point no XLOG has been written yet,
> so the relation has been extended but there is no WAL to be sent to
> the standby. So now you have the exact situation you're concerned
> about - the relation has been extended on the master but not on the
> standby.

You're right that we cannot prevent that situation --- or at least,
the cure would be worse than the disease. (The cure would be to
XLOG the extension action, obviously, but then out-of-disk-space
has to be a PANIC condition.) However, it doesn't follow that it's
a good idea to make copy_relation_data *intentionally* make the slave
and master different.

I've caught up on the thread now, and I think that fix2 (skip logging
the page) is extremely dangerous and has little if anything in its
favor. fix1 seems reasonable given the structure of the page validity
checks.

However, what about Jeff's original comment

: On second thought, why are PageSetLSN and PageSetTLI being called from
: log_newpage(), anyway?

I think it is appropriate to be setting the LSN/TLI in the case of a
page that's been constructed by the caller as part of the WAL-logged
action, but doing so in copy_relation_data seems rather questionable.
We certainly didn't change the source page so changing its LSN seems
rather wrong --- wouldn't it be better to just copy the source pages
with their original LSNs? So perhaps the best fix is to add a bool
parameter to log_newpage telling it whether to update LSN/TLI, and
have copy_relation_data pass false while the other callers pass true.
(Although I guess we'd need to propagate that flag in the WAL record,
so maybe this is more trouble than its worth.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 28 Jul 2010 14:50

On Wed, Jul 28, 2010 at 2:21 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> On Wed, Jul 28, 2010 at 12:36 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> I understand it, and I don't like it one bit. �I haven't caught up on
>>> this thread yet, but I think the only acceptable solution is one that
>>> leaves the slave in the *same* state as the master.
>
>> I might be missing something here, but I don't see how you're going to
>> manage that. �In Jeff's original example, he crashes the database
>> after extending the relation but before initializing and writing the
>> new page. �I believe that at that point no XLOG has been written yet,
>> so the relation has been extended but there is no WAL to be sent to
>> the standby. �So now you have the exact situation you're concerned
>> about - the relation has been extended on the master but not on the
>> standby.
>
> You're right that we cannot prevent that situation --- or at least,
> the cure would be worse than the disease. �(The cure would be to
> XLOG the extension action, obviously, but then out-of-disk-space
> has to be a PANIC condition.)

Not to mention that performance would probably be atrocious.

> However, it doesn't follow that it's
> a good idea to make copy_relation_data *intentionally* make the slave
> and master different.
>
> I've caught up on the thread now, and I think that fix2 (skip logging
> the page) is extremely dangerous and has little if anything in its
> favor.

Why do you think that? They will be different only in terms of
whether the uninitialized bytes are before or after the nominal EOF,
and we know we have to be indifferent to that case anyway.

> fix1 seems reasonable given the structure of the page validity
> checks.
>
> However, what about Jeff's original comment
>
> : On second thought, why are PageSetLSN and PageSetTLI being called from
> : log_newpage(), anyway?
>
> I think it is appropriate to be setting the LSN/TLI in the case of a
> page that's been constructed by the caller as part of the WAL-logged
> action, but doing so in copy_relation_data seems rather questionable.
> We certainly didn't change the source page so changing its LSN seems
> rather wrong --- wouldn't it be better to just copy the source pages
> with their original LSNs? �So perhaps the best fix is to add a bool
> parameter to log_newpage telling it whether to update LSN/TLI, and
> have copy_relation_data pass false while the other callers pass true.
> (Although I guess we'd need to propagate that flag in the WAL record,
> so maybe this is more trouble than its worth.)

It seems like if log_newpage() were to set the LSN/TLI before calling
XLogInsert() - or optionally not - then it wouldn't be necessary to
set them also in heap_xlog_newpage(); the memcpy operation would by
definition have copied the right information onto the page. That
seems like it would be a cleaner design, but back-patching a change to
the interpretation of WAL records that might already be on someone's
disk seems dicey at best.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6
Prev: ALTER TABLE ... DISABLE TRIGGER vs. AccessExclusiveLock
Next: [HACKERS] Toward a column reorder solution