page corruption on 8.3+ that makes it to standby [PgSql]

Prev: ALTER TABLE ... DISABLE TRIGGER vs. AccessExclusiveLock
Next: [HACKERS] Toward a column reorder solution

From: Robert Haas on 28 Jul 2010 06:40

On Wed, Jul 28, 2010 at 12:23 AM, Jeff Davis <pgsql(a)j-davis.com> wrote:
> On Tue, 2010-07-27 at 15:23 -0700, Jeff Davis wrote:
>> On Tue, 2010-07-27 at 17:18 -0400, Robert Haas wrote:
>> > > My first concern with that idea was that it may create an inconsistency
>> > > between the primary and the standby. The primary could have a bunch of
>> > > zero pages that never make it to the standby.
>> >
>> > Maybe I'm slow on the uptake here, but don't the pages start out
>> > all-zeroes on the standby just as they do on the primary? The only way
>> > it seems like this would be a problem is if a page that previously
>> > contained data on the primary was subsequently zeroed without writing
>> > a WAL record - or am I confused?
>>
>> The case I was concerned about is when you have a table on the primary
>> with a bunch of zero pages at the end. Then you SET TABLESPACE, and none
>> of the copied pages (or even the fact that they exist) would be sent to
>> the standby, but they would exist on the primary. And later pages may
>> have data, so the standby may see page N but not N-1.
>>
>> Generally, most of the code is not expecting to read or write past the
>> end of the file, unless it's doing an extension.
>>
>> However, I think everything is fine during recovery, because it looks
>> like it's designed to create zero pages as needed. So your idea seems
>> safe to me, although I do still have some doubts because of my lack of
>> knowledge in this area; particularly hot standby conflict
>> detection/resolution.
>>
>> My idea was different: still log the zero page, just don't set LSN or
>> TLI for a zero page in log_newpage() or heap_xlog_newpage(). This isn't
>> as clean as your idea, but I'm a little more confident that it is
>> correct.
>>
>
> Both potential fixes attached and both appear to work.
>
> fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
> heap_xlog_newpage() if the page is not zeroed.
>
> fix2 -- Don't call log_newpage() at all if the page is not zeroed.
>
> Please review. I don't have a strong opinion about which one should be
> applied.

Looks good. I still prefer fix2; it seems cleaner to me. It has
another advantage, too, which is that copy_relation_data() is used
ONLY by ALTER TABLE SET TABLESPACE. So if I stick to patching that
function, that's the only thing I can possibly break, whereas
log_newpage() is used in a bunch of other places. I don't think
either fix is going to break anything at all, but considering that
this is going to need back-patching, I'd rather be conservative.

Speaking of back-patching, the subject line describes this as an 8.3+
problem, but it looks to me like this goes all the way back to 8.0.
The code is a little different in 8.2 and prior, but ISTM it's
vulnerable to the same issue. Don't we need a modified version of
this patch for the 8.0 - 8.2 branches?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on 28 Jul 2010 07:02

On Tue, 2010-07-27 at 21:23 -0700, Jeff Davis wrote:

> Both potential fixes attached and both appear to work.
>
> fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
> heap_xlog_newpage() if the page is not zeroed.
>
> fix2 -- Don't call log_newpage() at all if the page is not zeroed.
>
> Please review. I don't have a strong opinion about which one should be
> applied.

ISTM we should just fix an uninitialized page first, using code from
VACUUM similar to

if (PageIsNew(page))
{
ereport(WARNING,
(errmsg("relation \"%s\" page %u is uninitialized --- fixing",
relname, blkno)));
PageInit(page, BufferGetPageSize(buf), 0);
}

then continue as before.

We definitely shouldn't do anything that leaves standby different to
primary.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 28 Jul 2010 07:29

On Wed, Jul 28, 2010 at 7:02 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Tue, 2010-07-27 at 21:23 -0700, Jeff Davis wrote:
>
>> Both potential fixes attached and both appear to work.
>>
>> fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
>> heap_xlog_newpage() if the page is not zeroed.
>>
>> fix2 -- Don't call log_newpage() at all if the page is not zeroed.
>>
>> Please review. I don't have a strong opinion about which one should be
>> applied.
>
> ISTM we should just fix an uninitialized page first, using code from
> VACUUM similar to
>
> �if (PageIsNew(page))
> �{
> � �ereport(WARNING,
> � � � �(errmsg("relation \"%s\" page %u is uninitialized --- fixing",
> � � � � � � � � � � � � � � � � � � � � � � � �relname, blkno)));
> � �PageInit(page, BufferGetPageSize(buf), 0);
> �}
>
> then continue as before.

As Jeff Davis pointed out upthread, you don't know that 0 is the
correct special size for the relation being copied. In the VACUUM
path, that code is only applied to heaps, but that's not necessarily
the case here.

> We definitely shouldn't do anything that leaves standby different to
> primary.

Obviously.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 28 Jul 2010 12:12

On Wed, 2010-07-28 at 06:40 -0400, Robert Haas wrote:
> > fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
> > heap_xlog_newpage() if the page is not zeroed.
> >
> > fix2 -- Don't call log_newpage() at all if the page is not zeroed.
> >
> > Please review. I don't have a strong opinion about which one should be
> > applied.
>
> Looks good. I still prefer fix2; it seems cleaner to me. It has
> another advantage, too, which is that copy_relation_data() is used
> ONLY by ALTER TABLE SET TABLESPACE. So if I stick to patching that
> function, that's the only thing I can possibly break, whereas
> log_newpage() is used in a bunch of other places. I don't think
> either fix is going to break anything at all, but considering that
> this is going to need back-patching, I'd rather be conservative.
>

Sounds good to me.

However, when Simon said "We definitely shouldn't do anything that
leaves standby different to primary." you said "obviously". Fix2 can
leave a difference between the two, because zeroed pages at the end of
the heap file on the primary will not be sent to the standby (the
standby will only create the zeroed pages if a higher block number is
sent; which won't be the case if the zeroed pages are at the end).

As we discussed before, that looks inconsequential, but I just want to
make sure that it's understood.

> Speaking of back-patching, the subject line describes this as an 8.3+
> problem, but it looks to me like this goes all the way back to 8.0.
> The code is a little different in 8.2 and prior, but ISTM it's
> vulnerable to the same issue. Don't we need a modified version of
> this patch for the 8.0 - 8.2 branches?

That sounds right. I just saw that the code in question was introduced
in 8.3.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 28 Jul 2010 12:36

Jeff Davis <pgsql(a)j-davis.com> writes:
> However, when Simon said "We definitely shouldn't do anything that
> leaves standby different to primary." you said "obviously". Fix2 can
> leave a difference between the two, because zeroed pages at the end of
> the heap file on the primary will not be sent to the standby (the
> standby will only create the zeroed pages if a higher block number is
> sent; which won't be the case if the zeroed pages are at the end).

> As we discussed before, that looks inconsequential, but I just want to
> make sure that it's understood.

I understand it, and I don't like it one bit. I haven't caught up on
this thread yet, but I think the only acceptable solution is one that
leaves the slave in the *same* state as the master. Not a state that
we hope will behave equivalently. I can think of too many corner cases
where it might not. (In fact, having a zeroed page in a relation is
already a corner case in itself, so the amount of testing you'd get for
such behaviors is epsilon squared. You don't want to take that bet.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6
Prev: ALTER TABLE ... DISABLE TRIGGER vs. AccessExclusiveLock
Next: [HACKERS] Toward a column reorder solution