page corruption on 8.3+ that makes it to standby [PgSql]

Prev: ALTER TABLE ... DISABLE TRIGGER vs. AccessExclusiveLock
Next: [HACKERS] Toward a column reorder solution

From: Robert Haas on 27 Jul 2010 15:50

On Tue, Jul 27, 2010 at 2:06 PM, Jeff Davis <pgsql(a)j-davis.com> wrote:
> I reported a problem here:
>
> http://archives.postgresql.org/pgsql-bugs/2010-07/msg00173.php
>
> Perhaps I used a poor subject line, but I believe it's a serious issue.
> That reproducible sequence seems like an obvious bug to me on 8.3+, and
> what's worse, the corruption propagates to the standby as I found out
> today (through a test, fortunately).

I think that the problem is not so much your choice of subject line as
your misfortune to discover this bug when Tom and Heikki were both on
vacation.

> The only mitigating factor is that it doesn't actually lose data, and
> you can fix it (I believe) with zero_damaged_pages (or careful use of
> dd).
>
> There are two fixes that I can see:
>
> 1. Have log_newpage() and heap_xlog_newpage() only call PageSetLSN() and
> PageSetTLI() if the page is not new. This seems slightly awkward because
> most WAL replay stuff doesn't have to worry about zero pages, but in
> this case I think it does.
>
> 2. Have copy_relation_data() initialize new pages. I don't like this
> because (a) it's not really the job of SET TABLESPACE to clean up zero
> pages; and (b) it could be an index with different special size, etc.,
> and it doesn't seem like a good place to figure that out.

It appears to me that all of the callers of log_newpage() other than
copy_relation_data() do so with pages that they've just constructed,
and which therefore can't be new. So maybe we could just modify
copy_relation_data to check PageIsNew(buf), or something like that,
and only call log_newpage() if that returns true.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 27 Jul 2010 17:08

On Tue, 2010-07-27 at 15:50 -0400, Robert Haas wrote:
> > 1. Have log_newpage() and heap_xlog_newpage() only call PageSetLSN() and
> > PageSetTLI() if the page is not new. This seems slightly awkward because
> > most WAL replay stuff doesn't have to worry about zero pages, but in
> > this case I think it does.
> >
> > 2. Have copy_relation_data() initialize new pages. I don't like this
> > because (a) it's not really the job of SET TABLESPACE to clean up zero
> > pages; and (b) it could be an index with different special size, etc.,
> > and it doesn't seem like a good place to figure that out.
>
> It appears to me that all of the callers of log_newpage() other than
> copy_relation_data() do so with pages that they've just constructed,
> and which therefore can't be new. So maybe we could just modify
> copy_relation_data to check PageIsNew(buf), or something like that,
> and only call log_newpage() if that returns true.

My first concern with that idea was that it may create an inconsistency
between the primary and the standby. The primary could have a bunch of
zero pages that never make it to the standby.

However, it looks like all WAL recovery stuff passes true for "init"
when calling XLogReadBuffer(), so I think it's safe.

I'll test it and submit a patch.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 27 Jul 2010 17:18

On Tue, Jul 27, 2010 at 5:08 PM, Jeff Davis <pgsql(a)j-davis.com> wrote:
> On Tue, 2010-07-27 at 15:50 -0400, Robert Haas wrote:
>> > 1. Have log_newpage() and heap_xlog_newpage() only call PageSetLSN() and
>> > PageSetTLI() if the page is not new. This seems slightly awkward because
>> > most WAL replay stuff doesn't have to worry about zero pages, but in
>> > this case I think it does.
>> >
>> > 2. Have copy_relation_data() initialize new pages. I don't like this
>> > because (a) it's not really the job of SET TABLESPACE to clean up zero
>> > pages; and (b) it could be an index with different special size, etc.,
>> > and it doesn't seem like a good place to figure that out.
>>
>> It appears to me that all of the callers of log_newpage() other than
>> copy_relation_data() do so with pages that they've just constructed,
>> and which therefore can't be new. �So maybe we could just modify
>> copy_relation_data to check PageIsNew(buf), or something like that,
>> and only call log_newpage() if that returns true.
>
> My first concern with that idea was that it may create an inconsistency
> between the primary and the standby. The primary could have a bunch of
> zero pages that never make it to the standby.

Maybe I'm slow on the uptake here, but don't the pages start out
all-zeroes on the standby just as they do on the primary? The only way
it seems like this would be a problem is if a page that previously
contained data on the primary was subsequently zeroed without writing
a WAL record - or am I confused?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 27 Jul 2010 18:23

On Tue, 2010-07-27 at 17:18 -0400, Robert Haas wrote:
> > My first concern with that idea was that it may create an inconsistency
> > between the primary and the standby. The primary could have a bunch of
> > zero pages that never make it to the standby.
>
> Maybe I'm slow on the uptake here, but don't the pages start out
> all-zeroes on the standby just as they do on the primary? The only way
> it seems like this would be a problem is if a page that previously
> contained data on the primary was subsequently zeroed without writing
> a WAL record - or am I confused?

The case I was concerned about is when you have a table on the primary
with a bunch of zero pages at the end. Then you SET TABLESPACE, and none
of the copied pages (or even the fact that they exist) would be sent to
the standby, but they would exist on the primary. And later pages may
have data, so the standby may see page N but not N-1.

Generally, most of the code is not expecting to read or write past the
end of the file, unless it's doing an extension.

However, I think everything is fine during recovery, because it looks
like it's designed to create zero pages as needed. So your idea seems
safe to me, although I do still have some doubts because of my lack of
knowledge in this area; particularly hot standby conflict
detection/resolution.

My idea was different: still log the zero page, just don't set LSN or
TLI for a zero page in log_newpage() or heap_xlog_newpage(). This isn't
as clean as your idea, but I'm a little more confident that it is
correct.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 28 Jul 2010 00:23

On Tue, 2010-07-27 at 15:23 -0700, Jeff Davis wrote:
> On Tue, 2010-07-27 at 17:18 -0400, Robert Haas wrote:
> > > My first concern with that idea was that it may create an inconsistency
> > > between the primary and the standby. The primary could have a bunch of
> > > zero pages that never make it to the standby.
> >
> > Maybe I'm slow on the uptake here, but don't the pages start out
> > all-zeroes on the standby just as they do on the primary? The only way
> > it seems like this would be a problem is if a page that previously
> > contained data on the primary was subsequently zeroed without writing
> > a WAL record - or am I confused?
>
> The case I was concerned about is when you have a table on the primary
> with a bunch of zero pages at the end. Then you SET TABLESPACE, and none
> of the copied pages (or even the fact that they exist) would be sent to
> the standby, but they would exist on the primary. And later pages may
> have data, so the standby may see page N but not N-1.
>
> Generally, most of the code is not expecting to read or write past the
> end of the file, unless it's doing an extension.
>
> However, I think everything is fine during recovery, because it looks
> like it's designed to create zero pages as needed. So your idea seems
> safe to me, although I do still have some doubts because of my lack of
> knowledge in this area; particularly hot standby conflict
> detection/resolution.
>
> My idea was different: still log the zero page, just don't set LSN or
> TLI for a zero page in log_newpage() or heap_xlog_newpage(). This isn't
> as clean as your idea, but I'm a little more confident that it is
> correct.
>

Both potential fixes attached and both appear to work.

fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
heap_xlog_newpage() if the page is not zeroed.

fix2 -- Don't call log_newpage() at all if the page is not zeroed.

Please review. I don't have a strong opinion about which one should be
applied.

Regards,
Jeff Davis

| Next | Last
Pages: 1 2 3 4 5 6
Prev: ALTER TABLE ... DISABLE TRIGGER vs. AccessExclusiveLock
Next: [HACKERS] Toward a column reorder solution