From: Robert Haas on
On Tue, Apr 13, 2010 at 11:49 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Robert Haas wrote:
>> On Tue, Apr 6, 2010 at 10:36 AM, Heikki Linnakangas
>> <heikki.linnakangas(a)enterprisedb.com> wrote:
>>> Robert Haas wrote:
>>>>>>     * If standby_mode is enabled, and neither primary_conninfo nor restore_command are set, the standby would get stuck.
>>>>> It's not really stuck, it will replay any WAL files you drop into
>>>>> pg_xlog. I concur with Robert Haas though that it shouldn't print the
>>>>> message to the log every few seconds. It should print a message the
>>>>> first time it hits the end of WAL, but subsequent messages should be
>>>>> suppressed until some progress has been made.
>>>> Any idea how to implement this?
>>> I'll take a look. It shouldn't be too hard.
>>
>> The tricky part, I believe, is that there's more than one message that
>> can potentially be emitted, and you don't want ANY of them to repeat
>> every 2 s, so some thought needs to be given to where to hook in the
>> logic.
>
> We have the emode_for_corrupt_record() function that's used in all the
> errors that indicate a corrupt WAL record, that's a perfect place to
> hook this into. See attached patch.

The test for elog == LOG seems a bit fragile to me - why that
specifically? Maybe elog < PANIC? elog > DEBUG1? Both?

But it seems basically sensible to me.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Fujii Masao on
On Wed, Apr 14, 2010 at 12:49 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> We have the emode_for_corrupt_record() function that's used in all the
> errors that indicate a corrupt WAL record, that's a perfect place to
> hook this into. See attached patch.

One problem with the patch is that even if the content of the error
message differs from the previous one, it is skipped when the location
of the invalid record is the same as before. For example, if there is a
partially-filled unbroken WAL file in the standby, the following
message would be written:

record with zero length at %X/%X

Then if you drop a corrupted WAL file into pg_xlog, the following message
arguably ought to be output, but would be skipped:

invalid magic number %04X in log file %u, segment %u, offset %u

But I think that we might be able to live with the issue since it's
very much a corner case.
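
To make the corner case concrete, here is a minimal standalone sketch. It
is not the attached patch; WalLocation and report_invalid_record() are
hypothetical names, and I am only assuming that suppression is keyed on
the record location, as described above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a WAL location; not PostgreSQL's own type. */
typedef struct
{
    uint32_t logid;     /* log file number */
    uint32_t offset;    /* byte offset within that log file */
} WalLocation;

static WalLocation last_complaint;
static bool have_complaint = false;

/*
 * Print msg unless the previous complaint was about the same location.
 * Returns true if the message was printed.  The text of msg is never
 * compared, which is exactly what causes the corner case.
 */
static bool
report_invalid_record(WalLocation loc, const char *msg)
{
    if (have_complaint &&
        loc.logid == last_complaint.logid &&
        loc.offset == last_complaint.offset)
        return false;           /* same location: suppressed */

    last_complaint = loc;
    have_complaint = true;
    fprintf(stderr, "LOG:  %s at %X/%X\n",
            msg, (unsigned) loc.logid, (unsigned) loc.offset);
    return true;
}
```

Calling this twice with the same location, first with "record with zero
length" and then with "invalid magic number", prints only the first
message. A call with a different location prints again.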

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Heikki Linnakangas on
Fujii Masao wrote:
> One problem with the patch is that even if the content of the error
> message differs from the previous one, it is skipped when the location
> of the invalid record is the same as before. For example, if there is a
> partially-filled unbroken WAL file in the standby, the following
> message would be written:
>
> record with zero length at %X/%X
>
> Then if you drop a corrupted WAL file into pg_xlog, the following message
> arguably ought to be output, but would be skipped:
>
> invalid magic number %04X in log file %u, segment %u, offset %u
>
> But I think that we might be able to live with the issue since it's
> very much a corner case.

Yeah, we can live with that. The user is generally not interested in
what exactly is wrong with the record. Either message just indicates
that replay has reached the end of valid WAL in the standby.

Applied.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Heikki Linnakangas on
Robert Haas wrote:
> On Tue, Apr 13, 2010 at 11:49 AM, Heikki Linnakangas
> <heikki.linnakangas(a)enterprisedb.com> wrote:
>> We have the emode_for_corrupt_record() function that's used in all the
>> errors that indicate a corrupt WAL record, that's a perfect place to
>> hook this into. See attached patch.
>
> The test for elog == LOG seems a bit fragile to me - why that
> specifically? Maybe elog < PANIC? elog > DEBUG1? Both?

Suppressing anything >= ERROR wouldn't make sense, as ERRORs cause the
replay to abort. I didn't want to affect WARNINGs either, which indicate
that something is truly wrong. The only level left between DEBUG1, which
is what the message is downgraded to, and WARNING, is LOG.
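
Roughly, the rule looks like the sketch below. This is a simplified
standalone illustration, not the patch itself: the Level enum is a
stand-in that only preserves the relative order of the levels mentioned
here (it does not use PostgreSQL's real numeric values), and
level_for_record() and the uint64_t position are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in levels, in increasing severity (an assumption). */
typedef enum
{
    LVL_DEBUG1,
    LVL_LOG,
    LVL_WARNING,
    LVL_ERROR,
    LVL_PANIC
} Level;

static uint64_t last_reported_pos;
static bool have_reported = false;

/*
 * Decide the level to report a bad record at.  Only LOG is ever
 * downgraded: WARNING and above pass through untouched, so real
 * problems and replay-aborting errors are never hidden.
 */
static Level
level_for_record(Level emode, uint64_t pos)
{
    if (emode == LVL_LOG)
    {
        if (have_reported && pos == last_reported_pos)
            return LVL_DEBUG1;  /* repeat at same position: quiet */

        last_reported_pos = pos;
        have_reported = true;
    }
    return emode;
}
```

With this, the first LOG message at a given position stays at LOG, and
repeats at that position drop to DEBUG1 until a message arrives for a
different position.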

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
