From: Heikki Linnakangas on
Robert Haas wrote:
> Is it reasonable to think that we can find a way to make it not print
> the duplicate messages over and over again?
>
> LOG: record with zero length at 0/3006B28
>
> Maybe only print that if the location has advanced since the last such message?

Yeah, seems reasonable.
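
A minimal sketch of the "only print if the location has advanced" idea, for
illustration only. The LogThrottle struct and both function names are
hypothetical and are not taken from the PostgreSQL source; the WAL location is
modelled as a plain 64-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical state: the WAL location of the last message actually logged. */
typedef struct
{
    bool     have_last;   /* false until the first message is emitted */
    uint64_t last_lsn;    /* location reported by the last emitted message */
} LogThrottle;

/*
 * Returns true if a message for 'lsn' should be printed: either this is the
 * first message, or the location has advanced since the last one emitted.
 */
bool
should_log_at(LogThrottle *t, uint64_t lsn)
{
    assert(t != NULL);
    if (t->have_last && lsn <= t->last_lsn)
        return false;
    t->have_last = true;
    t->last_lsn = lsn;
    return true;
}

/* Print the "zero length" message only when should_log_at() allows it. */
bool
report_zero_length_record(LogThrottle *t, uint64_t lsn)
{
    if (!should_log_at(t, lsn))
        return false;
    fprintf(stderr, "LOG:  record with zero length at %X/%X\n",
            (unsigned) (lsn >> 32), (unsigned) (lsn & 0xFFFFFFFFu));
    return true;
}
```

With this, repeated retries at the same location log once, and a new line
appears only after replay has moved forward.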

> Should we make it shut down if it can't immediately read enough WAL to
> get to a consistent state, or just figure it's the user's job to fix
> it?

I'd say no. In testing, I have done this many times:

pg_start_backup()
copy data directory to server
create recovery.conf
Start standby server.
pg_stop_backup()

The standby doesn't reach consistency before it sees the end-of-backup
record written by pg_stop_backup(), but it does replay up to the last
WAL segment, and connect to the master.

Not sure if that's useful in real life, but there could be situations
where restore_command isn't totally reliable, for example, and it's good
to keep trying.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Wed, Mar 31, 2010 at 5:23 PM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Robert Haas wrote:
>> Is it reasonable to think that we can find a way to make it not print
>> the duplicate messages over and over again?
>>
>> LOG:  record with zero length at 0/3006B28
>>
>> Maybe only print that if the location has advanced since the last such message?
>
> Yeah, seems reasonable.
>
>> Should we make it shut down if it can't immediately read enough WAL to
>> get to a consistent state, or just figure it's the user's job to fix
>> it?
>
> I'd say no. In testing, I have done this many times:
>
> pg_start_backup()
> copy data directory to server
> create recovery.conf
> Start standby server.
> pg_stop_backup()
>
> The standby doesn't reach consistency before it sees the end-of-backup
> record written by pg_stop_backup(), but it does replay up to the last
> WAL segment, and connect to the master.
>
> Not sure if that's useful in real life, but there could be situations
> where restore_command isn't totally reliable, for example, and it's good
> to keep trying.

I was only thinking of doing it in the case where there's no
primary_conninfo or restore_command.

....Robert

From: Fujii Masao on
On Thu, Apr 1, 2010 at 6:04 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> I wouldn't recommend setting up a standby server like that, but it's not
>> totally unreasonable. So the standby always has a potential source of
>> WAL, pg_xlog.
>
> OK.

OK with me, too. I withdraw the patch.

> Is it reasonable to think that we can find a way to make it not print
> the duplicate messages over and over again?
>
> LOG:  record with zero length at 0/3006B28
>
> Maybe only print that if the location has advanced since the last such message?

Agreed. But which log message is repeated depends on the situation,
so a message without any location might be output. BTW, in my testing,
the following message was repeated.

LOG: invalid magic number 0000 in log file 0, segment 14, offset 9617408

> Should we make it shut down if it can't immediately read enough WAL to
> get to a consistent state, or just figure it's the user's job to fix
> it?

I think that it's difficult for the user to fix it. So I agree with shutting
down the server in that case, i.e., throw a FATAL when an invalid WAL
record is found and recovery hasn't reached the safe starting point
even if neither primary_conninfo nor restore_command is given.
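
The condition proposed above could be sketched roughly as follows. This is only
an illustration of the rule as discussed in this thread, not the actual
recovery code: StandbyState, RecoveryAction and on_invalid_wal_record() are
hypothetical names, and an unset option is modelled as NULL or an empty string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    RECOVERY_KEEP_RETRYING,   /* keep polling for more WAL */
    RECOVERY_FATAL            /* give up; the server shuts down */
} RecoveryAction;

/* Hypothetical snapshot of the standby's configuration and progress. */
typedef struct
{
    bool        reached_consistency;  /* safe starting point reached? */
    const char *primary_conninfo;     /* NULL or "" when not set */
    const char *restore_command;      /* NULL or "" when not set */
} StandbyState;

static bool
option_is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Decide what to do when an invalid WAL record is found: FATAL only if
 * recovery is not yet consistent and neither primary_conninfo nor
 * restore_command is configured.
 */
RecoveryAction
on_invalid_wal_record(const StandbyState *s)
{
    assert(s != NULL);
    bool has_source = option_is_set(s->primary_conninfo) ||
                      option_is_set(s->restore_command);

    if (!s->reached_consistency && !has_source)
        return RECOVERY_FATAL;
    return RECOVERY_KEEP_RETRYING;
}
```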

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Robert Haas on
On Wed, Mar 31, 2010 at 9:01 PM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> Agreed. But which log message is repeated depends on the situation,
> so a message without any location might be output. BTW, in my testing,
> the following message was repeated.
>
>    LOG:  invalid magic number 0000 in log file 0, segment 14, offset 9617408

Yeah, that's a pain in the neck. We need to think about a way to
avoid any of these messages repeating. Not sure how, off the top of
my head.

>> Should we make it shut down if it can't immediately read enough WAL to
>> get to a consistent state, or just figure it's the user's job to fix
>> it?
>
> I think that it's difficult for the user to fix it. So I agree with shutting
> down the server in that case, i.e., throw a FATAL when an invalid WAL
> record is found and recovery hasn't reached the safe starting point
> even if neither primary_conninfo nor restore_command is given.

I think that's reasonable. It's not like this should cause any
problem for the user: they can add the missing WAL while the server is
down just as well as they could if it were up, and Hot Standby isn't
going to come up anyway. But I could possibly be persuaded to change
my mind on this one, if someone feels strongly otherwise.

....Robert

From: Simon Riggs on
On Wed, 2010-03-31 at 23:54 +0300, Heikki Linnakangas wrote:
> Robert Haas wrote:
> > Agreed. I think if the server starts up in standby mode and it is in an
> > inconsistent state with no source of WAL, then the startup process
> > should exit with a suitable error message, which AIUI will result in
> > the whole server shutting down. However if there is no source of WAL
> > but the server is in a consistent state, then I think we should allow
> > it to start up as a read-only standby.
> >
> > Now, an interesting question is - if the server is in this state, and
> > somebody manually drops more WAL into pg_xlog, what happens? And what
> > happens in the similar case where primary_conninfo is set but we can't
> > connect to the master at the moment, and someone drops a pile of WAL
> > on us?
>
> With the recent changes to the retry logic
> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
> they will be replayed. Even if neither primary_conninfo nor
> restore_command is given, the server will still keep polling pg_xlog,
> and if you copy a WAL file to standby's pg_xlog directory, it will be
> replayed and recovery will make progress.
>
> I wouldn't recommend setting up a standby server like that, but it's not
> totally unreasonable. So the standby always has a potential source of
> WAL, pg_xlog.

I have inadvertently made it impossible to specify
standby_mode && (!primary_conninfo && !restore_command)

I did that because Robert had, separately from this thread, reported a hang
caused by this configuration. I have verified this.

pg_xlog is a *potential* source of WAL, but if the files requested are
not present then the server just sits and waits with *no* messages. That
is unacceptable, IMHO.

What should we do now?
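
For illustration only, and not something proposed in this thread: a minimal
sketch of bookkeeping that keeps a polling loop from being silent without
logging on every retry. WalWaitState, note_poll_result(), REPORT_EVERY and the
message text are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: report the first miss, then every REPORT_EVERY-th miss. */
#define REPORT_EVERY 12

typedef struct
{
    unsigned misses;   /* consecutive polls that did not find the file */
} WalWaitState;

/*
 * Record the outcome of one poll of pg_xlog for 'fname'.
 * Returns true if a "waiting" message was written for this poll.
 */
bool
note_poll_result(WalWaitState *w, const char *fname, bool found)
{
    assert(w != NULL && fname != NULL);

    if (found)
    {
        w->misses = 0;          /* progress made; start counting afresh */
        return false;
    }

    bool report = (w->misses % REPORT_EVERY == 0);

    w->misses++;
    if (report)
        fprintf(stderr, "LOG:  waiting for WAL file \"%s\" in pg_xlog\n",
                fname);         /* hypothetical message text */
    return report;
}
```

The caller would invoke note_poll_result() once per polling attempt, so the
administrator sees a message when the wait begins and occasionally afterwards.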

--
Simon Riggs www.2ndQuadrant.com

