From: Robert Haas on
On Tue, Mar 30, 2010 at 12:26 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Wed, Mar 3, 2010 at 9:41 PM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>> On Wed, Feb 24, 2010 at 2:18 PM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>>> If standby_mode is enabled and neither primary_conninfo nor restore_command
>>> is set, the standby gets stuck. How about forbidding this incorrect setting
>>> (i.e., emitting a FATAL message)?
>>
>> Here is the patch which forbids that wrong setting of recovery.conf.
>
> I think that this patch should be applied. Otherwise, if you mistakenly
> set neither primary_conninfo nor restore_command in recovery.conf,
> the standby server does nothing and gets stuck because it doesn't
> know where to retrieve the WAL files from. Banning the incorrect
> setting makes sense to me.
>
> Will anyone commit the patch? Does anyone have an opinion on it?
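
For illustration, the check being proposed might look roughly like the sketch below. It is written in Python rather than the server's C, models recovery.conf as a plain dict of strings, and uses hypothetical names (validate_recovery_conf, RecoveryConfigError); it is not the actual patch.

```python
class RecoveryConfigError(Exception):
    """Stands in for the FATAL error the patch would raise (hypothetical)."""


def is_on(value):
    """Treat on/true/yes/1 (any case) as enabled, everything else as off."""
    return str(value).strip().lower() in ("on", "true", "yes", "1")


def validate_recovery_conf(settings):
    """Reject standby_mode = on when no WAL source is configured.

    settings: dict mapping recovery.conf parameter names to string values.
    """
    standby = is_on(settings.get("standby_mode", "off"))
    has_stream = bool(settings.get("primary_conninfo", "").strip())
    has_archive = bool(settings.get("restore_command", "").strip())
    if standby and not (has_stream or has_archive):
        raise RecoveryConfigError(
            "standby_mode is enabled, but neither primary_conninfo "
            "nor restore_command is specified")
```

Passing {"standby_mode": "on"} alone raises the error, while adding a non-empty primary_conninfo or restore_command lets validation pass. Whether this configuration should be forbidden at all is what the rest of the thread debates.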

I just tested this and it seems to just sit there doing this over and
over again:

LOG: record with zero length at 0/3006B28

I'm not sure that we should forbid this configuration, but the current
behavior doesn't seem right either. ISTM that, in the absence of a
way to get any more WAL, it would be reasonable for the standby server
to just start up and sit there in recovery mode but without actually
advancing recovery, but the repeated log messages are pretty annoying.
If we're connected in streaming mode and there is no activity on the
primary, we don't emit logs of this type, so it doesn't seem like we
should do that if there is no primary either.

A related question is... do we ever reload recovery.conf? I tried
adding the setting to recovery.conf and doing pg_ctl reload, and it
says that it's "reloading configuration files", but doesn't pick up
the new setting. :-(

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Fujii Masao on
On Wed, Mar 31, 2010 at 12:21 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> I just tested this and it seems to just sit there doing this over and
> over again:
>
> LOG:  record with zero length at 0/3006B28
>
> I'm not sure that we should forbid this configuration, but the current
> behavior doesn't seem right either.  ISTM that, in the absence of a
> way to get any more WAL, it would be reasonable for the standby server
> to just start up and sit there in recovery mode but without actually
> advancing recovery, but the repeated log messages are pretty annoying.

I'm concerned that this configuration might prevent the standby
from accepting client connections, because it cannot get the WAL
needed to make the database consistent. So that configuration seems
reasonable only when starting the standby from an already-consistent
database or with enough WAL files in pg_xlog. But it seems to me that
the standby often starts from an inconsistent database without enough
WAL in pg_xlog.

> A related question is... do we ever reload recovery.conf?  I tried
> adding the setting to recovery.conf and doing pg_ctl reload, and it
> says that it's "reloading configuration files", but doesn't pick up
> the new setting.  :-(

recovery.conf cannot be reloaded while the server is running. This
restriction should be removed in a future release, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Robert Haas on
On Wed, Mar 31, 2010 at 1:47 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Wed, Mar 31, 2010 at 12:21 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> I just tested this and it seems to just sit there doing this over and
>> over again:
>>
>> LOG:  record with zero length at 0/3006B28
>>
>> I'm not sure that we should forbid this configuration, but the current
>> behavior doesn't seem right either.  ISTM that, in the absence of a
>> way to get any more WAL, it would be reasonable for the standby server
>> to just start up and sit there in recovery mode but without actually
>> advancing recovery, but the repeated log messages are pretty annoying.
>
> I'm concerned that this configuration might prevent the standby
> from accepting client connections, because it cannot get the WAL
> needed to make the database consistent. So that configuration seems
> reasonable only when starting the standby from an already-consistent
> database or with enough WAL files in pg_xlog. But it seems to me that
> the standby often starts from an inconsistent database without enough
> WAL in pg_xlog.

Agreed. I think if the server starts up in standby mode and it is in an
inconsistent state with no source of WAL, then the startup process
should exit with a suitable error message, which AIUI will result in
the whole server shutting down. However if there is no source of WAL
but the server is in a consistent state, then I think we should allow
it to start up as a read-only standby.
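
The rule proposed above can be summarized as a small decision function. This is a sketch only: the function name and its two booleans are hypothetical stand-ins for state the startup process would track, and "has a source of WAL" is treated as a single yes/no even though, as noted later in the thread, pg_xlog itself can always supply WAL.

```python
def standby_startup_action(has_wal_source, reached_consistency):
    """Return what the startup process would do under the proposed rule.

    has_wal_source: True if any way to obtain more WAL is configured.
    reached_consistency: True if the database is already consistent.
    """
    if has_wal_source:
        return "continue recovery"   # keep fetching and replaying WAL
    if reached_consistency:
        return "read-only standby"   # accept queries; recovery just idles
    return "exit with error"         # can never become consistent
```

The interesting case is the last one: without any WAL source, an inconsistent standby can never accept connections, so failing loudly at startup is arguably friendlier than waiting forever.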

Now, an interesting question is - if the server is in this state, and
somebody manually drops more WAL into pg_xlog, what happens? And what
happens in the similar case where primary_conninfo is set but we can't
connect to the master at the moment, and someone drops a pile of WAL
on us?

>> A related question is... do we ever reload recovery.conf?  I tried
>> adding the setting to recovery.conf and doing pg_ctl reload, and it
>> says that it's "reloading configuration files", but doesn't pick up
>> the new setting.  :-(
>
> recovery.conf cannot be reloaded while the server is running. This
> restriction should be removed in a future release, I think.

Yes. If we don't already have a TODO for that, we should definitely
add one. I found myself annoyed by this several times last night. I
kept having to restart the master, too, first to fix archive_mode and
then to fix max_wal_senders. It's far too late to start tinkering
with this stuff now but I am pretty confident there will be a huge
sigh of collective relief out there if we can relax some of these
restrictions for 9.1. Nobody likes having to shut down the server,
even if it's just for a few seconds.

....Robert

From: Heikki Linnakangas on
Robert Haas wrote:
> Agreed. I think if the server starts up in standby mode and it is in an
> inconsistent state with no source of WAL, then the startup process
> should exit with a suitable error message, which AIUI will result in
> the whole server shutting down. However if there is no source of WAL
> but the server is in a consistent state, then I think we should allow
> it to start up as a read-only standby.
>
> Now, an interesting question is - if the server is in this state, and
> somebody manually drops more WAL into pg_xlog, what happens? And what
> happens in the similar case where primary_conninfo is set but we can't
> connect to the master at the moment, and someone drops a pile of WAL
> on us?

With the recent changes to the retry logic
(http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
they will be replayed. Even if neither primary_conninfo nor
restore_command is given, the server will still keep polling pg_xlog,
and if you copy a WAL file to the standby's pg_xlog directory, it will be
replayed and recovery will make progress.

I wouldn't recommend setting up a standby server like that, but it's not
totally unreasonable. So the standby always has a potential source of
WAL, pg_xlog.
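
A simplified model of the polling behavior described above, for illustration only. It assumes the name of the next expected segment file is already known and uses a hypothetical replay callback; the real startup process also consults restore_command and streaming replication when they are configured, and the parameters poll_interval and max_polls are invented for the sketch.

```python
import os
import time


def wait_for_segment(xlog_dir, segment_name, replay,
                     poll_interval=5.0, max_polls=None):
    """Poll xlog_dir until segment_name appears, then pass its path to replay.

    Returns True once the file has been handed to replay, or False if
    max_polls checks pass without it appearing (max_polls=None polls forever).
    """
    path = os.path.join(xlog_dir, segment_name)
    polls = 0
    while max_polls is None or polls < max_polls:
        if os.path.exists(path):
            replay(path)           # e.g. apply the records in the segment
            return True
        polls += 1
        time.sleep(poll_interval)  # nothing yet; check again later
    return False
```

Copying a file into the directory while the loop is running is enough for the next check to find it, which matches the "drop a pile of WAL on us" scenario.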

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Robert Haas on
On Wed, Mar 31, 2010 at 4:54 PM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Robert Haas wrote:
>> Agreed.  I think if the server starts up in standby mode and it is in an
>> inconsistent state with no source of WAL, then the startup process
>> should exit with a suitable error message, which AIUI will result in
>> the whole server shutting down.  However if there is no source of WAL
>> but the server is in a consistent state, then I think we should allow
>> it to start up as a read-only standby.
>>
>> Now, an interesting question is - if the server is in this state, and
>> somebody manually drops more WAL into pg_xlog, what happens? And what
>> happens in the similar case where primary_conninfo is set but we can't
>> connect to the master at the moment, and someone drops a pile of WAL
>> on us?
>
> With the recent changes to the retry logic
> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
> they will be replayed. Even if neither primary_conninfo nor
> restore_command is given, the server will still keep polling pg_xlog,
> and if you copy a WAL file to the standby's pg_xlog directory, it will be
> replayed and recovery will make progress.
>
> I wouldn't recommend setting up a standby server like that, but it's not
> totally unreasonable. So the standby always has a potential source of
> WAL, pg_xlog.

OK.

Is it reasonable to think that we can find a way to make it not print
the duplicate messages over and over again?

LOG: record with zero length at 0/3006B28

Maybe only print that if the location has advanced since the last such message?
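
One way to implement that suggestion, sketched in Python with hypothetical names: remember the last location reported and stay quiet until replay reaches a later one. Locations are parsed from the usual X/Y hexadecimal notation into tuples so that they compare in WAL order.

```python
def parse_lsn(text):
    """Parse a WAL location such as '0/3006B28' into a comparable tuple."""
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))


class ZeroLengthReporter:
    """Emit the zero-length-record message at most once per location."""

    def __init__(self):
        self.last_reported = None  # no message emitted yet

    def report(self, lsn_text):
        """Return the log line to emit, or None to stay quiet."""
        lsn = parse_lsn(lsn_text)
        if self.last_reported is not None and lsn <= self.last_reported:
            return None  # same (or older) location: already logged
        self.last_reported = lsn
        return "LOG:  record with zero length at %s" % lsn_text
```

Repeated calls with 0/3006B28 then log only once; a call with a later location such as 0/3006C00 logs again.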

Should we make it shut down if it can't immediately read enough WAL to
get to a consistent state, or just figure it's the user's job to fix
it?

....Robert
