From: Robert Haas on 30 Mar 2010 23:21

On Tue, Mar 30, 2010 at 12:26 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Wed, Mar 3, 2010 at 9:41 PM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>> On Wed, Feb 24, 2010 at 2:18 PM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>>> If standby_mode is enabled, and neither primary_conninfo nor restore_command
>>> are set, the standby would get stuck. How about forbidding (i.e., causing a
>>> FATAL message) this wrong setting?
>>
>> Here is the patch which forbids that wrong setting of recovery.conf.
>
> I think that this patch should be applied. Otherwise, if you wrongly
> set neither primary_conninfo nor restore_command in recovery.conf,
> the standby server would do nothing and get stuck because it doesn't
> know where to retrieve the WAL files from. Banning the incorrect
> setting makes sense to me.
>
> Will anyone commit the patch? Does anyone have an opinion?

I just tested this and it seems to just sit there doing this over and
over again:

LOG: record with zero length at 0/3006B28

I'm not sure that we should forbid this configuration, but the current
behavior doesn't seem right either. ISTM that, in the absence of a
way to get any more WAL, it would be reasonable for the standby server
to just start up and sit there in recovery mode but without actually
advancing recovery, but the repeated log messages are pretty annoying.
If we're connected in streaming mode and there is no activity on the
primary, we don't emit logs of this type, so it doesn't seem like we
should do that if there is no primary either.

A related question is... do we ever reload recovery.conf? I tried
adding the setting to recovery.conf and doing pg_ctl reload, and it
says that it's "reloading configuration files", but doesn't pick up
the new setting. :-(

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

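Fujii's proposal amounts to a startup-time sanity check on recovery.conf. What follows is not the patch under discussion. It is a minimal Python illustration that assumes a simplified recovery.conf syntax: one `name = 'value'` per line, `#` comments, and no escaped quotes. The names `parse_recovery_conf`, `check_standby_sources`, and `RecoveryConfError` are hypothetical, and so is the error text. The exception stands in for the FATAL message the patch would report.

```python
import re

# A simplified recovery.conf line: name = 'value', optionally followed by a
# trailing comment. Assumption: values are single-quoted with no embedded quotes.
_SETTING = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'\s*(?:#.*)?$")

# Spellings treated as "on" here (a simplification of real boolean parsing).
_TRUE_VALUES = {"on", "true", "yes", "1"}


class RecoveryConfError(Exception):
    """Stands in for the FATAL error the proposed patch would report."""


def parse_recovery_conf(text):
    """Parse simplified recovery.conf text into a dict of settings."""
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        match = _SETTING.match(raw)
        if match is None:
            raise RecoveryConfError(f"syntax error at line {lineno}: {raw!r}")
        settings[match.group(1)] = match.group(2)
    return settings


def check_standby_sources(settings):
    """Reject standby_mode = on when no WAL source is configured."""
    standby = settings.get("standby_mode", "off").lower() in _TRUE_VALUES
    has_source = bool(settings.get("primary_conninfo")) or bool(
        settings.get("restore_command")
    )
    if standby and not has_source:
        # Illustrative message text, not taken from the actual patch.
        raise RecoveryConfError(
            "standby_mode is enabled, but neither primary_conninfo "
            "nor restore_command is set"
        )


if __name__ == "__main__":
    conf = "standby_mode = 'on'\n# no primary_conninfo, no restore_command\n"
    try:
        check_standby_sources(parse_recovery_conf(conf))
    except RecoveryConfError as err:
        print("FATAL:", err)
```

The check runs once, before recovery starts. That is why the misconfiguration would fail loudly at startup instead of showing up later as a standby that never advances.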
From: Fujii Masao on 31 Mar 2010 01:47

On Wed, Mar 31, 2010 at 12:21 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> I just tested this and it seems to just sit there doing this over and
> over again:
>
> LOG: record with zero length at 0/3006B28
>
> I'm not sure that we should forbid this configuration, but the current
> behavior doesn't seem right either. ISTM that, in the absence of a
> way to get any more WAL, it would be reasonable for the standby server
> to just start up and sit there in recovery mode but without actually
> advancing recovery, but the repeated log messages are pretty annoying.

I'm concerned that this configuration might prevent the standby
from accepting connections from clients, because it cannot get the WAL
needed to make the database consistent. So that configuration seems
reasonable only when starting the standby from an already-consistent
database or with enough WAL files in pg_xlog. But it seems to me that
the standby often starts from an inconsistent database without enough
WAL in pg_xlog.

> A related question is... do we ever reload recovery.conf? I tried
> adding the setting to recovery.conf and doing pg_ctl reload, and it
> says that it's "reloading configuration files", but doesn't pick up
> the new setting. :-(

recovery.conf cannot be reloaded while the server is running. This
restriction should be removed in a future release, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Robert Haas on 31 Mar 2010 09:44

On Wed, Mar 31, 2010 at 1:47 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Wed, Mar 31, 2010 at 12:21 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> I just tested this and it seems to just sit there doing this over and
>> over again:
>>
>> LOG: record with zero length at 0/3006B28
>>
>> I'm not sure that we should forbid this configuration, but the current
>> behavior doesn't seem right either. ISTM that, in the absence of a
>> way to get any more WAL, it would be reasonable for the standby server
>> to just start up and sit there in recovery mode but without actually
>> advancing recovery, but the repeated log messages are pretty annoying.
>
> I'm concerned that this configuration might prevent the standby
> from accepting connections from clients, because it cannot get the WAL
> needed to make the database consistent. So that configuration seems
> reasonable only when starting the standby from an already-consistent
> database or with enough WAL files in pg_xlog. But it seems to me that
> the standby often starts from an inconsistent database without enough
> WAL in pg_xlog.

Agreed. I think if the server starts up in standby mode and it is in an
inconsistent state with no source of WAL, then the startup process
should exit with a suitable error message, which AIUI will result in
the whole server shutting down. However, if there is no source of WAL
but the server is in a consistent state, then I think we should allow
it to start up as a read-only standby.

Now, an interesting question is - if the server is in this state, and
somebody manually drops more WAL into pg_xlog, what happens? And what
happens in the similar case where primary_conninfo is set but we can't
connect to the master at the moment, and someone drops a pile of WAL
on us?

>> A related question is... do we ever reload recovery.conf? I tried
>> adding the setting to recovery.conf and doing pg_ctl reload, and it
>> says that it's "reloading configuration files", but doesn't pick up
>> the new setting. :-(
>
> recovery.conf cannot be reloaded while the server is running. This
> restriction should be removed in a future release, I think.

Yes. If we don't already have a TODO for that, we should definitely
add one. I found myself annoyed by this several times last night. I
kept having to restart the master, too, first to fix archive_mode and
then to fix max_wal_senders. It's far too late to start tinkering
with this stuff now, but I am pretty confident there will be a huge
sigh of collective relief out there if we can relax some of these
restrictions for 9.1. Nobody likes having to shut down the server,
even if it's just for a few seconds.

....Robert

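Robert's proposed rule depends on only two facts. The first is whether the standby has reached a consistent state. The second is whether it has a configured WAL source (primary_conninfo or restore_command). Here is a minimal Python sketch of that proposal as a small decision table. `StartupAction` and `decide_startup` are hypothetical names. The code models the suggestion in this message only, not the server's actual startup logic.

```python
from enum import Enum


class StartupAction(Enum):
    """Possible outcomes of the startup rule proposed in the thread."""

    FATAL_EXIT = "startup process exits with an error; the server shuts down"
    READ_ONLY_STANDBY = "start as a read-only standby, waiting for more WAL"
    REPLAY = "keep replaying WAL from the configured source(s)"


def decide_startup(consistent, has_primary_conninfo, has_restore_command):
    """Apply the proposed rule (hypothetical helper, not server code).

    consistent: whether replay has already reached a consistent state.
    The other two flags say whether each WAL source is configured.
    """
    if has_primary_conninfo or has_restore_command:
        return StartupAction.REPLAY
    if consistent:
        return StartupAction.READ_ONLY_STANDBY
    return StartupAction.FATAL_EXIT


if __name__ == "__main__":
    for consistent in (False, True):
        action = decide_startup(consistent, False, False)
        print(f"consistent={consistent}, no WAL source -> {action.name}")
```

The table treats a standby with neither setting as having no WAL source at all. That assumption is exactly what the question about manually dropping WAL into pg_xlog puts to the test.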
From: Heikki Linnakangas on 31 Mar 2010 16:54

Robert Haas wrote:
> Agreed. I think if the server starts up in standby mode and it is in an
> inconsistent state with no source of WAL, then the startup process
> should exit with a suitable error message, which AIUI will result in
> the whole server shutting down. However, if there is no source of WAL
> but the server is in a consistent state, then I think we should allow
> it to start up as a read-only standby.
>
> Now, an interesting question is - if the server is in this state, and
> somebody manually drops more WAL into pg_xlog, what happens? And what
> happens in the similar case where primary_conninfo is set but we can't
> connect to the master at the moment, and someone drops a pile of WAL
> on us?

With the recent changes to the retry logic
(http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
they will be replayed. Even if neither primary_conninfo nor
restore_command is given, the server will still keep polling pg_xlog,
and if you copy a WAL file to the standby's pg_xlog directory, it will be
replayed and recovery will make progress.

I wouldn't recommend setting up a standby server like that, but it's not
totally unreasonable. So the standby always has a potential source of
WAL, pg_xlog.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

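Heikki's point is that pg_xlog is always a fallback source, so the standby keeps polling even when neither primary_conninfo nor restore_command is set. Below is a toy Python model of that retry loop, built on several stated assumptions. `archive_fetch` and `stream_fetch` are hypothetical callables standing in for restore_command and streaming replication. The source order (archive, then pg_xlog, then streaming) is chosen only for illustration, and the real retry logic in the linked commit may differ. The `sleep` parameter can be replaced, which lets the demo simulate an administrator copying a segment into pg_xlog while the standby waits.

```python
import os
import tempfile
import time


def find_segment(segment, pg_xlog, archive_fetch=None, stream_fetch=None):
    """Make one pass over the possible WAL sources for `segment`.

    archive_fetch and stream_fetch are optional callables (hypothetical
    stand-ins for restore_command and streaming replication) that return
    True if they supplied the segment. pg_xlog is always checked.
    Returns the name of the source that had the segment, or None.
    """
    if archive_fetch is not None and archive_fetch(segment):
        return "archive"
    if os.path.exists(os.path.join(pg_xlog, segment)):
        return "pg_xlog"
    if stream_fetch is not None and stream_fetch(segment):
        return "stream"
    return None


def wait_for_segment(segment, pg_xlog, *, archive_fetch=None, stream_fetch=None,
                     poll_interval=5.0, max_polls=None, sleep=time.sleep):
    """Poll all sources until `segment` turns up, or give up after max_polls misses."""
    polls = 0
    while True:
        source = find_segment(segment, pg_xlog, archive_fetch, stream_fetch)
        if source is not None:
            return source
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return None
        sleep(poll_interval)


if __name__ == "__main__":
    seg = "000000010000000000000003"
    with tempfile.TemporaryDirectory() as pg_xlog:
        # Neither source is configured; simulate an administrator copying
        # the segment into pg_xlog while the standby is waiting.
        def copy_in(_interval):
            open(os.path.join(pg_xlog, seg), "wb").close()

        print(wait_for_segment(seg, pg_xlog, sleep=copy_in))
```

Even with both fetchers left as `None`, the loop never runs out of sources. This mirrors the observation that the standby always has pg_xlog to fall back on.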
From: Robert Haas on 31 Mar 2010 17:04

On Wed, Mar 31, 2010 at 4:54 PM, Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> wrote:
> Robert Haas wrote:
>> Agreed. I think if the server starts up in standby mode and it is in an
>> inconsistent state with no source of WAL, then the startup process
>> should exit with a suitable error message, which AIUI will result in
>> the whole server shutting down. However, if there is no source of WAL
>> but the server is in a consistent state, then I think we should allow
>> it to start up as a read-only standby.
>>
>> Now, an interesting question is - if the server is in this state, and
>> somebody manually drops more WAL into pg_xlog, what happens? And what
>> happens in the similar case where primary_conninfo is set but we can't
>> connect to the master at the moment, and someone drops a pile of WAL
>> on us?
>
> With the recent changes to the retry logic
> (http://archives.postgresql.org/pgsql-committers/2010-03/msg00356.php),
> they will be replayed. Even if neither primary_conninfo nor
> restore_command is given, the server will still keep polling pg_xlog,
> and if you copy a WAL file to the standby's pg_xlog directory, it will be
> replayed and recovery will make progress.
>
> I wouldn't recommend setting up a standby server like that, but it's not
> totally unreasonable. So the standby always has a potential source of
> WAL, pg_xlog.

OK. Is it reasonable to think that we can find a way to make it not
print the duplicate messages over and over again?

LOG: record with zero length at 0/3006B28

Maybe only print that if the location has advanced since the last
such message?

Should we make it shut down if it can't immediately read enough WAL to
get to a consistent state, or just figure it's the user's job to fix
it?

....Robert
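Robert's suggestion can be sketched as a reporter that remembers the last location it logged and stays quiet until the location moves forward. In this minimal Python sketch, a WAL location in the `X/Y` form shown in the log (two hexadecimal halves) is combined into a single integer, so that locations compare correctly. `parse_lsn` and `ZeroLengthRecordReporter` are hypothetical names, not PostgreSQL functions.

```python
def parse_lsn(text):
    """Turn a WAL location like '0/3006B28' into one comparable integer.

    The two hexadecimal halves are combined as (high << 32) | low.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


class ZeroLengthRecordReporter:
    """Log 'record with zero length' only when the location has advanced.

    emit is any callable taking a string (print by default).
    """

    def __init__(self, emit=print):
        self._emit = emit
        self._last_lsn = None  # last location reported, as an integer

    def report(self, location):
        """Emit the message unless `location` is not past the last one; return whether it was emitted."""
        lsn = parse_lsn(location)
        if self._last_lsn is not None and lsn <= self._last_lsn:
            return False  # same (or earlier) location: suppress the repeat
        self._last_lsn = lsn
        self._emit(f"LOG: record with zero length at {location}")
        return True


if __name__ == "__main__":
    reporter = ZeroLengthRecordReporter()
    for loc in ["0/3006B28", "0/3006B28", "0/3006B28", "0/3007000"]:
        reporter.report(loc)
```

Comparing parsed integers instead of raw strings matters because the halves are not zero-padded. As strings, '0/FFFFFF' would sort after '0/1000000', even though it is the earlier location.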