pgsql: Make standby server continuouslyretry restoring the next WAL [PgSql]

Prev: Performance Improvement for Unique Indexes
Next: pgsql: Make standby servercontinuously retry restoring the next WAL

From: Heikki Linnakangas on 25 Mar 2010 04:11

Tom Lane wrote:
> Fujii Masao <masao.fujii(a)gmail.com> writes:
>> OK. How about making the startup process emit WARNING, stop WAL replay and
>> wait for the presence of trigger file, when an invalid record is found?
>> Which keeps the server up for readonly queries. And if the trigger file is
>> found, I think that the startup process should emit a FATAL, i.e., the
>> server should exit immediately, to prevent the server from becoming the
>> primary in a half-finished state. Also to allow such a halfway failover,
>> we should provide fast failover mode as pg_standby does?
>
> I find it extremely scary to read this sort of blue-sky design
> discussion going on now, two months after we were supposedly
> feature-frozen for 9.0. We need to be looking for the *rock bottom
> minimum* amount of work to do to get 9.0 out the door in a usable
> state; not what would be nice to have later on.

Agreed, this is getting complicated. I'm already worried about the
amount of changes needed to make it work, I don't want to add any new
modes. PANIC seems like the appropriate solution for now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on 25 Mar 2010 06:26

Simon Riggs wrote:
> On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote:
>
>> PANIC seems like the appropriate solution for now.
>
> It definitely is not. Think some more.

Well, what happens now in previous versions with pg_standby et al is
that the standby starts up. That doesn't seem appropriate either.

Hmm, it would be trivial to just stay in the standby mode at a corrupt
file, continuously retrying to restore it and continue replay. If it's
genuinely corrupt, it will never succeed and the standby gets stuck at
that point. Maybe that's better; it's close to what Fujii suggested
except that you don't need a new mode for it.

I'm worried that the administrator won't notice the error promptly
because at a quick glance the server is up and running, while it's
actually stuck at the error and falling indefinitely behind the master.
Maybe if we make it a WARNING, that's enough to alleviate that. It's
true that if the standby is actively being used for read-only queries,
shutting it down to just get the administrators attention isn't good either.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on 25 Mar 2010 06:47

Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote:
>>
>>> PANIC seems like the appropriate solution for now.
>> It definitely is not. Think some more.
>
> Well, what happens now in previous versions with pg_standby et al is
> that the standby starts up. That doesn't seem appropriate either.
>
> Hmm, it would be trivial to just stay in the standby mode at a corrupt
> file, continuously retrying to restore it and continue replay. If it's
> genuinely corrupt, it will never succeed and the standby gets stuck at
> that point. Maybe that's better; it's close to what Fujii suggested
> except that you don't need a new mode for it.

On second thought, that's easy for the built-in standby mode, but not
for archive recovery where we're not currently retrying anything. In
archive recovery, we could throw a WARNING and start up, which would be
like the current behavior in older versions except you get a WARNING
instead of LOG, or we could PANIC. I'm leaning towards PANIC, which
makes most sense for genuine point-in-time or archive recovery (ie. not
a standby server), but I can see the rationale for WARNING too, for
pg_standby and for the sake of preserving old behavior.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

|
Pages: 1
Prev: Performance Improvement for Unique Indexes
Next: pgsql: Make standby servercontinuously retry restoring the next WAL