From: Fujii Masao on
On Fri, Jan 15, 2010 at 1:06 AM, Dimitri Fontaine
<dfontaine(a)hi-media.com> wrote:
> Did I mention my viewpoint on that already?
>  http://archives.postgresql.org/pgsql-hackers/2009-07/msg00943.php

> 0. base: slave asks the master for a base-backup, at the end of this it
> reaches the base-lsn

What if the WAL file including the archive recovery starting location has
been removed from the primary's pg_xlog before the end of online-backup
(i.e., the procedure 0)? Where should the standby get such a WAL file from?
How?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Dimitri Fontaine on
Fujii Masao <masao.fujii(a)gmail.com> writes:
> On Fri, Jan 15, 2010 at 1:06 AM, Dimitri Fontaine <dfontaine(a)hi-media.com> wrote:
>> 0. base: slave asks the master for a base-backup, at the end of this it
>> reaches the base-lsn
>
> What if the WAL file including the archive recovery starting location has
> been removed from the primary's pg_xlog before the end of online-backup
> (i.e., the procedure 0)? Where should the standby get such a WAL file from?
> How?

I guess it would be perfectly sensible for 8.5, given the timeframe, to
not implement this as part of SR, but tell our users they need to make a
base backup themselves.

If after that the first WAL we need from the master ain't available, 8.5
SR should maybe only issue an ERROR with a HINT explaining how to ensure
not running in the problem when trying again.

But how we handle failures when transitioning from one state to the
other should be a lot easier to discuss and decide as soon as we have
the possible states and the transitions we want to allow and support. I
think.

My guess is that those states and transitions are in the code, but not
explicit, so that each time we talk about how to handle the error cases
we have to be extra verbose and we risk not talking about exactly the
same thing. Naming the states should make those arrangements easier, I
should think. Not sure if it would help follow the time constraint now
though.

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Thu, Jan 14, 2010 at 10:23 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> I wasn't really asking if it's possible to fix, I meant "Let's think
> about *how* to fix that".

Well... maybe if it doesn't require too MUCH thought.

I'm thinking that HS+SR are going to be a bit like the Windows port -
they're going to require a few releases before they really work as
well as we'd like them too.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> I'm thinking that HS+SR are going to be a bit like the Windows port -
> they're going to require a few releases before they really work as
> well as we'd like them too.

I've assumed that from the get-go ;-). It's one of the reasons that
we ought to label this release 9.0 if those features get in. Such a
number would help clue folks that there might be some less than
entirely stable things about it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on
Fujii Masao wrote:
> On Fri, Jan 15, 2010 at 12:23 AM, Heikki Linnakangas
> <heikki.linnakangas(a)enterprisedb.com> wrote:
>> If we don't fix that within the server, we will need to document that
>> caveat and every installation will need to work around that one way or
>> another. Maybe with some monitoring software and an automatic restart. Ugh.
>>
>> I wasn't really asking if it's possible to fix, I meant "Let's think
>> about *how* to fix that".
>
> OK. How about the following (though it's a rough design)?
>
> (1) If walsender cannot read the WAL file because of ENOENT, it sends the
> special message indicating that error to walreceiver. This message is
> shipped on the COPY protocol.
>
> (2-a) If the message arrives, walreceiver exits by using proc_exit().
> (3-a) If the startup process detects the exit of walreceiver in
> WaitNextXLogAvailable(),
> it switches back to a normal archive recovery mode, closes
> the currently opened
> WAL file, resets some variables (readId, readSeg, etc), and
> calls FetchRecord()
> again. Then it tries to restore the WAL file from the
> archive if the restore_command
> is supplied, and switches to a streaming recovery mode again
> if invalid WAL is
> found.
>
> Or
>
> (2-b) If the message arrives, walreceiver executes restore_command,
> and then sets
> the receivedUpto to the end location of the restored WAL
> file. The restored file is
> expected to be filled because it doesn't exist in the
> primary's pg_xlog. So that
> update of the receivedUpto is OK.
> (3-b) After one WAL file is restored, walreceiver tries to connect to
> the primary, and
> starts replication again. If the ENOENT error occurs again,
> we go back to the (1).
>
> I like the latter approach since it's simpler. Thought?

Hmm. Executing restore_command in walreceiver process doesn't feel right
somehow. I'm thinking of:

Let's introduce a new boolean variable in shared memory that the
walreceiver can set to tell startup process if it's connected or
streaming, or disconnected. When startup process sees that walreceiver
is connected, it waits for receivedUpto to advance. Otherwise, it polls
the archive using restore_command.

To actually implement that requires some refactoring of the
ReadRecord/FetchRecord logic in xlog.c. However, it always felt a bit
hacky to me anyway, so that's not necessary a bad thing.

Now, one problem with this is that under the right conditions,
walreceiver might just succeed to reconnect, while the startup process
starts to restore the file from archive. That's OK, the streamed file
will be simply ignored, and the file restored from archive uses a
temporary filename that won't clash with the streamed file, but it feels
a bit strange to have the same file copied to the server via both
mechanisms.

See the "replication-xlogrefactor" branch in my git repository for a
prototype of that. We could also combine that with your 1st design, and
add the special message to indicate "WAL already deleted", and change
the walreceiver restart logic as you suggested. Some restructuring of
Read/FetchRecord is probably required for that anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers