From: Fujii Masao on 14 Jan 2010 21:38

On Fri, Jan 15, 2010 at 7:19 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Let's introduce a new boolean variable in shared memory that the
> walreceiver can set to tell startup process if it's connected or
> streaming, or disconnected. When startup process sees that walreceiver
> is connected, it waits for receivedUpto to advance. Otherwise, it polls
> the archive using restore_command.

Seems OK.

> See the "replication-xlogrefactor" branch in my git repository for a
> prototype of that. We could also combine that with your 1st design, and
> add the special message to indicate "WAL already deleted", and change
> the walreceiver restart logic as you suggested. Some restructuring of
> Read/FetchRecord is probably required for that anyway.

Though I haven't read your branch much yet, there seems to be a corner
case in which a partially-filled WAL file might be restored wrongly,
which would cause a PANIC error. So the primary should tell the standby
the last WAL file that has been filled completely. When that file has
been restored on the standby, the startup process should stop restoring
any more files and go back to waiting for streaming.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Heikki Linnakangas on 15 Jan 2010 13:11

Dimitri Fontaine wrote:
> But how we handle failures when transitioning from one state to the
> other should be a lot easier to discuss and decide as soon as we have
> the possible states and the transitions we want to allow and support, I
> think.
>
> My guess is that those states and transitions are in the code, but not
> explicit, so that each time we talk about how to handle the error cases
> we have to be extra verbose and we risk not talking about exactly the
> same thing. Naming the states should make those arrangements easier, I
> should think. Not sure if it would help follow the time constraint now
> though.

I agree, a state machine is a useful way of thinking about this. I
recall that mail of yours from last summer :-).

The states we have at the moment in the standby are:

1. Archive recovery. Standby fetches WAL files from archive using
restore_command. When a file is not found in archive, we switch to
state 2.

2. Streaming replication. Standby connects (and reconnects if the
connection is lost for any reason) to the primary, starts streaming, and
applies WAL as it arrives. We stay in this state until the trigger file
is found or the server is shut down.

The states with my suggested ReadRecord/FetchRecord refactoring, the
code I have in the replication-xlogrefactor branch in my git repo, are:

1. Initial archive recovery. Standby fetches WAL files from archive
using restore_command. When a file is not found in archive, we start
walreceiver and switch to state 2.

2. Retrying to restore from archive. When the connection to the primary
is established and replication is started, we switch to state 3.

3. Streaming replication. The connection to the primary is established,
and WAL is applied as it arrives. When the connection is dropped, we go
back to state 2.

The state transitions between 2 and 3 are a bit fuzzy in that version,
though: walreceiver runs concurrently, trying to reconnect, while the
startup process retries restoring from the archive. Fujii-san's
suggestion to have walreceiver stop while the startup process retries
restoring from the archive (or to have walreceiver run restore_command
in approach #2) would make that clearer.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
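
[For illustration: the three refactored states and transitions described
above, written out as a small C sketch. The type and function names are
invented for this example; this is not code from the
replication-xlogrefactor branch.]

#include <stdbool.h>

/*
 * Hypothetical sketch of the standby states Heikki describes: which
 * source the startup process is currently using for WAL, and when it
 * switches.  Names are made up for illustration only.
 */
typedef enum
{
    INITIAL_ARCHIVE_RECOVERY,   /* 1: restoring WAL files with restore_command */
    RETRYING_FROM_ARCHIVE,      /* 2: walreceiver started, still polling the archive */
    STREAMING                   /* 3: connected to the primary, applying WAL as it arrives */
} WalSourceState;

static WalSourceState wal_source = INITIAL_ARCHIVE_RECOVERY;

/* Called whenever the startup process needs to decide where to get more WAL. */
static void
advance_wal_source(bool file_found_in_archive, bool connected_to_primary)
{
    switch (wal_source)
    {
        case INITIAL_ARCHIVE_RECOVERY:
            if (!file_found_in_archive)
                wal_source = RETRYING_FROM_ARCHIVE;  /* start walreceiver here */
            break;

        case RETRYING_FROM_ARCHIVE:
            if (connected_to_primary)
                wal_source = STREAMING;
            break;

        case STREAMING:
            if (!connected_to_primary)
                wal_source = RETRYING_FROM_ARCHIVE;  /* connection dropped */
            break;
    }
}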

From: Simon Riggs on 15 Jan 2010 13:56

On Fri, 2010-01-15 at 20:11 +0200, Heikki Linnakangas wrote:
> The states we have at the moment in the standby are:
>
> 1. Archive recovery. Standby fetches WAL files from archive using
> restore_command. When a file is not found in archive, we switch to
> state 2.
>
> 2. Streaming replication. Standby connects (and reconnects if the
> connection is lost for any reason) to the primary, starts streaming, and
> applies WAL as it arrives. We stay in this state until the trigger file
> is found or the server is shut down.
>
> The states with my suggested ReadRecord/FetchRecord refactoring, the
> code I have in the replication-xlogrefactor branch in my git repo, are:
>
> 1. Initial archive recovery. Standby fetches WAL files from archive
> using restore_command. When a file is not found in archive, we start
> walreceiver and switch to state 2.
>
> 2. Retrying to restore from archive. When the connection to the primary
> is established and replication is started, we switch to state 3.
>
> 3. Streaming replication. The connection to the primary is established,
> and WAL is applied as it arrives. When the connection is dropped, we go
> back to state 2.
>
> The state transitions between 2 and 3 are a bit fuzzy in that version,
> though: walreceiver runs concurrently, trying to reconnect, while the
> startup process retries restoring from the archive. Fujii-san's
> suggestion to have walreceiver stop while the startup process retries
> restoring from the archive (or to have walreceiver run restore_command
> in approach #2) would make that clearer.

The one-way state transitions 1 -> 2 in both cases seem to make this a
little more complex, rather than simpler. If the connection did drop
then the WAL will be in the archive, so the path for the data is
archive->primary->standby. There already needs to be a network path
between the archive and the standby, so why not drop back from state
3 -> 1 rather than from 3 -> 2? That way we could have just 2 states on
each side, rather than 3.

-- 
Simon Riggs
www.2ndQuadrant.com

From: Dimitri Fontaine on 16 Jan 2010 05:36

Thanks for stating it this way, it really helps figuring out what it is
we're talking about!

Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> The states with my suggested ReadRecord/FetchRecord refactoring, the
> code I have in the replication-xlogrefactor branch in my git repo,
> are:

They look like you're trying to solve a specific issue that is a
consequence of another one, without fixing the cause. I hope I'm wrong,
once more :)

> 1. Initial archive recovery. Standby fetches WAL files from archive
> using restore_command. When a file is not found in archive, we start
> walreceiver and switch to state 2.
>
> 2. Retrying to restore from archive. When the connection to the primary
> is established and replication is started, we switch to state 3.

When does the master know about this new slave being there? I'd say not
until 3 is OK, and then the actual details between 1 and 2 look strange,
partly because it's more about processes than states.

I'd propose to have 1 and 2 started in parallel from the beginning and,
as Simon proposes, to be able to get back to 1 at any time:

0. Start from a base backup and determine the first WAL / LSN we need to
   start streaming from; call it SR_LSN. That means asking the master
   its current xlog location. The LSN we're at now, after replaying the
   base backup and maybe the initial recovery from local WAL files,
   let's call it BASE_LSN.

1. Get the missing WAL to go from BASE_LSN to SR_LSN from the archive
   with restore_command, apply it as we receive it, and start 2,
   possibly in parallel.

2. Streaming replication: we connect to the primary and walreceiver gets
   the WAL from the connection. It either stores it, if the current
   standby position < SR_LSN, or applies it directly if we were already
   streaming.

Local storage would be either the standby's archiving or a specific
temporary location. I guess it's more or less what you want to do with
retrying from the master's archives, but I'm not sure your line of
thought makes it simpler.

But that's more a process view, not a state view. As 1 and 2 run in
parallel, we're missing some state names. Let's name the states now that
we have the processes:

  base:        start from a base backup, which we don't know how we got
  catch-up:    getting the WAL [from archive] to go from base to being
               able to apply the stream
  wanna-sync:  receiving the primary's WAL while not yet being able to
               replay it
  do-sync:     applying the WAL we got in wanna-sync state
  sync:        replaying what's being sent as it arrives

So the current problem is what happens when we're not able to start
streaming from the primary, yet, or again. And your question is how this
will get simpler with all those details.

What I propose is to always have a walreceiver running and getting WAL
from the master. Depending on the current state it's applying it (sync)
or keeping it for later (wanna-sync). We need some more code for it to
apply the WAL it's been keeping for later (do-sync); that depends on how
we keep the WAL.

Your problem is getting out of catch-up up to sync, and which process is
doing what in between. I hope my proposal makes that clearer to think
about, and I would go as far as to say that the startup process only
cares about getting the WAL from BASE_LSN to SR_LSN; that's called
catch-up. Having another process handle wanna-sync is neat, but it can
be sequential too.

When you lose the connection, you drop out of sync back to another state
depending on the missing WAL, so to know which one you need to contact
the primary again.

The master only considers a standby in sync if its walsender process is
up-to-date or lagging by only the last emitted WAL. If it is lagging
more, that means the standby is catching up, or replaying more than the
current WAL, so it is in the wanna-sync or do-sync state. Not in sync.

The details about when a slave is in sync will get more important as
soon as we have synchronous streaming.

Regards,

-- 
dim
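
[For illustration: one possible reading of the "in sync" rule above, as a
self-contained C sketch. The plain byte-offset representation of WAL
positions and the one-segment threshold are assumptions made for this
example, not anything from a patch.]

#include <stdbool.h>
#include <stdint.h>

/* Assume WAL positions are plain byte offsets and segments are 16 MB. */
#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * A standby counts as in sync only if its walsender has sent everything
 * except, at most, the last (still being emitted) WAL segment; anything
 * further behind is catching up (wanna-sync / do-sync).
 */
static bool
standby_is_in_sync(uint64_t primary_insert_pos, uint64_t standby_sent_pos)
{
    return primary_insert_pos - standby_sent_pos <= WAL_SEGMENT_SIZE;
}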

From: Heikki Linnakangas on 20 Jan 2010 14:26

Dimitri Fontaine wrote:
> Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
>> 1. Initial archive recovery. Standby fetches WAL files from archive
>> using restore_command. When a file is not found in archive, we start
>> walreceiver and switch to state 2.
>>
>> 2. Retrying to restore from archive. When the connection to the primary
>> is established and replication is started, we switch to state 3.
>
> When does the master know about this new slave being there? I'd say not
> until 3 is OK, and then the actual details between 1 and 2 look strange,
> partly because it's more about processes than states.

Right. The master doesn't need to know about the slave.

> I'd propose to have 1 and 2 started in parallel from the beginning and,
> as Simon proposes, to be able to get back to 1 at any time:
>
> 0. Start from a base backup and determine the first WAL / LSN we need to
>    start streaming from; call it SR_LSN. That means asking the master
>    its current xlog location.

What if the master can't be contacted?

>    The LSN we're at now, after replaying the base backup and maybe the
>    initial recovery from local WAL files, let's call it BASE_LSN.
>
> 1. Get the missing WAL to go from BASE_LSN to SR_LSN from the archive
>    with restore_command, apply it as we receive it, and start 2,
>    possibly in parallel.
>
> 2. Streaming replication: we connect to the primary and walreceiver gets
>    the WAL from the connection. It either stores it, if the current
>    standby position < SR_LSN, or applies it directly if we were already
>    streaming.
>
> Local storage would be either the standby's archiving or a specific
> temporary location. I guess it's more or less what you want to do with
> retrying from the master's archives, but I'm not sure your line of
> thought makes it simpler.

Seems complicated...

> <snip>
> The details about when a slave is in sync will get more important as
> soon as we have synchronous streaming.

Yeah, a lot of that logic and those states are completely unnecessary
until we have a synchronous mode. Even then, it seems complex.

Here's what I've been hacking: first of all, walreceiver no longer
retries the connection on error, and postmaster no longer relaunches it
if it dies. So when walreceiver is launched, it tries to connect once
and, if successful, streams until an error occurs or it's killed.

When the startup process needs more WAL to continue replay, the logic
is, in pseudocode:

    while (<need more wal>)
    {
        if (<walreceiver is alive>)
        {
            wait for WAL to arrive, or for walreceiver to die.
        }
        else
        {
            Run restore_command
            If (restore_command succeeded)
                break;
            else
            {
                Sleep 5 seconds
                Start walreceiver
            }
        }
    }

So there are just two states:

1. Recovering from archive
2. Streaming

We start from 1, and switch state on error. This gives nice behavior
from a user point of view: the standby tries to make progress using
either the archive or streaming, whichever becomes available first.

Attached is a WIP patch implementing that, also available in the
'replication-xlogrefactor' branch in my git repository. It includes the
Read/FetchRecord refactoring I mentioned earlier; that's a prerequisite
for this. The code implementing the above retry logic is in
XLogReadPage(), in xlog.c.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
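
[For illustration: the two-state retry loop above, written out as a
self-contained C sketch. The helper functions are stubs invented for this
example and do not correspond to actual functions in the patch.]

#include <stdbool.h>
#include <unistd.h>

/* Stub helpers, invented for this sketch only. */
extern bool need_more_wal(void);
extern bool walreceiver_is_alive(void);
extern void wait_for_wal_or_walreceiver_exit(void);
extern bool run_restore_command(void);  /* true if the next WAL file was restored */
extern void start_walreceiver(void);

/*
 * Sketch of the retry logic: keep restoring from the archive until a
 * file is missing, then fall back to streaming; if streaming dies, go
 * back to the archive, pausing briefly before (re)starting walreceiver.
 */
static void
wait_for_more_wal(void)
{
    while (need_more_wal())
    {
        if (walreceiver_is_alive())
        {
            /* Streaming: block until WAL arrives or walreceiver dies. */
            wait_for_wal_or_walreceiver_exit();
        }
        else
        {
            /* Recovering from archive. */
            if (run_restore_command())
                break;          /* got the next file, resume replay */

            sleep(5);           /* avoid busy-looping on a missing file */
            start_walreceiver();
        }
    }
}

The two states fall out of whether walreceiver is currently alive, which
is why no explicit state variable appears in this sketch.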