From: Heikki Linnakangas on 15 Jan 2010 13:50

Simon Riggs wrote:
> We need to calculate a more accurate time since WAL arrived to make
> max_standby_delay sensible in all cases. Difficult to know exactly when
> to record new timestamps for received WAL. So, proposal is...
>
> if (Base time is earlier than WAL record time)
>     standby_delay = WAL record time - Base time
> else
>     standby_delay = now() - Base time
>
> When standby_mode = off we record new base time when a new WAL file
> arrives.
>
> When standby_mode = on we record new base time each time we do
> XLogWalRcvFlush(). We also record a new base time on first entry to the
> main for loop in XLogRecv(), i.e. each time we start writing a new burst
> of streamed WAL data.
>
> So in either case, when we are waiting for new input we reset the timer
> as soon as new WAL is received. The resolution/accuracy of standby_delay
> will be no more than the time taken to replay a single file. This
> shouldn't matter, since sane settings of max_standby_delay are either 0
> or a number like 5-20 (seconds).

That would change the meaning of max_standby_delay. Currently, it's the
delay between *generating* and applying a WAL record, your proposal
would change it to mean delay between receiving and applying it. That
seems a lot less useful to me.

With the current definition, I would feel pretty comfortable setting it
to say 15 minutes, knowing that if the standby falls behind for any
reason, as soon as the connection is re-established or
archiving/restoring fixed, it will catch up quickly, blowing away any
read-only queries if required. With your new definition, the standby
would in the worst case pause for 15 minutes at every WAL file.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
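
The rule Simon proposes in the quoted text above can be sketched in a few lines. This is only an illustration of the pseudocode as quoted, not PostgreSQL source: the function and parameter names are hypothetical, and the strict `>` comparison against max_standby_delay is an assumption.

```python
from datetime import datetime, timedelta


def standby_delay(base_time, wal_record_time, now):
    """Delay according to the proposed rule (hypothetical helper).

    base_time       -- when the standby last recorded the arrival of new WAL
    wal_record_time -- timestamp carried by the WAL record being replayed
    now             -- current time on the standby
    """
    if base_time < wal_record_time:
        return wal_record_time - base_time
    return now - base_time


def exceeds_max_standby_delay(delay, max_standby_delay):
    # Assumption: conflicting queries are cancelled once the delay is
    # strictly greater than the configured limit.
    return delay > max_standby_delay


base = datetime(2010, 1, 15, 12, 0, 0)

# Record stamped 5 seconds after the base time: delay is 5 seconds.
fresh = standby_delay(base, base + timedelta(seconds=5),
                      base + timedelta(seconds=60))

# Record generated 15 minutes before the base time (standby was behind):
# the delay is measured from arrival, so it is only 3 seconds.
stale = standby_delay(base, base - timedelta(minutes=15),
                      base + timedelta(seconds=3))
```

The second case is the one Heikki objects to: a record that is already 15 minutes old counts as only 3 seconds late, because the timer restarts whenever new WAL arrives.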

From: Simon Riggs on 15 Jan 2010 18:09

On Fri, 2010-01-15 at 20:50 +0200, Heikki Linnakangas wrote:

> > So in either case, when we are waiting for new input we reset the timer
> > as soon as new WAL is received. The resolution/accuracy of standby_delay
> > will be no more than the time taken to replay a single file. This
> > shouldn't matter, since sane settings of max_standby_delay are either 0
> > or a number like 5-20 (seconds).
>
> That would change the meaning of max_standby_delay. Currently, it's the
> delay between *generating* and applying a WAL record, your proposal
> would change it to mean delay between receiving and applying it. That
> seems a lot less useful to me.

Remember that this proposal is about responding to your comments. You
showed that the time difference between generating and applying a WAL
record lacked useful meaning in cases where the generation was not
smooth and continuous. So, taking your earlier refutation as still
observing a problem, I definitely do redefine the meaning of
max_standby_delay. As you say "standby delay" means the difference
between receive and apply.

The bottom line here is: are you willing to dismiss your earlier
observation of difficulties? I don't think you can...

> With the current definition, I would feel pretty comfortable setting it
> to say 15 minutes, knowing that if the standby falls behind for any
> reason, as soon as the connection is re-established or
> archiving/restoring fixed, it will catch up quickly, blowing away any
> read-only queries if required. With your new definition, the standby
> would in the worst case pause for 15 minutes at every WAL file.

Yes, it does. And I know you're thinking along those lines because we
are concurrently discussing how to handle re-connection after updates.

The alternative is this: after being disconnected for 15 minutes we
reconnect. For the next X minutes the standby will be almost unusable
for queries while we catch up again.

---

So, I'm left with thinking that both of these ways are right, in
different circumstances and with different priorities.

If your priority is High Availability, then you are willing to give up
the capability for long-ish queries when that conflicts with the role of
HA server. (delay = apply - generate).

If your priority is a Reporting Server, then you are willing to give up
HA capability in return for relatively uninterrupted querying (delay =
apply - receive).

Do we agree the two goals are mutually exclusive? If so, I think we need
another parameter to express those configuration goals.

Also, I think we need some ways to explicitly block recovery to allow
queries to run, and some ways to explicitly block queries so recovery
can run.

Perhaps we need a way to block new queries on a regular basis, so that
recovery gets a chance to run. Kind of time-slicing algorithm, like OS.
That way we could assign a relative priority to each.

Hmmm.

--
 Simon Riggs           www.2ndQuadrant.com
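
The two definitions of delay discussed in this message can be compared with a small worked example. This is a sketch under stated assumptions: `DelayBasis` stands in for the extra parameter Simon suggests and is a hypothetical name, as are the functions; nothing here is an existing PostgreSQL setting or API.

```python
from datetime import datetime, timedelta
from enum import Enum


class DelayBasis(Enum):
    """Hypothetical setting selecting which definition of delay applies."""
    GENERATE = "generate"  # High Availability: delay = apply - generate
    RECEIVE = "receive"    # Reporting server:  delay = apply - receive


def apply_delay(basis, generated_at, received_at, apply_at):
    """Delay of one WAL record under the chosen definition."""
    start = generated_at if basis is DelayBasis.GENERATE else received_at
    return apply_at - start


def queries_may_keep_running(basis, generated_at, received_at, apply_at,
                             max_standby_delay):
    """True if conflicting queries are still within the allowed delay."""
    delay = apply_delay(basis, generated_at, received_at, apply_at)
    return delay <= max_standby_delay


# A record generated at 12:00 reaches the standby only at 12:20, after a
# disconnection, and is due for replay one second later.
generated = datetime(2010, 1, 15, 12, 0, 0)
received = datetime(2010, 1, 15, 12, 20, 0)
apply_at = received + timedelta(seconds=1)
limit = timedelta(minutes=15)

ha = queries_may_keep_running(DelayBasis.GENERATE, generated, received,
                              apply_at, limit)
reporting = queries_may_keep_running(DelayBasis.RECEIVE, generated, received,
                                     apply_at, limit)
```

With a 15 minute limit, the generate-based definition has already used up its allowance (the record is 20 minutes old), so conflicting queries are cancelled and the standby catches up. The receive-based definition sees a delay of one second, so queries keep running and replay waits.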

From: Dimitri Fontaine on 16 Jan 2010 08:08

Simon Riggs <simon(a)2ndQuadrant.com> writes:
> On Fri, 2010-01-15 at 20:50 +0200, Heikki Linnakangas wrote:
> Yes, it does. And I know you're thinking along those lines because we
> are concurrently discussing how to handle re-connection after updates.

With my State Machine proposal, we could only apply max_standby_delay if
in sync state, and cancel query unconditionally otherwise.

> The alternative is this: after being disconnected for 15 minutes we
> reconnect. For the next X minutes the standby will be almost unusable
> for queries while we catch up again.

That's it. And it could be the cause of another GUC: do we want to give
priority to catching up to get back in sync, or to running queries? That
would affect when we apply max_standby_delay, and when set to prefer
running queries it'd apply in any state as soon as we accept connections.

Regards,
--
dim
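
Dimitri's suggestion above amounts to a small decision table, sketched below. The state names, the `Priority` setting, and the function are all hypothetical; the message only says that max_standby_delay would apply "in sync state", and that a further GUC could make it apply in any state.

```python
from enum import Enum


class StandbyState(Enum):
    """Hypothetical states; the message only names the 'sync' state."""
    CATCHING_UP = "catching_up"
    IN_SYNC = "in_sync"


class Priority(Enum):
    """Hypothetical GUC: what the standby favours while it is behind."""
    CATCH_UP = "catch_up"
    QUERIES = "queries"


def conflict_grace_period(state, priority, max_standby_delay):
    """How long (in seconds) a conflicting query may hold up replay.

    In sync, or when queries are preferred: max_standby_delay applies.
    Otherwise: 0, i.e. the query is cancelled unconditionally.
    """
    if state is StandbyState.IN_SYNC or priority is Priority.QUERIES:
        return max_standby_delay
    return 0


behind_ha = conflict_grace_period(StandbyState.CATCHING_UP,
                                  Priority.CATCH_UP, 15)
behind_reporting = conflict_grace_period(StandbyState.CATCHING_UP,
                                         Priority.QUERIES, 15)
in_sync = conflict_grace_period(StandbyState.IN_SYNC, Priority.CATCH_UP, 15)
```

Only the combination "catching up, prefer catching up" cancels queries immediately; every other combination falls back to max_standby_delay.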

From: Simon Riggs on 16 Jan 2010 11:22

On Sat, 2010-01-16 at 14:08 +0100, Dimitri Fontaine wrote:
> Simon Riggs <simon(a)2ndQuadrant.com> writes:
> > On Fri, 2010-01-15 at 20:50 +0200, Heikki Linnakangas wrote:
> > Yes, it does. And I know you're thinking along those lines because we
> > are concurrently discussing how to handle re-connection after updates.
>
> With my State Machine proposal, we could only apply max_standby_delay if
> in sync state, and cancel query unconditionally otherwise.
>
> > The alternative is this: after being disconnected for 15 minutes we
> > reconnect. For the next X minutes the standby will be almost unusable
> > for queries while we catch up again.
>
> That's it. And it could be the cause of another GUC, do we want to give
> priority to catching-up to get back in sync, or to running queries. That
> would affect to when we apply max_standby_delay, and when set to prefer
> running queries it'd apply in any state as soon as we accept connections.

Agreed.

I'm wondering if it wouldn't just be easier to put in a plugin for
recovery conflict handling, so the user can decide what to do
themselves. That seems like a better plan than chewing through these
issues now.

--
 Simon Riggs           www.2ndQuadrant.com

From: Tom Lane on 16 Jan 2010 11:37

Simon Riggs <simon(a)2ndQuadrant.com> writes:
> I'm wondering if it wouldn't just be easier to put in a plugin for
> recovery conflict handling, so the user can decide what to do
> themselves. That seems like a better plan than chewing through these
> issues now.

Making it a plugin doesn't solve anything. This is not the kind of
thing where people can come up with some random policy and it will
work well. Anyone competent to invent a better policy would be quite
capable of modifying the source to suit themselves.

			regards, tom lane