From: Robert Haas on 4 May 2010 07:13

On Tue, May 4, 2010 at 4:37 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> option for them, especially for the stated reason. (My point about
> ndistinct: 99% of users have no idea that exists or when to use it, but
> it still exists as an option because it solves a known issue, just like
> this.)

Slightly OT, but funnily enough, when I was up in New York a couple of weeks ago with Bruce and a couple of other folks, I started talking with a DBA up there about his frustrations with PostgreSQL, and - I'm not making this up - the first example he gave me of something he wished he could do in PG to improve query planning was manually override ndistinct estimates. He was pleased to hear that we'll have that in 9.0 and I was pleased to be able to tell him it was my patch. If you'd asked me what the odds were that someone picking a missing feature would have come up with that one, I'd have said a billion-to-one against. But I'm not making this up.

To be honest, I am far from convinced that the existing behavior is a good one and I'm in favor of modifying it or ripping it out altogether if we can think of something better. But it has to really be better, of course, not just trading one set of pain points for another.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 4 May 2010 07:41

On Tue, 2010-05-04 at 07:13 -0400, Robert Haas wrote:
> On Tue, May 4, 2010 at 4:37 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> > option for them, especially for the stated reason. (My point about
> > ndistinct: 99% of users have no idea that exists or when to use it, but
> > it still exists as an option because it solves a known issue, just like
> > this.)
>
> Slightly OT, but funnily enough, when I was up in New York a couple of
> weeks ago with Bruce and a couple of other folks, I started talking
> with a DBA up there about his frustrations with PostgreSQL, and - I'm
> not making this up - the first example he gave me of something he
> wished he could do in PG to improve query planning was manually
> override ndistinct estimates. He was pleased to hear that we'll have
> that in 9.0 and I was pleased to be able to tell him it was my patch.
> If you'd asked me what the odds that someone picking a missing feature
> would have come up with that one were, I'd have said a billion-to-one
> against. But I'm not making this up.

It matches my experience. I think it's a testament to the expertise of our users, as well as to the hackers who have done so much, that this is what sits at the top of users' lists for change.

> To be honest, I am far from convinced that the existing behavior is a
> good one and I'm in favor of modifying it or ripping it out altogether
> if we can think of something better. But it has to really be better,
> of course, not just trading one set of pain points for another.

The only way I see to make it genuinely better, rather than just a different mix of trade-offs, is to come up with approaches where there are no conflicts. Hannu came up with one, using filesystem snapshots, but we haven't had time to implement that yet.

--
Simon Riggs www.2ndQuadrant.com
From: Stephen Frost on 4 May 2010 09:12

* Simon Riggs (simon(a)2ndQuadrant.com) wrote:
> If recovery waits for max_standby_delay every time something gets in its
> way, it should be clear that if many things get in its way it will
> progressively fall behind. There is no limit to this and it can always
> fall further behind. It does result in fewer cancelled queries and I do
> understand many may like that.

Guess I wasn't very clear in my previous description of what I *think* the change would be (Tom, please jump in if I've got this wrong..). Recovery wouldn't wait max_standby_delay every time; I agree, that would be a big change in behaviour and could make it very difficult for the slave to keep up. Rather, recovery would proceed as normal until it encounters a lock, at which point it would start counting down from max_standby_delay. If the lock is released before the countdown runs out, recovery moves on; if another lock is encountered, it resumes counting down from where it left off last time. If it hits zero, it'll cancel the other query, and any other queries that get in the way, until it's caught up again completely. Once recovery is fully caught up, the counter would reset again to max_standby_delay.

> That is *significantly* different from how it works now. (Plus: If there
> really was no difference, why not leave it as is?)

Because it's much more complicated the way it is, it doesn't really work as one would expect in a number of situations, and it's trying to guarantee something that it probably can't.

> The bottom line is this is about conflict resolution. There is simply no
> way to resolve conflicts without favouring one or other of the
> protagonists. Whatever mechanism you come up with that favours one will
> disfavour the other. I'm happy to give choices, but I'm not happy to
> force just one kind of conflict resolution.

I don't think anyone is trying to get rid of the knob entirely; you're right, you can't please everyone all the time, so there has to be some kind of knob there which people can adjust based on their particular use case and system. This is about what exactly the knob is and how it's implemented and documented.

Thanks,

Stephen
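
The countdown Stephen describes can be sketched as a small toy model. This is only an illustration of the behaviour as stated in his message, not PostgreSQL code: the class `DelayBudget` and its methods are hypothetical names, times are plain numbers of seconds rather than real clocks, and the only term taken from the thread is `max_standby_delay`.

```python
from dataclasses import dataclass, field


@dataclass
class DelayBudget:
    """Toy model of a shared countdown for recovery conflicts (hypothetical)."""

    max_standby_delay: float              # seconds; name mirrors the setting
    remaining: float = field(init=False)  # budget left before queries get cancelled

    def __post_init__(self) -> None:
        self.remaining = self.max_standby_delay

    def on_conflict(self, blocker_runtime: float) -> str:
        """Recovery is blocked by a query needing `blocker_runtime` more seconds.

        Returns "waited" if the query finishes within the remaining budget,
        otherwise "cancelled" (the budget is then exhausted).
        """
        if blocker_runtime <= self.remaining:
            # Wait for the query; the time spent comes out of the shared budget.
            self.remaining -= blocker_runtime
            return "waited"
        # Budget runs out while waiting: cancel this query, and any later
        # conflicting query too, until recovery has caught up.
        self.remaining = 0.0
        return "cancelled"

    def on_caught_up(self) -> None:
        """Recovery has fully caught up: the countdown starts afresh."""
        self.remaining = self.max_standby_delay


budget = DelayBudget(max_standby_delay=30.0)
first = budget.on_conflict(10.0)    # fits in the budget, 20 s left
second = budget.on_conflict(25.0)   # exceeds what is left, budget exhausted
third = budget.on_conflict(1.0)     # nothing left, cancelled straight away
budget.on_caught_up()               # budget back to 30 s
```

The point of the model is that waits are charged against one budget across successive conflicts, so recovery can fall behind by at most roughly `max_standby_delay` before it starts cancelling, rather than by that amount per conflict.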
From: Simon Riggs on 4 May 2010 09:49

On Tue, 2010-05-04 at 09:12 -0400, Stephen Frost wrote:
> * Simon Riggs (simon(a)2ndQuadrant.com) wrote:
> > If recovery waits for max_standby_delay every time something gets in its
> > way, it should be clear that if many things get in its way it will
> > progressively fall behind. There is no limit to this and it can always
> > fall further behind. It does result in fewer cancelled queries and I do
> > understand many may like that.
>
> Guess I wasn't very clear in my previous description of what I *think*
> the change would be (Tom, please jump in if I've got this wrong..).
> Recovery wouldn't wait max_standby_delay every time; I agree, that would
> be a big change in behaviour and could make it very difficult for the
> slave to keep up. Rather, recovery would proceed as normal until it
> encounters a lock, at which point it would start counting down from
> max_standby_delay, if the lock is released before it hits that, then it
> will move on, if another lock is encountered, it would start counting
> down from where it left off last time. If it hits zero, it'll cancel
> the other query, and any other queries that get in the way, until it's
> caught up again completely. Once recovery is fully caught up, the
> counter would reset again to max_standby_delay.

This new clarification is almost exactly how it works already. Sounds like the existing docs need some improvement.

The only difference is that max_standby_delay is measured from log timestamp. Perhaps it should work from WAL receipt timestamp rather than from log timestamp? That would make some of the problems go away without significantly changing the definition. I'll look at that.

(And conflicts are caused by more situations than just locks, but that detail doesn't alter your point.)

> > The bottom line is this is about conflict resolution. There is simply no
> > way to resolve conflicts without favouring one or other of the
> > protagonists. Whatever mechanism you come up with that favours one will
> > disfavour the other. I'm happy to give choices, but I'm not happy to
> > force just one kind of conflict resolution.
>
> I don't think anyone is trying to get rid of the knob entirely; you're
> right, you can't please everyone all the time, so there has to be some
> kind of knob there which people can adjust based on their particular use
> case and system. This is about what exactly the knob is and how it's
> implemented and documented.

I'm happy with more than one way.

It'd be nice if a single parameter, giving one dimension of tuning, suited all the ways people have said they would like it to behave. I've not found a way of doing that.

I have no problem at all with adding additional parameters or mechanisms to cater for the multiple dimensions of control people have asked for. So your original interpretation is also valid for some users.

--
Simon Riggs www.2ndQuadrant.com
From: Simon Riggs on 4 May 2010 10:27

Downthread, I said:

On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote:
> The only difference is that max_standby_delay is measured from log
> timestamp. Perhaps it should work from WAL receipt timestamp rather than
> from log timestamp? That would make some of the problems go away without
> significantly changing the definition. I'll look at that.

Patch to implement this idea attached: for discussion, not tested yet. No docs yet.

The attached patch redefines "standby delay" to be the amount of time elapsed from point of receipt to point of application. The "point of receipt" is reset for every chunk of data when streaming, or for every file when reading file by file. In all cases this new time is later than the latest log time we would have used previously.

This addresses all of your points, as shown below.

On Mon, 2010-05-03 at 11:37 -0400, Tom Lane wrote:
> There are three really fundamental problems with it:
>
> 1. The timestamps we are reading from the log might be historical,
> if we are replaying from archive rather than reading a live SR stream.
> In the current implementation that means zero grace period for standby
> queries. Now if your only interest is catching up as fast as possible,
> that could be a sane behavior, but this is clearly not the only possible
> interest --- in fact, if that's all you care about, why did you allow
> standby queries at all?

The delay is now measured from the time of receipt of the WAL, no longer from the log date. So this would no longer apply.

> 2. There could be clock skew between the master and slave servers.
> If the master's clock is a minute or so ahead of the slave's, again we
> get into a situation where standby queries have zero grace period, even
> though killing them won't do a darn thing to permit catchup. If the
> master is behind the slave then we have an artificially inflated grace
> period, which is going to slow down the slave.

The timestamp is taken from the standby, not the master, so this would no longer apply.

> 3. There could be significant propagation delay from master to slave,
> if the WAL stream is being transmitted with pg_standby or some such.
> Again this results in cutting into the standby queries' grace period,
> for no defensible reason.

The timestamp is taken immediately at the point the WAL is ready for replay, so other timing overheads would not be included.

> In addition to these fundamental problems there's a fatal implementation
> problem: the actual comparison is not to the master's current clock
> reading, but to the latest commit, abort, or checkpoint timestamp read
> from the WAL. Thus, if the last commit was more than max_standby_delay
> seconds ago, zero grace time. Now if the master is really idle then
> there aren't going to be any conflicts anyway, but what if it's running
> only long-running queries? Or what happens when it was idle for awhile
> and then starts new queries? Zero grace period, that's what.
>
> We could possibly improve matters for the SR case by having walsender
> transmit the master's current clock reading every so often (probably
> once per activity cycle), outside the WAL stream proper. The receiver
> could subtract off its own clock reading in order to measure the skew,
> and then we could cancel queries if the de-skewed transmission time
> falls too far behind. However this doesn't do anything to fix the cases
> where we aren't reading (and caught up to) a live SR broadcast.

--
Simon Riggs www.2ndQuadrant.com
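
To make the difference between the two reference points concrete, here is a small arithmetic sketch. It assumes a simplified model in which the grace period left for a conflicting query is `max_standby_delay` minus the time elapsed since the reference timestamp; the function `grace_remaining` and all the numbers are illustrative assumptions, not taken from the patch.

```python
def grace_remaining(now: float, reference: float, max_standby_delay: float) -> float:
    """Seconds a conflicting standby query may still run (simplified model).

    `reference` is whichever timestamp the delay is measured from: the
    timestamp read from the WAL record, or the time the standby received
    the WAL. All values are seconds on a single notional clock.
    """
    return max(0.0, max_standby_delay - (now - reference))


MAX_STANDBY_DELAY = 30.0   # illustrative setting
standby_now = 10_000.0     # current time on the standby

# Problem 1 (archive replay): the record was written an hour ago,
# but the file only arrived on the standby 2 seconds ago.
archive_log_based = grace_remaining(standby_now, standby_now - 3600.0, MAX_STANDBY_DELAY)
archive_receipt_based = grace_remaining(standby_now, standby_now - 2.0, MAX_STANDBY_DELAY)

# Problem 3 (propagation delay): the record was written 25 seconds ago
# and spent most of that time in transit; it arrived 1 second ago.
slow_link_log_based = grace_remaining(standby_now, standby_now - 25.0, MAX_STANDBY_DELAY)
slow_link_receipt_based = grace_remaining(standby_now, standby_now - 1.0, MAX_STANDBY_DELAY)

print(archive_log_based, archive_receipt_based)
print(slow_link_log_based, slow_link_receipt_based)
```

Under this model, measuring from the log timestamp leaves no grace at all in the archive case and only a few seconds over the slow link, while measuring from receipt leaves nearly the full `max_standby_delay` in both. Because the receipt timestamp and the current time are both read from the standby's clock, the master's clock never enters the receipt-based calculation.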