From: Dimitri Fontaine on 8 May 2010 17:02

Bruce Momjian <bruce(a)momjian.us> writes:
> I have no idea why an objection from you should mean more than an
> objection from anyone else in the community, and I have no idea what an
> "extreme reaction" means, or why anyone should care.

Maybe I shouldn't say anything here. But clearly, while you're spot on that Simon's objection is worth just as much as any other contributor's, I disagree that we shouldn't care about the way those people feel about being a member of our community.

I appreciate your efforts to avoid having anyone here use such wording, but I can't help but dislike your argument for it. I hope that's simply a localisation issue (l10n is so much harder than i18n).

Anyway, I so much hate reading such exchanges here that I couldn't help ranting about it. Back to suitable -hackers content.

> I think the consensus is to change this setting to a boolean. If you
> don't want to do it, I am sure we can find someone who will.

I don't think so. I understand the current state to be:

 a. this problem is not blocking beta, but is a must-fix before release
 b. we either have to change the API or the behavior
 c. only one behavior change has been proposed, by Tom
 d. the proposed behavior would favor queries rather than availability
 e. API change 1 is boolean + explicit pause/resume command
 f. API change 2 is boolean + plugin facility, with a contrib for the current behavior
 g. API change 3 is boolean only

I don't remember reading any mail on this thread bearing consensus on the choices above, but rather either one of us pushing for his vision, or people defending the current situation, complaining about it, or asking that a reasonable choice be made soon.

If we have to choose between reasonable and soon, soon won't be my vote. Beta is meant to last more or less 3 months after all.

Each party's position is clear. A decision remains to be made, and I guess that the one writing the code will have a much louder voice.

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Bruce Momjian on 8 May 2010 18:51

Robert Haas wrote:
> On Sat, May 8, 2010 at 3:40 PM, Bruce Momjian <bruce(a)momjian.us> wrote:
> > Robert Haas wrote:
> >> On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian <bruce(a)momjian.us> wrote:
> >> > I think the consensus is to change this setting to a boolean. If you
> >> > don't want to do it, I am sure we can find someone who will.
> >>
> >> I still think we should revert to Tom's original proposal.
> >
> > And Tom's proposal was to do it on WAL slave arrival time? If we could
> > get agreement from everyone that that is the proper direction, fine, but
> > I am hearing things like plugins, and other complexity that makes it
> > seem we are not getting closer to an agreed solution, and without
> > agreement, the simplest approach seems to be just to remove the part we
> > can't agree upon.
> >
> > I think the big question is whether this issue is significant enough
> > that we should ignore our policy of no feature design during beta.
>
> Tom's proposal was basically to define recovery_process_lock_timeout.
> The recovery process would wait X seconds for a lock, then kill
> whoever held it. It's not the greatest knob in the world for the
> reasons already pointed out, but I think it's still better than a
> boolean and will be useful to some users. And it's pretty simple.

I thought there was concern about lock stacking causing unpredictable/unbounded delays. I am not sure boolean has a majority vote, but I am suggesting it because it is the _minimal_ feature set, and when we can't agree during beta, the minimal feature set seems like the best choice.

Clearly, anything is more feature-full than boolean --- the big question is whether Tom's proposal is sufficiently better than boolean that we should spend the time designing and implementing it, with the possibility it will all be changed in 9.1.

--
  Bruce Momjian  <bruce(a)momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
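
For readers following the thread, the knob Robert describes above (wait up to X seconds for a lock, then cancel whoever holds it) can be modelled in a few lines. This is a rough sketch, not PostgreSQL code: `Conflict`, `replay_with_lock_timeout`, and all the numbers are hypothetical. It also assumes that "lock stacking" means the recovery process running into several conflicting locks one after another, which is one reading of Bruce's concern rather than something the thread spells out.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    """A standby query holding a lock that WAL replay needs (hypothetical model)."""
    query: str
    holds_for: float  # seconds the query would still hold the lock once replay starts waiting


def replay_with_lock_timeout(conflicts, lock_timeout):
    """Return (total seconds replay waited, list of cancelled queries).

    Replay meets the conflicts one after another. For each one it waits
    until the lock is released or until lock_timeout expires, whichever
    comes first; on expiry the holder is cancelled.
    """
    waited = 0.0
    cancelled = []
    for conflict in conflicts:
        if conflict.holds_for > lock_timeout:
            waited += lock_timeout
            cancelled.append(conflict.query)
        else:
            waited += conflict.holds_for
    return waited, cancelled


conflicts = [
    Conflict("q1", 5.0),    # finishes on its own
    Conflict("q2", 40.0),   # outlives the timeout, gets cancelled
    Conflict("q3", 100.0),  # outlives the timeout, gets cancelled
]
total_wait, cancelled = replay_with_lock_timeout(conflicts, lock_timeout=30.0)
print(total_wait, cancelled)
```

Under these assumptions, three successive conflicts with a 30 second per-lock timeout hold replay up for 65 seconds in total. A per-lock timeout bounds each individual wait but not the overall replay delay, which grows with the number of conflicts.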
From: Greg Smith on 8 May 2010 19:04

Bruce Momjian wrote:
> I think the big question is whether this issue is significant enough
> that we should ignore our policy of no feature design during beta

The idea that you're considering removal of a feature that we already have people using in beta and making plans around is a policy violation too you know. A freeze should include not cutting things just because their UI or implementation is not ideal yet. And you've been using the word "consensus" here when there is no such thing. At best there's barely a majority here among people who have stated an opinion, and consensus means something much stronger even than that; that means something closer to unanimity. I thought the summary of where the project is at that Josh wrote at http://archives.postgresql.org/message-id/4BE31279.7040002(a)agliodbs.com was excellent, both from a technical and a process commentary standpoint. I'd be completely happy to follow that plan, and then we'd be at a consensus--with no one left arguing.

It was very clear back in February that if SR didn't hit the feature set to make HS less troublesome out of the box, there would be some limitations here, and that set of concerns hasn't changed much since then. I thought the backup plan if we didn't get things like xid feedback was to keep the capability as written anyway, knowing that it's still much better than having no control over cancellation timing at all. Keep improving documentation around its issues, and continue to hack away at them in user space and in the field. Then we do better for 9.1. You seem bent on removing the feedback part of that cycle.

The full statement of the ESR bit Josh was quoting is "Release early. Release often. And listen to your customers."[1] My customers include some who believed in the PostgreSQL community process enough to contribute toward the HS development that's been completed and donated to the project. They have a pretty clear view on this, which I'm relaying when I talk about what I'd like to see happen. They are saying they cannot completely ignore their requirements for HA failover, but would be willing to loosen them just a bit (increasing failover time slightly) if it reduces the odds of query cancellation, and therefore improves how much load they can expect to push toward the standby. max_standby_delay is a currently available mechanism that does that. I'm not going to be their nanny and say "no, that's not perfectly predictable, you might get a query canceled sometimes when you don't expect it anyway". Instead, I was hoping to let them deploy 9.0 with this option available (but certainly not the default), informed of the potential risks, and see how that goes. We can confirm whether the userland workarounds we believe will be effective here really are. If so, then we can soldier forward with directly incorporating them into the server code, knowing that works. If not, switch to one of the safer modes, see if there's something better to use altogether in 9.1, and perhaps this whole approach gets removed. That's healthy development progress either way.

Upthread Bruce expressed some concern that this was going to live forever once deployed. There is no way I'm going to let this behavior continue to be available in 9.1 if field tests say the workarounds aren't good enough. That's going to torture all of us who do customer deployments of this technology every day if that turns out to be the case, and nobody is going to feel the heat from that worse than 2ndQuadrant. I once did a round of removing GUCs that didn't do what they were expected to in the field, based on real-world tests showing regular misuse, and I'll do it again if this falls into that same category.

We've already exposed this release to a whole stack of risk from work during its development cycle, risk that doesn't really drop much just from cutting this one bit. I'd at least like to get all the reward possible from that risk, which I expected to include feedback in this area. Circumventing the planned development process by dropping this now will ruin how I expected the project to feel out the right thing on the user side, and we'll all be left with little more insight for what to do in 9.1 than we have now. And I'm not looking forward to explaining to people why a feature they've been seeing and planning to deploy for months has now been cut only after what was supposed to be a freeze for beta.

[1] http://catb.org/esr/writings/homesteading/cathedral-bazaar/ar01s04.html , and this particular bit is quite relevant here:

"Linus was keeping his hacker/users constantly stimulated and rewarded--stimulated by the prospect of having an ego-satisfying piece of the action, rewarded by the sight of constant (even daily) improvement in their work. Linus was directly aiming to maximize the number of person-hours thrown at debugging and development, even at the possible cost of instability in the code and user-base burnout if any serious bug proved intractable."

I continue to be disappointed at how contributing code to PostgreSQL is far more likely to come with a dose of argument and frustration than with reward, and this discussion is a perfect example of such.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com   www.2ndQuadrant.us
From: Bruce Momjian on 8 May 2010 19:34

Greg Smith wrote:
> Bruce Momjian wrote:
> > I think the big question is whether this issue is significant enough
> > that we should ignore our policy of no feature design during beta
>
> The idea that you're considering removal of a feature that we already
> have people using in beta and making plans around is a policy violation
> too you know. A freeze should include not cutting things just because
> their UI or implementation is not ideal yet. And you've been using the
> word "consensus" here when there is no such thing. At best there's
> barely a majority here among people who have stated an opinion, and
> consensus means something much stronger even than that; that means
> something closer to unanimity. I thought the summary of where the
> project is at Josh wrote at
> http://archives.postgresql.org/message-id/4BE31279.7040002(a)agliodbs.com
> was excellent, both from a technical and a process commentary
> standpoint. I'd be completely happy to follow that plan, and then we'd
> be at a consensus--with no one left arguing.

I can't argue with anything you have said in your email. The big question is whether designing during beta is worth it in this case, whether we can get something that is useful and gives us useful feedback for 9.1, and whether it is worth spending the time to figure this out during beta. If we can, great, let's do it, but I have not seen that yet, and I am unclear how long we should keep trying to find it.

I think everyone agrees the current code is unusable, per Heikki's comment about a WAL file arriving after a period of no WAL activity, and look how long it took our group to even understand why that fails so badly. I thought Tom's idea had problems, and there were ideas of how to improve it. It just seems like we are drifting around on something that has no easy solution, and not something that we are likely to hit during beta where we should be focusing on the release.

Saying we have three months to fix this during beta seems like a recipe for delaying the final release, and this feature is not worth that. What we could do is to convert max_standby_delay to a boolean, 'ifdef' out the code that was handling non-boolean cases, and then if someone wants to work on a patch in a corner and propose something in a month that improves this, we can judge the patch on its own merits, and apply it if it is a great benefit, because basically that is what we are doing now if we fix this --- adding a new patch/feature during beta. (Frankly, because we are not requiring an initdb during beta, I am unclear how we are going to rename max_standby_delay to behave as a boolean.)

It is great if we can get a working max_standby_delay, but I fear drifting/distraction at this stage.

--
  Bruce Momjian  <bruce(a)momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
From: Andres Freund on 8 May 2010 19:42

On Sunday 09 May 2010 01:34:18 Bruce Momjian wrote:
> I think everyone agrees the current code is unusable, per Heikki's
> comment about a WAL file arriving after a period of no WAL activity, and
> look how long it took our group to even understand why that fails so
> badly.

To be honest, it's not *that* hard to simply make sure WAL is generated regularly to combat that. While it surely ain't a nice workaround, it's not much of a problem either.

Andres
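
The failure Bruce mentions and the workaround Andres suggests can be illustrated with a toy model. It assumes, as an interpretation of this thread rather than a description taken from the server code, that the standby measures its delay as the gap between its own clock and the timestamp of the last WAL record it replayed. `grace_remaining` and the numbers are hypothetical.

```python
def grace_remaining(now, last_wal_timestamp, max_standby_delay):
    """Seconds a conflicting standby query may keep running before it is cancelled.

    Assumed model: the standby treats (now - timestamp of the last replayed
    WAL record) as its current delay and only tolerates conflicts while that
    delay is below max_standby_delay.
    """
    apparent_lag = now - last_wal_timestamp
    return max(0.0, max_standby_delay - apparent_lag)


MAX_STANDBY_DELAY = 30.0

# Master idle for 10 minutes: the last replayed record is stamped t=0 and a
# conflicting record arrives at t=600. The standby looks 600 s behind, so the
# grace period is already used up and the query is cancelled at once.
idle_master = grace_remaining(600.0, 0.0, MAX_STANDBY_DELAY)

# Same conflict, but something on the master writes WAL every 10 s, so the
# last replayed record is stamped t=590 and most of the grace period is left.
with_heartbeat = grace_remaining(600.0, 590.0, MAX_STANDBY_DELAY)

print(idle_master, with_heartbeat)
```

In this model, regularly generated WAL keeps the last replayed timestamp fresh, so a conflict after a quiet period gets roughly the configured grace period instead of none.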