From: Dimitri Fontaine on
Bruce Momjian <bruce(a)momjian.us> writes:
> I have no idea why an objection from you should mean more than an
> objection from anyone else in the community, and I have no idea what an
> "extreme reaction" means, or why anyone should care.

Maybe I shouldn't say anything here. But clearly while you're spot on
that Simon's objection is worth just as much as any other contributor's,
I disagree that we shouldn't care about the way those people feel about
being a member of our community.

I appreciate your efforts to keep anyone here from using such wording,
but I can't help but dislike your argument for it. I hope that's simply a
localisation issue (l10n is so much harder than i18n).

Anyway, I so much hate reading such exchanges here that I couldn't help
ranting about it. Back to suitable -hackers content.

> I think the consensus is to change this setting to a boolean. If you
> don't want to do it, I am sure we can find someone who will.

I don't think so. I understand the current state to be:
a. this problem is not blocking beta, but a must fix before release
b. we either have to change the API or the behavior
c. only one behavior change has been proposed, by Tom
d. proposed behavior would favor queries rather than availability
e. API change 1 is boolean + explicit pause/resume command
f. API change 2 is boolean + plugin facility, with a contrib for
current behavior.
g. API change 3 is boolean only

I don't remember reading any mail on this thread bearing consensus on
the choices above, but rather either one of us pushing for his vision or
people defending the current situation, complaining about it or asking
that a reasonable choice be made soon.

If we have to choose between reasonable and soon, soon won't be my
vote. Beta is meant to last more or less 3 months after all.

Each party's position is clear. A decision remains to be made, and I guess
that the one writing the code will have a much louder voice.

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Bruce Momjian on
Robert Haas wrote:
> On Sat, May 8, 2010 at 3:40 PM, Bruce Momjian <bruce(a)momjian.us> wrote:
> > Robert Haas wrote:
> >> On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian <bruce(a)momjian.us> wrote:
> >> > I think the consensus is to change this setting to a boolean.  If you
> >> > don't want to do it, I am sure we can find someone who will.
> >>
> >> I still think we should revert to Tom's original proposal.
> >
> > And Tom's proposal was to do it on WAL slave arrival time?  If we could
> > get agreement from everyone that that is the proper direction, fine, but
> > I am hearing things like plugins, and other complexity that makes it
> > seem we are not getting closer to an agreed solution, and without
> > agreement, the simplest approach seems to be just to remove the part we
> > can't agree upon.
> >
> > I think the big question is whether this issue is significant enough
> > that we should ignore our policy of no feature design during beta.
>
> Tom's proposal was basically to define recovery_process_lock_timeout.
> The recovery process would wait X seconds for a lock, then kill
> whoever held it. It's not the greatest knob in the world for the
> reasons already pointed out, but I think it's still better than a
> boolean and will be useful to some users. And it's pretty simple.

I thought there was concern about lock stacking causing
unpredictable/unbounded delays. I am not sure boolean has a majority
vote, but I am suggesting that because it is the _minimal_ feature set,
and when we can't agree during beta, the minimal feature set seems like
the best choice.
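
To make the timeout proposal and the stacking concern concrete, here is a
minimal sketch. It is not PostgreSQL code: the function name and the model
(one conflicting lock holder at a time, handled one after another) are
assumptions made purely for illustration. Each conflict waits until the
holder releases its lock or until the timeout expires, whichever comes
first, so the worst-case replay delay grows with the number of successive
conflicts rather than being bounded by a single timeout.

```python
def replay_delay(lock_hold_times, lock_timeout):
    """Model replay waiting on a series of conflicting lock holders.

    lock_hold_times: seconds each holder would keep its lock if left alone
                     (hypothetical input, one entry per successive conflict).
    lock_timeout:    seconds replay waits before killing the holder.

    Returns (total seconds replay waited, number of holders killed).
    """
    total_wait = 0.0
    killed = 0
    for hold in lock_hold_times:
        if hold > lock_timeout:
            # Holder outlasts the timeout: wait the full timeout, then kill it.
            total_wait += lock_timeout
            killed += 1
        else:
            # Holder releases on its own before the timeout expires.
            total_wait += hold
    return total_wait, killed
```

With a 30 second timeout, ten long-running holders in a row delay replay by
300 seconds in this model, which is the sense in which the delay is not
bounded by the setting alone.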

Clearly, anything is more feature-full than boolean --- the big question
is whether Tom's proposal is sufficiently better than a boolean that we
should spend the time designing and implementing it, with the
possibility it will all be changed in 9.1.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: Greg Smith on
Bruce Momjian wrote:
> I think the big question is whether this issue is significant enough
> that we should ignore our policy of no feature design during beta

The idea that you're considering removal of a feature that we already
have people using in beta and making plans around is a policy violation
too you know. A freeze should include not cutting things just because
their UI or implementation is not ideal yet. And you've been using the
word "consensus" here when there is no such thing. At best there's
barely a majority here among people who have stated an opinion, and
consensus means something much stronger even than that; that means
something closer to unanimity. I thought the summary of where the
project is that Josh wrote at
http://archives.postgresql.org/message-id/4BE31279.7040002(a)agliodbs.com
was excellent, both from a technical and a process commentary
standpoint. I'd be completely happy to follow that plan, and then we'd
be at a consensus--with no one left arguing.

It was very clear back in February that if SR didn't hit the feature set
to make HS less troublesome out of the box, there would be some
limitations here, and that set of concerns hasn't changed much since
then. I thought the backup plan if we didn't get things like xid
feedback was to keep the capability as written anyway, knowing that it's
still much better than no control over cancellation timing available at
all. Keep improving documentation around its issues, and continue to
hack away at them in user space and in the field. Then we do better for
9.1. You seem bent on removing the feedback part of that cycle.

The full statement of the ESR bit Josh was quoting is "Release early.
Release often. And listen to your customers."[1] My customers include
some who believed in the PostgreSQL community process enough to
contribute toward the HS development that's been completed and donated
to the project. They have a pretty clear view on this, which I'm relaying
when I talk about what I'd like to see happen. They are saying they cannot
completely ignore their requirements for HA failover, but would be
willing to loosen them just a bit (increasing failover time slightly) if
it reduces the odds of query cancellation, and therefore improves how
much load they can expect to push toward the standby. max_standby_delay
is a currently available mechanism that does that. I'm not going to be
their nanny and say "no, that's not perfectly predictable, you might get
a query canceled sometimes when you don't expect it anyway".

Instead, I was hoping to let them deploy 9.0 with this option available
(but certainly not the default), informed of the potential risks, and see
how that goes. We can confirm whether the userland workarounds we
believe will be effective here really are. If so, then we can soldier
forward directly incorporating them into the server code, knowing that
works. If not, switch to one of the safer modes, see if there's
something better to use altogether in 9.1, and perhaps this whole
approach gets removed. That's healthy development progress either way.

Upthread Bruce expressed some concern that this was going to live
forever once deployed. There is no way I'm going to let this behavior
continue to be available in 9.1 if field tests say the workarounds
aren't good enough. That's going to torture all of us who do customer
deployments of this technology every day if that turns out to be the
case, and nobody is going to feel the heat from that worse than
2ndQuadrant. I did a round once of removing GUCs that didn't do what
they were expected to in the field before, based on real-world tests
showing regular misuse, and I'll do it again if this falls into that
same category. We've already exposed this release to a whole stack of
risk from work during its development cycle, risk that doesn't really
drop much just from cutting this one bit. I'd at least like to get all
the reward possible from that risk, which I expected to include feedback
in this area.

Circumventing the planned development process by dropping this now will
ruin how I expected the project to feel out the right thing on the user
side, and we'll all be left with little more insight for what to do in
9.1 than we have now. And I'm not looking forward to explaining to
people why a feature they've been seeing and planning to deploy for
months has now been cut only after what was supposed to be a freeze for
beta.

[1] http://catb.org/esr/writings/homesteading/cathedral-bazaar/ar01s04.html
This particular bit is quite relevant here: "Linus was keeping his
hacker/users constantly stimulated and rewarded--stimulated by the
prospect of having an ego-satisfying piece of the action, rewarded by
the sight of constant (even daily) improvement in their work. Linus was
directly aiming to maximize the number of person-hours thrown at
debugging and development, even at the possible cost of instability in
the code and user-base burnout if any serious bug proved intractable." I
continue to be disappointed at how contributing code to PostgreSQL is
far more likely to come with a dose of argument and frustration than
with reward, and this discussion is a perfect example of such.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us



From: Bruce Momjian on
Greg Smith wrote:
> Bruce Momjian wrote:
> > I think the big question is whether this issue is significant enough
> > that we should ignore our policy of no feature design during beta
>
> The idea that you're considering removal of a feature that we already
> have people using in beta and making plans around is a policy violation
> too you know. A freeze should include not cutting things just because
> their UI or implementation is not ideal yet. And you've been using the
> word "consensus" here when there is no such thing. At best there's
> barely a majority here among people who have stated an opinion, and
> consensus means something much stronger even than that; that means
> something closer to unanimity. I thought the summary of where the
> project is that Josh wrote at
> http://archives.postgresql.org/message-id/4BE31279.7040002(a)agliodbs.com
> was excellent, both from a technical and a process commentary
> standpoint. I'd be completely happy to follow that plan, and then we'd
> be at a consensus--with no one left arguing.

I can't argue with anything you have said in your email. The big
question is whether designing during beta is worth it in this case, and
whether we can get something that is useful and gives us useful feedback
for 9.1, and is it worth spending the time to figure this out during
beta? If we can, great, let's do it, but I have not seen that yet, and
I am unclear how long we should keep trying to find it.

I think everyone agrees the current code is unusable, per Heikki's
comment about a WAL file arriving after a period of no WAL activity, and
look how long it took our group to even understand why that fails so
badly. I thought Tom's idea had problems, and there were ideas of how
to improve it. It just seems like we are drifting around on something
that has no easy solution, and not something that we are likely to hit
during beta where we should be focusing on the release. Saying we have
three months to fix this during beta seems like a recipe for delaying
the final release, and this feature is not worth that.

What we could do is to convert max_standby_delay to a boolean, 'ifdef'
out the code that was handling non-boolean cases, and then if someone
wants to work on a patch in a corner and propose something in a month
that improves this, we can judge the patch on its own merits, and apply
it if it is a great benefit, because basically that is what we are doing
now if we fix this --- adding a new patch/feature during beta.
(Frankly, because we are not requiring an initdb during beta, I am
unclear how we are going to rename max_standby_delay to behave as a
boolean.)

It is great if we can get a working max_standby_delay, but I fear
drifting/distraction at this stage.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: Andres Freund on
On Sunday 09 May 2010 01:34:18 Bruce Momjian wrote:
> I think everyone agrees the current code is unusable, per Heikki's
> comment about a WAL file arriving after a period of no WAL activity, and
> look how long it took our group to even understand why that fails so
> badly.
To be honest, it's not *that* hard to simply make sure WAL is generated
regularly to combat that. While it surely isn't a nice workaround, it's not
much of a problem either.
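
A rough sketch of why regular WAL generation helps, under an assumption that
is mine rather than something stated in this thread's code: the standby
judges how far behind it is by comparing the current time with the timestamp
of the last WAL record it replayed. The function and parameter names are
hypothetical and nothing here is a PostgreSQL API.

```python
def grace_left(now, last_replayed_time, max_standby_delay):
    """Seconds a conflicting standby query may keep running in this model.

    now:                current time on the standby, in seconds.
    last_replayed_time: timestamp of the last WAL record replayed (assumed
                        to be what the standby measures its lag against).
    max_standby_delay:  configured limit, in seconds.
    """
    apparent_lag = now - last_replayed_time
    # Once the apparent lag reaches the limit, no grace period remains.
    return max(0.0, max_standby_delay - apparent_lag)
```

If the master has been idle for 600 seconds, the apparent lag already
exceeds a 30 second limit when the next WAL file arrives, so conflicting
queries get no grace at all. If something on the master generates WAL every
10 seconds, the apparent lag stays near 10 seconds and most of the limit
remains usable.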

Andres
