max_standby_delay considered harmful [PgSql]

Prev: Further Hot Standby documentation required
Next: [HACKERS] Streaming replication - unable to stop the standby

From: Robert Haas on 9 May 2010 00:35

On Sun, May 9, 2010 at 12:08 AM, Bruce Momjian <bruce(a)momjian.us> wrote:
> Robert Haas wrote:
>> > Clearly, anything is more feature-full than boolean --- the big question
>> > is whether Tom's proposal is so much better than a boolean that we
>> > should spend the time designing and implementing it, with the
>> > possibility it will all be changed in 9.1.
>>
>> I doubt it's likely to be thrown out completely. We might decide to
>> fine-tune it in some way. My fear is that if we ship this with only a
>> boolean, we're shipping crippleware. If that fear turns out to be
>> unfounded, I will of course be happy, but that's my concern, and I
>> don't believe that it's entirely unfounded.
>
> Well, historically, we have been willing to not ship features if we
> can't get it right. No one has ever accused us of crippleware, but our
> hesitancy has caused slower user adoption, though long-term, it has
> helped us grow a dedicated user base that trusts us.

We can make the decision to not ship the feature if the feature is
"max_standby_delay". But I think the feature is "Hot Standby", which
I think we've pretty much committed to shipping. And I am concerned
that if the only mechanism for controlling query cancellation vs.
recovery lag is a boolean, people will feel that we didn't get Hot Standby
right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on 9 May 2010 04:41

On Sat, 2010-05-08 at 14:48 -0400, Bruce Momjian wrote:

> I think the consensus is to change this setting to a boolean. If you
> don't want to do it, I am sure we can find someone who will.

You expect others to act on consensus and follow rules, yet ignore them
yourself when it suits your purpose. Your other points seem designed to
distract people from seeing that.

There is clear agreement that a problem exists. The action to take as a
result of that problem is very clearly in doubt and yet you repeatedly
ignore other people's comments and viable technical resolutions. If you
can find a cat's paw to break consensus for you, more fool them. You
might find someone with a good resolution, if you ask that instead.

--
Simon Riggs www.2ndQuadrant.com

From: "Kevin Grittner" on 9 May 2010 07:40

Bruce Momjian wrote:

> I think everyone agrees the current code is unusable, per Heikki's
> comment about a WAL file arriving after a period of no WAL
> activity

I don't.

I am curious to hear how many complaints we've had from alpha and
beta testers of HS regarding this issue. I know that if we used it
with our software, the issue would probably go unnoticed because of
our usage patterns and automatic query retry. A positive setting
would work as intended for us. I can think of pessimal usage
patterns, different software approaches, and/or goals for HS usage
which would conflict badly with a positive setting. Hopefully we
can document this area better than we've historically done with, for
example, fsync -- which has similar trade-offs, only with more dire
consequences for bad user choices.

-Kevin

From: Florian Pflug on 9 May 2010 10:10

On May 9, 2010, at 13:59, Dimitri Fontaine wrote:
> Tom Lane <tgl(a)sss.pgh.pa.us> writes:
>> I like the proposal of a boolean because it provides only the minimal
>> feature set of two cases that are both clearly needed and easily
>> implementable. Whatever we do later is certain to provide a superset
>> of those two cases. If we do something else (and that includes my own
>> proposal of a straight lock timeout), we'll be implementing something
>> we might wish to take back later. Taking out features after they've
>> been in a release is very hard, even if we realize they're badly
>> designed.
>
> That's where I thought my proposal fitted in. I fail to see us wanting to
> take back explicit pause/resume admin functions in any future release.
>
> Now, after having read Greg's arguments, my vote would be the following:
> - hot_standby_conflict_winner = queries|replay, defaults to replay
> - add pause/resume so that people can switch temporarily to queries
> - label max_standby_delay *experimental*, keep current code

Adding pause/resume seems to introduce some non-trivial locking
problems, though. How would you handle a pause request if the recovery
process currently held a lock?

Dropping the lock is not an option for correctness reasons. Otherwise
you wouldn't have needed to take the lock in the first place, no?

Pausing with the lock held leads to priority-inversion-like problems.
Queries might then block until recovery is resumed, which is quite the
opposite of what pause() is supposed to achieve.

The only remaining option is to continue applying WAL until you reach a
point where no locks are held, then pause. But from a user's POV, I
think that is nearly indistinguishable from simply setting
hot_standby_conflict_winner to "queries" in the first place.

best regards,
Florian Pflug
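Florian's third option, replaying past the pause request until recovery holds no locks, can be modelled in a few lines. This is a hypothetical sketch only: WalRecord, replay_until_pause, integer LSNs and named locks are illustrative assumptions, not PostgreSQL's recovery code.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class WalRecord:
    """Hypothetical, heavily simplified WAL record: it may take and/or
    release one lock (identified by name). Real recovery locking is far
    more involved; this only models the 'pause at a lock-free point' idea."""
    lsn: int
    acquires: Optional[str] = None
    releases: Optional[str] = None


def replay_until_pause(
    records: List[WalRecord], pause_requested_at: int
) -> Tuple[Optional[int], Set[str]]:
    """Replay records in order. A pause requested at LSN pause_requested_at
    takes effect only after a record leaves recovery holding no locks, so
    standby queries never queue behind a lock owned by a paused recovery.
    Returns (LSN at which replay paused, locks still held); the LSN is None
    if the stream ran out before a lock-free point was reached."""
    held: Set[str] = set()
    for rec in records:
        if rec.acquires is not None:
            held.add(rec.acquires)
        if rec.releases is not None:
            held.discard(rec.releases)
        # Honour the pause only once it has been requested and no locks remain.
        if rec.lsn >= pause_requested_at and not held:
            return rec.lsn, held
    return None, held


records = [
    WalRecord(1),
    WalRecord(2, acquires="AccessExclusiveLock on t"),
    WalRecord(3),
    WalRecord(4, releases="AccessExclusiveLock on t"),
    WalRecord(5),
]
print(replay_until_pause(records, 2))  # (4, set())
```

In this model the pause is requested at LSN 2 but only takes effect at LSN 4, after the lock is released. Replay keeps going until it would no longer hold anything a query might wait on, which is the behaviour Florian compares to simply letting queries win.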

From: Greg Stark on 9 May 2010 12:47

On Sun, May 9, 2010 at 4:00 AM, Greg Smith <greg(a)2ndquadrant.com> wrote:
> The use cases are covered as best they can be without better support from
> expected future SR features like heartbeats and XID loopback.

For what it's worth I think deferring these extra complications is a
very useful exercise. I would like to see a system that doesn't depend
on them for basic functionality. In particular I would like to see a
system that can be useful using purely WAL log shipping without
streaming replication at all.

I'm a bit unclear on how the boolean proposal would solve things, though.
Surely if you set the boolean to recovery-wins, then when using
streaming replication with any non-idle master, virtually every query
would be cancelled immediately, as every HOT cleanup would cause a
snapshot conflict with even short-lived queries on the slave.

--
greg
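Greg Stark's objection turns on how a cleanup record conflicts with standby snapshots. The sketch below is a simplified, hypothetical model of a boolean recovery-wins/queries-win setting applied to one such record. The function name, the dict of snapshot xmins and the plain-integer XIDs (wraparound ignored) are assumptions. The cutoff is modelled loosely on the latestRemovedXid that cleanup records carry.

```python
from typing import Dict, List, Tuple


def resolve_snapshot_conflict(
    snapshots: Dict[str, int],
    latest_removed_xid: int,
    recovery_wins: bool,
) -> Tuple[List[str], bool]:
    """Hypothetical model of a boolean conflict policy on the standby.

    snapshots maps a query name to the xmin of its snapshot. A cleanup
    record that removed row versions up to latest_removed_xid conflicts
    with every snapshot whose xmin is at or below that cutoff, since such
    a snapshot might still need the removed rows.

    Returns (queries to cancel, whether replay can apply the record now).
    """
    conflicting = sorted(
        qid for qid, xmin in snapshots.items() if xmin <= latest_removed_xid
    )
    if recovery_wins:
        # Replay never waits: every conflicting query is cancelled at once.
        return conflicting, True
    # Queries win: nothing is cancelled; replay waits while any conflict remains.
    return [], not conflicting


snapshots = {"report": 1000, "lookup": 1003}
print(resolve_snapshot_conflict(snapshots, 1002, recovery_wins=True))   # (['report'], True)
print(resolve_snapshot_conflict(snapshots, 1002, recovery_wins=False))  # ([], False)
```

On a busy master, each cleanup cutoff trails the newest XID only slightly. In this model, even a query that started moments earlier tends to have an xmin at or below it. With recovery_wins=True, that produces the "virtually every query would be cancelled" case. With recovery_wins=False, replay stalls instead.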