max_standby_delay considered harmful [PgSql]


From: Simon Riggs on 4 May 2010 15:50

On Tue, 2010-05-04 at 11:27 -0700, Josh Berkus wrote:

> I still don't see how that works.
....

The good news is we agree by the time we get to the bottom... ;-)

> I'm more interested in your assertion that there's a lot in the
> replication stream which doesn't take a lock; if that's the case, then
> implementing any part of Tom's proposal is hopeless.

(No, still valid, the idea is generic)

> > * standby query delay - defined as the time that recovery will wait for
> > a query to complete before a cancellation takes place. (We could
> > complicate this by asking what happens when recovery is blocked twice by
> > the same query? Would it wait twice, or does it have to track how much
> > it has waited for each query in total so far?)
>
> Aha! Now I see the confusion.

BTW, Tom's proposal was approx half a sentence long so that is the
source of any confusion.

> AFAIK, Tom was proposing that the
> pending recovery data would wait for max_standby_delay, total, then
> cancel *all* queries which conflicted with it. Now that we've talked
> this out, though, I can see that this can still result in "mass cancel"
> issues, just like the current max_standby_delay. The main advantage I
> can see to Tom's idea is that (presumably) it can be more discriminating
> about which queries it cancels.

As I said to Stephen, this is exactly how it works already and wasn't
what was proposed.

> I agree that waiting on *each* query for "up to # time" would be a
> completely different behavior, and as such, should be a option for DBAs.
> We might make it the default option, but we wouldn't make it the only
> option.

Glad to hear you say that.
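
[Editor's sketch] The open question quoted above (does recovery wait afresh each time the same query blocks it, or does it track the total already waited for that query?) can be illustrated with a small model. This is only a sketch under assumptions: `PerQueryWaitTracker`, `standby_query_delay` and `record_block` are hypothetical names, not PostgreSQL code or settings, and each "block" is reduced to the number of seconds recovery would have to wait.

```python
from collections import defaultdict


class PerQueryWaitTracker:
    """Toy model of a per-query wait budget (hypothetical, not PostgreSQL code)."""

    def __init__(self, standby_query_delay: float, cumulative: bool):
        self.limit = standby_query_delay      # seconds recovery may wait per query
        self.cumulative = cumulative          # True: budget is shared across blocks
        self.waited = defaultdict(float)      # total seconds waited, per query id

    def record_block(self, query_id: str, seconds_needed: float) -> bool:
        """Recovery is blocked by query_id for seconds_needed.

        Returns True if the query would be cancelled, False if recovery
        simply waits the block out.
        """
        already = self.waited[query_id] if self.cumulative else 0.0
        budget = self.limit - already
        if seconds_needed > budget:
            # Recovery waits out whatever budget remains, then cancels.
            self.waited[query_id] += max(budget, 0.0)
            return True
        self.waited[query_id] += seconds_needed
        return False


# The same query blocks recovery twice, for 20 seconds each time.
per_block = PerQueryWaitTracker(30.0, cumulative=False)
per_block_results = [per_block.record_block("q1", 20.0) for _ in range(2)]

total = PerQueryWaitTracker(30.0, cumulative=True)
total_results = [total.record_block("q1", 20.0) for _ in range(2)]
```

With a fresh budget per block the query survives both waits and recovery is held up for 40 seconds in total; with a cumulative budget the second block cancels the query once 30 seconds have been used.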

> Speaking of which, was *your* more discriminating query cancel ever applied?
>
> > Currently max_standby_delay seeks to constrain the standby lag to a
> > particular value, as a way of providing a bounded time for failover, and
> > also to constrain the amount of WAL that needs to be stored as the lag
> > increases. Currently, there is no guaranteed minimum query delay given
> > to each query.
>
> Yeah, I can just see a lot of combinational issues with this. For
> example, what if the user's network changes in some way to retard
> delivery of log segments to the point where the delivery time is longer
> than max_standby_delay? To say nothing about system clock synch, which
> isn't perfect even if you have it set up.
>
> I can see DBAs who are very focussed on HA wanting a standby-lag based
> control anyway, when HA is far more important than the ability to run
> queries on the slave. But I don't think that is the largest group; I
> think that far more people will want to balance the two considerations.
>
> Ultimately, as you say, we would like to have all three knobs:
>
> standby lag: max time measured from master timestamp to slave timestamp
>
> application lag: max time measured from local receipt of WAL records
> (via log copy or recovery connection) to their application
>
> query lag: max time any query which is blocking a recovery operation can run
>
> These three, in combination, would let us cover most potential use
> cases. So I think you've assessed that's where we're going in the
> 9.1-9.2 timeframe.
>
> However, I'd say for 9.0 that "application lag" is the least confusing
> option and the least dependent on the DBA's server room setup. So if we
> can only have one of these for 9.0 (and I think going out with more than
> one might be too complex, especially at this late date) I think that's
> the way to go.
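
[Editor's sketch] The three knobs defined above differ only in which pair of timestamps they compare. The following is a minimal illustration, assuming all times are plain seconds; `ReplayState` and its field names are hypothetical, and "query lag" is modelled here as the time a query has been blocking recovery, which is one possible reading of the definition.

```python
from dataclasses import dataclass


@dataclass
class ReplayState:
    """Timestamps (in seconds) around one WAL record blocked by a standby query."""
    master_ts: float            # stamped on the master, using the master's clock
    received_ts: float          # when the standby received the record (standby clock)
    now: float                  # current standby clock reading
    query_blocked_since: float  # when the query began blocking recovery


def standby_lag(s: ReplayState) -> float:
    # Master timestamp to standby timestamp: includes delivery delay
    # and any clock skew between the two servers.
    return s.now - s.master_ts


def application_lag(s: ReplayState) -> float:
    # Local receipt to application: uses only the standby's own clock.
    return s.now - s.received_ts


def query_lag(s: ReplayState) -> float:
    # How long a single query has been holding up recovery.
    return s.now - s.query_blocked_since


state = ReplayState(master_ts=100.0, received_ts=130.0,
                    now=140.0, query_blocked_since=135.0)
lags = (standby_lag(state), application_lag(state), query_lag(state))
```

In this example a 30-second delivery delay counts against standby lag (40 s) but not against application lag (10 s), which is why the latter depends less on the network and clock setup.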

Before you posted, I submitted a patch on this thread to redefine
max_standby_delay to depend upon the "application lag", as you've newly
defined it here - though obviously I didn't call it that. That solves
Tom's 3 issues. max_apply_delay might be a technically more accurate
term, though it isn't a sufficiently better parameter name to be worth
the change.

That patch doesn't implement his proposal, but that can be done as well
as (though IMHO not instead of). Given that two people have already
misunderstood what Tom proposed, and various people are saying we need
only one, I'm getting less inclined to have that at all.

--
Simon Riggs www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Josh Berkus on 4 May 2010 16:57

>> AFAIK, Tom was proposing that the
>> pending recovery data would wait for max_standby_delay, total, then
>> cancel *all* queries which conflicted with it. Now that we've talked
>> this out, though, I can see that this can still result in "mass cancel"
>> issues, just like the current max_standby_delay. The main advantage I
>> can see to Tom's idea is that (presumably) it can be more discriminating
>> about which queries it cancels.
>
> As I said to Stephen, this is exactly how it works already and wasn't
> what was proposed.

Well, it's not exactly how it works, as I understand it ... doesn't the
timer running out on the slave currently cancel *all* running queries
with old snapshots, regardless of what relations they touch?

> Before you posted, I submitted a patch on this thread to redefine
> max_standby_delay to depend upon the "application lag", as you've newly
> defined it here - though obviously I didn't call it that. That solves
> Tom's 3 issues. max_apply_delay might be a technically more accurate
> term, though it isn't a sufficiently better parameter name to be worth
> the change.

Yeah, that looks less complicated for admins. Thanks.

> That patch doesn't implement his proposal, but that can be done as well
> as (though IMHO not instead of). Given that two people have already
> misunderstood what Tom proposed, and various people are saying we need
> only one, I'm getting less inclined to have that at all.

Given your clarification on the whole set of behaviors, I'm highly
dubious about the idea of implementing Tom's proposal when we're already
Beta 1. It seems like a 9.1 thing.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Tom Lane on 4 May 2010 18:53

Josh Berkus <josh(a)agliodbs.com> writes:
> Given your clarification on the whole set of behaviors, I'm highly
> dubious about the idea of implementing Tom's proposal when we're already
> Beta 1. It seems like a 9.1 thing.

I think you missed the point: "do nothing" is not a viable option.
I was proposing something that seemed simple enough to be safe to
drop into 9.0 at this point. I'm less convinced that what Simon
is proposing is safe enough.

regards, tom lane


From: Simon Riggs on 4 May 2010 19:06

On Tue, 2010-05-04 at 18:53 -0400, Tom Lane wrote:

> I think you missed the point: "do nothing" is not a viable option.
> I was proposing something that seemed simple enough to be safe to
> drop into 9.0 at this point.

I've posted a patch that meets your stated objections. If you could
review that, this could be done in an hour.

There are other ways, but you'll need to explain a proposal in enough
detail that we're clear what you actually mean.

> I'm less convinced that what Simon is proposing is safe enough.

Which proposal?

--
Simon Riggs www.2ndQuadrant.com


From: Greg Smith on 4 May 2010 19:26

Tom Lane wrote:
> 1. The timestamps we are reading from the log might be historical,
> if we are replaying from archive rather than reading a live SR stream.
> In the current implementation that means zero grace period for standby
> queries. Now if your only interest is catching up as fast as possible,
> that could be a sane behavior, but this is clearly not the only possible
> interest --- in fact, if that's all you care about, why did you allow
> standby queries at all?
>

If the standby is not current, you may not want people to execute
queries against it. In some situations, returning results against
obsolete data is worse than not letting the query execute at all. As I
see it, the current max_standby_delay implementation includes the
expectation that the results you are getting are no more than
max_standby_delay behind the master, presuming that new data is still
coming in. If the standby has really fallen further behind than that,
there are situations where you don't want it doing anything but catching
up until that is no longer the case, and you especially don't want it
returning stale query data.

The fact that tuning in that direction could mean the standby never
actually executes any queries is something you need to monitor for--it
suggests the standby isn't powerful enough, or well enough connected to
the master, to keep up--but that's not necessarily the wrong behavior. Saying "I
only want the standby to execute queries if it's not too far behind the
master" is the answer to "why did you allow standby queries at all?"
when tuning for that use case.

> 2. There could be clock skew between the master and slave servers.
>

Not the database's problem to worry about. Document that time should be
carefully sync'd and move on. I'll add that.

> 3. There could be significant propagation delay from master to slave,
> if the WAL stream is being transmitted with pg_standby or some such.
> Again this results in cutting into the standby queries' grace period,
> for no defensible reason.
>

Then people should adjust their max_standby_delay upwards to account for
that. For high availability purposes, it's vital that the delay number
be referenced to the commit records on the master. If lag is eating a
portion of that, again it's something people should be monitoring for,
but not something we can correct. The whole idea here is that
max_standby_delay is an upper bound on how stale the data on the standby
can be, and whether or not lag is a component of that doesn't impact how
the database is being asked to act.

> In addition to these fundamental problems there's a fatal implementation
> problem: the actual comparison is not to the master's current clock
> reading, but to the latest commit, abort, or checkpoint timestamp read
> from the WAL.

Right; this has been documented for months at
http://wiki.postgresql.org/wiki/Hot_Standby_TODO and on the list before
that, i.e. "If there's little activity in the master, that can lead to
surprising results." The suggested long-term fix has been adding
keepalive timestamps into SR, which seems to get reinvented every time
somebody plays with this for a bit. The HS documentation improvements
I'm working on will suggest that you make sure this doesn't happen, that
people have some sort of keepalive WAL-generating activity on the
master regularly, if they expect max_standby_delay to work reasonably in
the face of an idle master. It's not ideal, but it's straightforward to
work around in user space.
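
[Editor's sketch] The idle-master effect described above can be modelled in a few lines. This is a simplified illustration, not PostgreSQL code: it assumes the standby measures delay as its own clock minus the newest timestamp it has read from the WAL, and the function names are hypothetical.

```python
from typing import Sequence


def apparent_delay(standby_now: float, wal_timestamps: Sequence[float]) -> float:
    """Delay as seen by a standby that compares its clock with the
    latest commit/abort/checkpoint timestamp read from the WAL."""
    return standby_now - max(wal_timestamps)


def grace_left(max_standby_delay: float, standby_now: float,
               wal_timestamps: Sequence[float]) -> float:
    """Seconds a conflicting standby query may still run before cancellation."""
    return max(0.0, max_standby_delay - apparent_delay(standby_now, wal_timestamps))


# Master committed once at t=0 and then sat idle for five minutes.
idle = grace_left(30.0, 300.0, [0.0])

# Same situation, but some keepalive activity wrote a timestamped
# record every 10 seconds, the latest one at t=300.
keepalive_ts = [float(t) for t in range(0, 301, 10)]
with_keepalive = grace_left(30.0, 300.0, keepalive_ts)
```

With an idle master the standby appears to be 300 seconds behind, so a conflicting query gets no grace period at all; with regular WAL-generating activity the full 30 seconds is available.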

> I'm inclined to think that we should throw away all this logic and just
> have the slave cancel competing queries if the replay process waits
> more than max_standby_delay seconds to acquire a lock. This is simple,
> understandable, and behaves the same whether we're reading live data or
> not.

I don't consider something that allows queries to execute when not
playing recent "live" data to be necessarily a step forward, from the
perspective of implementations preferring high availability. It's
reasonable for some people to request that the last thing a standby
that's not current (<max_standby_delay behind the master, based on the
last thing received) should be doing is answering any queries, when it
doesn't have current data and it should be working on catchup instead.
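
[Editor's sketch] The difference between the two behaviors being debated can be reduced to which timestamp the cancel decision is measured from. The following is a simplified model built only from the descriptions in this thread; the function names are hypothetical and neither function is the actual implementation.

```python
def cancel_by_wal_timestamp(now: float, latest_wal_ts: float,
                            max_standby_delay: float) -> bool:
    """Behavior as described for the committed code: cancel conflicting
    queries once the standby is more than max_standby_delay behind the
    latest timestamp read from the WAL."""
    return now - latest_wal_ts > max_standby_delay


def cancel_by_lock_wait(now: float, lock_wait_started: float,
                        max_standby_delay: float) -> bool:
    """Tom's proposal as quoted above: cancel competing queries once the
    replay process has waited more than max_standby_delay for a lock."""
    return now - lock_wait_started > max_standby_delay


# Replaying from archive: WAL timestamps are an hour old, and replay
# has been waiting on a lock for only one second.
now = 3600.0
by_timestamp = cancel_by_wal_timestamp(now, latest_wal_ts=0.0, max_standby_delay=30.0)
by_lock_wait = cancel_by_lock_wait(now, lock_wait_started=3599.0, max_standby_delay=30.0)
```

Under the timestamp-based rule the query is cancelled immediately because the standby is far behind, which is the outcome Greg argues some HA-focused setups want; under the lock-wait rule the same query keeps its full grace period regardless of how stale the data is.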

Discussion here has obviously wandered past your fundamental objections
and onto implementation trivia, but I didn't think the difference
between what you expected and what's actually committed already was
properly addressed before doing that.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us

