From: Simon Riggs on
On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote:

> The only difference is that max_standby_delay is measured from log
> timestamp. Perhaps it should work from WAL receipt timestamp rather than
> from log timestamp? That would make some of the problems go away without
> significantly changing the definition. I'll look at that.

Patch to implement this idea posted in response to OT, upthread, so I
can respond to the original complaints directly.

--
Simon Riggs www.2ndQuadrant.com



From: Bruce Momjian on
Simon Riggs wrote:
> On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote:
>
> > As I remember, 9.0 has two behaviors:
> >
> > o master delays vacuum cleanup
> > o slave delays WAL application
> >
> > and in 9.1 we will be adding:
> >
> > o slave communicates snapshots to master
>
> > How would this figure into what we ultimately want in 9.1?
>
> We would still want all options, since "slave communicates snapshot to
> master" doesn't solve the problem it just moves the problem elsewhere.
> It's a question of which factors the user wishes to emphasise for their
> specific use.
>
> > I understand Simon's point that the two behaviors have different
> > benefits. However, I believe few users will be able to understand when
> > to use which.
>
> If users can understand how to set NDISTINCT for a column, they can
> understand this. It's not about complexity of UI, it's about solving
> problems. When people hit an issue, I don't want to be telling people
> "we thought you wouldn't understand it, so we removed the parachute".
> They might not understand it *before* they hit a problem, so what? But
> users certainly will afterwards, and won't say "thanks" if you withhold an
> option from them, especially for the stated reason. (My point about
> ndistinct: 99% of users have no idea that exists or when to use it, but
> it still exists as an option because it solves a known issue, just like
> this.)

Well, this is kind of my point --- if few people are going to need a
parameter, and it takes us telling them when to use it, it isn't a good
parameter, because the other 99.9% are going to stare at the parameters
and not know what this one does or how it differs from other, similar
parameters. Adding another parameter might help 0.1% of our users, but
it is going to confuse the other 99.9%. :-(

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: Simon Riggs on
On Tue, 2010-05-04 at 13:00 -0400, Bruce Momjian wrote:

> Well, this is kind of my point --- if few people are going to need a
> parameter, and it takes us telling them when to use it, it isn't a good
> parameter, because the other 99.9% are going to stare at the parameters
> and not know what this one does or how it differs from other, similar
> parameters. Adding another parameter might help 0.1% of our users, but
> it is going to confuse the other 99.9%. :-(

You've missed my point. Most users of HS will need these parameters.
There is no need to understand them immediately, nor do I expect them to
do so. People won't understand why they exist until they've seen the
actual behaviour and received some errors; *then* they will understand
them, want them and need them. Just like deadlocks, ndistinct
and loads of other features we provide and support.

The current behaviour of max_standby_delay is designed to favour High
Availability users, not query users. I doubt that users with HA concerns
are only 0.1% of our users. I've accepted that some users may not put
that consideration first, so adding some minor, easy-to-implement
additional parameters will improve the behaviour for those people.
Forcing just one behaviour will be bad for many people.

--
Simon Riggs www.2ndQuadrant.com



From: Josh Berkus on
Simon,

> Yes, the max wait on any *one* blocker will be max_standby_delay. But if
> you wait for two blockers, then the total time by which the standby lags
> will now be 2*max_standby_delay. Add a third, fourth etc and the standby
> lag keeps rising.

I still don't see how that works. If we're taking a lock in order to
apply log segments, then any query which came in after the recovery lock would,
presumably, wait. So you'd have a lot of degraded query performance,
but no more than max_standby_delay of waiting to apply logs.

I'm more interested in your assertion that there's a lot in the
replication stream which doesn't take a lock; if that's the case, then
implementing any part of Tom's proposal is hopeless.

> * standby query delay - defined as the time that recovery will wait for
> a query to complete before a cancellation takes place. (We could
> complicate this by asking what happens when recovery is blocked twice by
> the same query? Would it wait twice, or does it have to track how much
> it has waited for each query in total so far?)

Aha! Now I see the confusion. AFAIK, Tom was proposing that the
pending recovery data would wait for max_standby_delay, total, then
cancel *all* queries which conflicted with it. Now that we've talked
this out, though, I can see that this can still result in "mass cancel"
issues, just like the current max_standby_delay. The main advantage I
can see to Tom's idea is that (presumably) it can be more discriminating
about which queries it cancels.

I agree that waiting on *each* query for "up to # time" would be a
completely different behavior, and as such, should be an option for DBAs.
We might make it the default option, but we wouldn't make it the only
option.
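
Just to pin down the difference, here's a toy sketch of the two
policies (the numbers and names are made up; this is not how the
backend actually tracks any of this):

#include <stdio.h>

#define NQUERIES 3

int
main(void)
{
    double  max_standby_delay = 30.0;   /* seconds */
    /* remaining runtimes of three queries conflicting with recovery */
    double  query_runtime[NQUERIES] = {20.0, 25.0, 40.0};
    double  shared_wait = 0.0;
    double  per_query_wait = 0.0;
    int     i;

    for (i = 0; i < NQUERIES; i++)
    {
        /*
         * Policy A (how I read Tom's idea): one shared grace period;
         * once it is exhausted, all remaining conflicting queries are
         * cancelled, so recovery never waits more than the limit.
         */
        if (shared_wait + query_runtime[i] <= max_standby_delay)
            shared_wait += query_runtime[i];
        else
            shared_wait = max_standby_delay;

        /*
         * Policy B (waiting up to the limit on *each* blocker): the
         * waits add up, which is Simon's 2*max_standby_delay point.
         */
        per_query_wait += (query_runtime[i] < max_standby_delay) ?
            query_runtime[i] : max_standby_delay;
    }

    printf("shared budget:    recovery delayed %.0f s\n", shared_wait);
    printf("per-query budget: recovery delayed %.0f s\n", per_query_wait);
    return 0;
}

With those numbers the shared budget holds recovery to 30s of waiting,
while the per-query budget lets it slip to 75s; which is preferable
depends entirely on whether HA or the standby's queries matter more to
you.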

Speaking of which, was *your* more discriminating query cancel ever applied?

> Currently max_standby_delay seeks to constrain the standby lag to a
> particular value, as a way of providing a bounded time for failover, and
> also to constrain the amount of WAL that needs to be stored as the lag
> increases. Currently, there is no guaranteed minimum query delay given
> to each query.

Yeah, I can just see a lot of issues with how these factors combine. For
example, what if the user's network changes in some way that retards
delivery of log segments to the point where the delivery time is longer
than max_standby_delay? To say nothing of system clock synch, which
isn't perfect even if you have it set up.

I can see DBAs who are very focussed on HA wanting a standby-lag based
control anyway, when HA is far more important than the ability to run
queries on the slave. But I don't think that is the largest group; I
think that far more people will want to balance the two considerations.

Ultimately, as you say, we would like to have all three knobs:

standby lag: max time measured from master timestamp to slave timestamp

application lag: max time measured from local receipt of WAL records
(via log copy or recovery connection) to their application

query lag: max time any query which is blocking a recovery operation can run
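
To be concrete about the reference points I have in mind, here's a
trivial sketch of the quantity each limit would be compared against
(variable names invented, clock skew ignored):

#include <stdio.h>
#include <time.h>

int
main(void)
{
    time_t  now = time(NULL);            /* moment of WAL apply on the slave */

    time_t  master_commit_ts = now - 45; /* timestamp the master put in the record */
    time_t  local_receipt_ts = now - 30; /* when the WAL arrived on the slave */
    time_t  query_start_ts   = now - 10; /* when the query began blocking recovery */

    double  standby_lag     = difftime(now, master_commit_ts);
    double  application_lag = difftime(now, local_receipt_ts);
    double  query_lag       = difftime(now, query_start_ts);

    printf("standby lag:     %.0f s (master timestamp -> slave apply)\n", standby_lag);
    printf("application lag: %.0f s (local receipt -> slave apply)\n", application_lag);
    printf("query lag:       %.0f s (time the blocker has held up recovery)\n", query_lag);
    return 0;
}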

These three, in combination, would let us cover most potential use
cases. So I think you've correctly assessed that that's where we're
going in the 9.1-9.2 timeframe.

However, I'd say for 9.0 that "application lag" is the least confusing
option and the least dependent on the DBA's server room setup. So if we
can only have one of these for 9.0 (and I think going out with more than
one might be too complex, especially at this late date), I think that's
the way to go.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Greg Stark on
On Mon, May 3, 2010 at 4:37 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> 1. The timestamps we are reading from the log might be historical,

> 2. There could be clock skew between the master and slave servers.

> 3. There could be significant propagation delay from master to slave,

So it sounds like what you're expecting is for max_standby_delay to
represent not the maximum lag between server commit and standby commit
but rather the maximum lag introduced by conflicts. Or perhaps maximum
lag introduced relative to the lag present at startup. I think it's
possible to implement either of these and it would solve all three
problems above:

The slave maintains a running measure of how far behind the master it
is. Every time it executes a recovery operation or waits on a conflict,
it adds the time it spent executing or waiting. Every time it executes
a commit record, it subtracts the *difference* between this commit
record's timestamp and the last one's. I assume we clip it at 0 so it
never goes negative; that has some odd effects, but it seems to match
what I would expect to happen.
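
To spell out the bookkeeping, a rough sketch of what I mean (the
function names and plain-double timestamps are just illustration, not
proposed code):

#include <stdio.h>

static double standby_delay = 0.0;  /* running estimate of the lag, in seconds */
static double last_commit_ts = 0.0; /* timestamp of the previous commit record */

/* add wall-clock time spent replaying a record or waiting out a conflict */
static void
note_time_spent(double seconds)
{
    standby_delay += seconds;
}

/* on each commit record, credit how far the WAL's own clock advanced */
static void
note_commit_record(double commit_ts)
{
    if (last_commit_ts > 0.0)
    {
        standby_delay -= (commit_ts - last_commit_ts);
        if (standby_delay < 0.0)
            standby_delay = 0.0;    /* clip at zero */
    }
    last_commit_ts = commit_ts;
}

int
main(void)
{
    /* two commits 10s apart in WAL time, 2s + 3s spent replaying/waiting */
    note_commit_record(100.0);
    note_time_spent(2.0);
    note_time_spent(3.0);
    note_commit_record(110.0);
    printf("estimated delay: %.1f s\n", standby_delay);  /* 0.0 -- replay kept up */
    return 0;
}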

In the face of a standby recovering historical logs, it would start
with an assumed delay of 0. As long as the conflicts don't slow down
replay of the logs to the point that they apply more slowly than the
master generated them, the measured delay would stay near 0. The only
time queries would be canceled would be if the conflicts are causing
problems replaying the logs.

In the face of clock skew, nothing changes as long as the clocks run
at the same speed.

In the face of an environment where the master is idle, I think this
scheme has the same problems you described, but this might be
manageable. Perhaps we need more timestamps in the master's log stream
aside from the commit timestamps. Or perhaps we don't care about
standby delay except when reading a commit record since any other
record isn't actually delayed unless its commit is delayed.

--
greg
