From: Robert Haas on
On Mon, May 3, 2010 at 3:39 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> I'm inclined to think that we should throw away all this logic and just
>>> have the slave cancel competing queries if the replay process waits
>>> more than max_standby_delay seconds to acquire a lock.
>
>> What if we somehow get into a situation where the replay process is
>> waiting for a lock over and over and over again, because it keeps
>> killing conflicting processes but something restarts them and they
>> take locks over again?
>
> They won't be able to take locks "over again", because the lock manager
> won't allow requests to pass a pending previous request, except in
> very limited circumstances that shouldn't hold here.  They'll queue
> up behind the replay process's lock request, not in front of it.
> (If that isn't the case, it needs to be fixed, quite independently
> of this concern.)

Well, the new backends needn't try to take "the same" locks as the
existing backends - the point is that in the worst case this proposal
means waiting max_standby_delay for EACH replay that requires taking a
lock. And that might be a LONG time.

One idea I had while thinking this over was to bound the maximum
amount of unapplied WAL rather than the absolute amount of time lag.
Now, that's a little fruity, because your WAL volume might fluctuate
considerably, so you wouldn't really know how far the slave was behind
the master chronologically. However, it would avoid all the time skew
issues, and it would also more accurately model the idea of a bound on
recovery time should we need to promote the standby to master, so
maybe it works out to a win. You could still end up stuck
semi-permanently behind, but never by more than N segments.
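
As a rough sketch of the check I'm imagining (Python pseudo-code; the
segment size, the bound, and the LSN arguments are all illustrative
names, not actual server APIs):

    WAL_SEG_SIZE = 16 * 1024 * 1024   # default WAL segment size, 16MB
    MAX_LAG_SEGMENTS = 8              # "N", the proposed bound

    def lag_bound_exceeded(received_lsn, replayed_lsn):
        # Start cancelling conflicting queries once the standby's
        # unapplied WAL exceeds N segments, regardless of how much
        # wall-clock time that WAL happens to represent.
        unapplied_bytes = received_lsn - replayed_lsn
        return unapplied_bytes > MAX_LAG_SEGMENTS * WAL_SEG_SIZE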

Stephen's idea of a mode where we wait up to max_standby_delay for a
lock but then kill everything in our path until we've caught up again
is another possible way of approaching this problem, although it may
lead to "kill storms". Some of that may be inevitable, though: a
bound on WAL lag has the same issue - if the primary is generating WAL
faster than the standby can apply it, the standby will eventually
decide to slaughter everything in its path.
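
A minimal sketch of that mode (Python pseudo-code; the four callables
are injected stand-ins for server internals, invented here purely for
illustration):

    def replay_loop(records, max_standby_delay, wait_for_lock,
                    cancel_all_conflicting_backends, apply_record,
                    caught_up):
        aggressive = False
        for rec in records:
            if aggressive or not wait_for_lock(rec, max_standby_delay):
                # Once one blocker has been waited out, cancel every
                # conflicting backend until replay catches up again;
                # this is where the "kill storm" comes from.
                aggressive = True
                cancel_all_conflicting_backends(rec)
            apply_record(rec)
            if aggressive and caught_up():
                aggressive = False   # resume waiting politely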

....Robert

From: Josh Berkus on
Greg, Robert,

> Certainly that one particular case can be solved by making the
> servers be in time sync a prereq for HS working (in the traditional way).
> And by "prereq" I mean a "user beware" documentation warning.
>

Last I checked, you work with *lots* of web developers and web
companies. I'm sure you can see the issue with the above.

> Stephen's idea of a mode where we wait up to max_standby_delay for a
> lock but then kill everything in our path until we've caught up again
> is another possible way of approaching this problem, although it may
> lead to "kill storms".

Personally, I thought that the kill storms were exactly what was wrong
with max_standby_delay. That is, with MSD, no matter *what* your
settings or traffic are, you're going to get query cancel occasionally.

I don't see the issue with Tom's approach from a wait perspective. The
max wait becomes 1.001X max_standby_delay; there's no way I can think of
that replay would wait longer than that. I've yet to see an explanation
why it would be longer.

Simon's assertion that not all operations take a conventional lock is a
much more serious potential flaw.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From: Bruce Momjian on
Simon Riggs wrote:
> On Mon, 2010-05-03 at 13:13 -0400, Stephen Frost wrote:
>
> > Perhaps you could speak to the specific user
> > experience difference that you think there would be from this change?
>
> The difference is really to do with the weight you give to two different
> considerations:
>
> * avoid query cancellations
> * avoid having recovery fall behind, so that failover time is minimised
>
> Some people recognise the trade-offs and are planning multiple standby
> servers dedicated to different roles/objectives.

I understand Simon's point that the two behaviors have different
benefits. However, I believe few users will be able to understand when
to use which.

As I remember, 9.0 has two behaviors:

o master delays vacuum cleanup
o slave delays WAL application

and in 9.1 we will be adding:

o slave communicates snapshots to master

How would this figure into what we ultimately want in 9.1?

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

From: Simon Riggs on
On Mon, 2010-05-03 at 15:04 -0700, Josh Berkus wrote:

> I don't see the issue with Tom's approach from a wait perspective. The
> max wait becomes 1.001X max_standby_delay; there's no way I can think of
> that replay would wait longer than that. I've yet to see an explanation
> why it would be longer.

Yes, the max wait on any *one* blocker will be max_standby_delay. But if
you wait for two blockers, then the total time by which the standby lags
will now be 2*max_standby_delay. Add a third, fourth etc and the standby
lag keeps rising.
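
To put numbers on it: with max_standby_delay = 30s, four successive
blockers that each get waited out in full leave the standby
4 * 30s = 120s behind, and nothing stops a fifth blocker from pushing
the lag higher still.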

We need to avoid confusing these two measurables:

* standby lag - defined as the total delay from when a WAL record is
written to the time the WAL record is applied. This includes both
transfer time and any delays imposed by Hot Standby.

* standby query delay - defined as the time that recovery will wait for
a query to complete before a cancellation takes place. (We could
complicate this by asking what happens when recovery is blocked twice by
the same query: would it wait twice, or would it have to track the total
time it has waited on each query so far?)

Currently max_standby_delay seeks to constrain the standby lag to a
particular value, as a way of providing a bounded time for failover, and
also to constrain the amount of WAL that needs to be stored as the lag
increases. Currently, there is no guaranteed minimum query delay given
to each query.

If every query is guaranteed its requested query delay then the standby
lag will be unbounded. Fewer cancellations, higher lag. Some people do
want this, though it is not currently available. We can do this with two
new GUCs:

* standby_query_delay - USERSET parameter that allows user to specify a
guaranteed query delay, anywhere from 0 to maximum_standby_query_delay

* max_standby_query_delay - SIGHUP parameter that provides the DBA with
a limit on the USERSET standby_query_delay, though I can see some would
say this is optional

Current behaviour is the same as the global settings of
standby_query_delay = 0
max_standby_query_delay = 0
max_standby_delay = X

So if people want minimal cancellations they would specify
standby_query_delay = Y (e.g. 30)
max_standby_query_delay = Z (e.g. 300)
max_standby_delay = -1
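
As a sketch of how recovery might combine the three settings when it
hits a blocking query (Python pseudo-code; the precedence here is my
reading of the proposal, not settled semantics):

    def wait_budget(standby_query_delay, max_standby_query_delay,
                    max_standby_delay, current_lag):
        # The DBA's SIGHUP cap bounds whatever the user requested.
        guaranteed = min(standby_query_delay, max_standby_query_delay)
        if max_standby_delay < 0:
            # Lag is unbounded: every query gets its guaranteed delay.
            return guaranteed
        # Bounded lag: wait no longer than the remaining lag budget,
        # except that a guaranteed delay can still push the lag past
        # the bound; that is exactly the trade-off described above.
        remaining_budget = max(0, max_standby_delay - current_lag)
        return max(guaranteed, remaining_budget)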

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs on
On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote:

> As I remember, 9.0 has two behaviors:
>
> o master delays vacuum cleanup
> o slave delays WAL application
>
> and in 9.1 we will be adding:
>
> o slave communicates snapshots to master

> How would this figure into what we ultimately want in 9.1?

We would still want all options, since "slave communicates snapshot to
master" doesn't solve the problem, it just moves it elsewhere. It's a
question of which factors the user wishes to emphasise for their
specific use.

> I understand Simon's point that the two behaviors have different
> benefits. However, I believe few users will be able to understand when
> to use which.

If users can understand how to set NDISTINCT for a column, they can
understand this. It's not about complexity of UI, it's about solving
problems. When people hit an issue, I don't want to be telling them
"we thought you wouldn't understand it, so we removed the parachute".
They might not understand it *before* they hit a problem, but so what?
Users certainly will understand it afterwards, and they won't say
"thanks" if you take the option away from them, especially for the
stated reason. (My point about ndistinct: 99% of users have no idea it
exists or when to use it, but it still exists as an option because it
solves a known issue, just like this.)

--
Simon Riggs www.2ndQuadrant.com

