From: Robert Haas on
On Thu, May 6, 2010 at 2:47 PM, Josh Berkus <josh(a)agliodbs.com> wrote:
>
>> Now that I've realized what the real problem is with max_standby_delay
>> (namely, that inactivity on the master can use up the delay), I think
>> we should do what Tom originally suggested here.  It's not as good as
>> a really working max_standby_delay, but we're not going to have that
>> for 9.0, and it's clearly better than a boolean.
>
> I guess I'm not clear on how what Tom proposed is fundamentally
> different from max_standby_delay = -1.  If there's enough concurrent
> queries, recovery would never catch up.

If your workload is that the standby server is getting pounded with
queries like crazy, then it's probably not that different: it will
fall progressively further behind. But I suspect many people will set
up standby servers where most of the activity happens on the primary,
but they run some reporting queries on the standby. If you expect
your reporting queries to finish in <10s, you could set the max delay
to say 60s. In the event that something gets wedged, recovery will
eventually kill it and move on rather than just getting stuck forever.
If the volume of queries is known not to be too high, it's reasonable
to expect that a few good whacks will be enough to get things back on
track.
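
To make the reporting-query scenario concrete, here is a toy model of a single replay conflict. It is only a sketch and not PostgreSQL's actual replay logic: the function name replay_wait and the use of None for "wait forever" (the -1 setting discussed above) are assumptions made for illustration.

```python
def replay_wait(query_remaining_s, max_delay_s):
    """Model one conflicting standby query.

    Returns (seconds replay waits, whether the query was cancelled).
    max_delay_s=None stands in for "never cancel, wait forever".
    """
    if max_delay_s is None or query_remaining_s <= max_delay_s:
        # The query finishes on its own; replay waits for it.
        return query_remaining_s, False
    # The query outlives the allowed delay; replay cancels it and moves on.
    return max_delay_s, True


wedged = float("inf")  # a query that never finishes by itself

print(replay_wait(10, 60))        # short reporting query, 60s limit
print(replay_wait(wedged, 60))    # wedged query, 60s limit
print(replay_wait(wedged, None))  # wedged query, no limit
```

With a 60s limit the short query is never touched and the wedged one costs at most 60s of replay delay; with no limit the wedged query blocks replay indefinitely.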

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Josh Berkus on
All,

We are in Beta1, now, and it's May. Original goal for 9.0 was
June-July. We cannot be introducing major design revisions to HS/SR at
this date, or we won't ship until December.

There are at least 10 other major features in 9.0, some of which are
more important to some of our users than HS/SR. More importantly, I
think the discussion on this thread makes it very clear that no matter
how much discussion we have on standby delay, we are NOT going to get it
right the first time. That is, *even if* we replace Simon's code with
something else, that something else will have as many issues for real
users as the current delay does, especially since we won't even have
started debugging or testing the new code yet.

So changing to a lock-based mechanism or designing a plugin interface
are really not at all realistic at this date.

Realistically, we have two options at this point:

1) reduce max_standby_delay to a boolean.

2) have a delay option (based either on WAL glob start time or on system
time) like the current max_standby_delay, preferably with some bugs fixed.

If we do (1), we'll be having this discussion all over again in
September, and will be no better off because we won't have any
production feedback on Simon's approach. If we do (2) we can hedge it
in the documentation with requirements and cautions, and hopefully only
dedicated DBAs will touch it, and we'll get good feedback from them on
how we should redesign it for 9.1. And if it works as badly as Tom
expects, then we won't have an issue with maintaining backwards
compatibility, because people will be *happy* to change.

One way to communicate this would be to have 2 GUCs instead of one:
allow_query_cancel = on|off # defaults to on
max_standby_timeout = 0 # SEE DOCS BEFORE CHANGING

We named this release 9.0 because, among other things, we expected it to
be less stable than the prior 3 releases. And we can continue to tell
users that. I know I won't be moving any of my clients to 9.0.0.

I said it before and I'll say it again: "release early, release often".

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From: Bruce Momjian on
Josh Berkus wrote:
> All,
>
> We are in Beta1, now, and it's May. Original goal for 9.0 was
> June-July. We cannot be introducing major design revisions to HS/SR at
> this date, or we won't ship until December.
>
> There are at least 10 other major features in 9.0, some of which are
> more important to some of our users than HS/SR. More importantly, I
> think the discussion on this thread makes it very clear that no matter
> how much discussion we have on standby delay, we are NOT going to get it
> right the first time. That is, *even if* we replace Simon's code with
> something else, that something else will have as many issues for real
> users as the current delay does, especially since we won't even have
> started debugging or testing the new code yet.
>
> So changing to a lock-based mechanism or designing a plugin interface
> are really not at all realistic at this date.
>
> Realistically, we have two options at this point:
>
> 1) reduce max_standby_delay to a boolean.

I suggest calling it 'delay_wal_application' or 'wal_query_cancel' or
something like that.

> 2) have a delay option (based either on WAL glob start time or on system
> time) like the current max_standby_delay, preferably with some bugs fixed.

I don't think releasing something that many of us can barely understand
is going to help. I think Heikki is right that we might get feedback
from 9.0 that this setting isn't even useful. If we can't get this
right, and it seems we can't, we should just push this to 9.1.

Remember, delaying wal application just delays making the standby a
master and makes the slave data appear staler. We can just tell people
that the larger their queries are, the larger this delay will be. If
they want to control this, they can set 'statement_timeout' already.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

From: Greg Smith on
Bruce Momjian wrote:
> Remember, delaying wal application just delays making the standby a
> master and makes the slave data appear staler. We can just tell people
> that the larger their queries are, the larger this delay will be. If
> they want to control this, they can set 'statement_timeout' already.
>

While a useful defensive component, statement_timeout is a user setting,
so it can't provide guaranteed protection against a WAL application
denial of service from a long running query. A user that overrides the
system setting and kicks off a long query puts you right back into
needing a timeout to ensure forward progress of standby replay.
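
A small sketch of why a user-settable timeout is not a guarantee. It assumes only that a session-level SET takes precedence over the postgresql.conf value and that 0 disables statement_timeout; the function names are hypothetical and this is not server code.

```python
def effective_statement_timeout(system_default_ms, session_override_ms=None):
    """A session-level SET wins over the postgresql.conf value."""
    if session_override_ms is not None:
        return session_override_ms
    return system_default_ms


def replay_blocked_ms(query_runtime_ms, timeout_ms):
    """How long a conflicting query could hold up replay if nothing
    other than statement_timeout ever cancels it (0 = no timeout)."""
    if timeout_ms == 0:
        return query_runtime_ms
    return min(query_runtime_ms, timeout_ms)


one_hour_ms = 60 * 60 * 1000

# The system default of 30s is respected: replay is blocked for at most 30s.
print(replay_blocked_ms(one_hour_ms, effective_statement_timeout(30_000)))

# The user runs SET statement_timeout = 0: replay is blocked for the full hour.
print(replay_blocked_ms(one_hour_ms, effective_statement_timeout(30_000, 0)))
```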

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us


From: Bruce Momjian on
Greg Smith wrote:
> Bruce Momjian wrote:
> > Remember, delaying wal application just delays making the standby a
> > master and makes the slave data appear staler. We can just tell people
> > that the larger their queries are, the larger this delay will be. If
> > they want to control this, they can set 'statement_timeout' already.
> >
>
> While a useful defensive component, statement_timeout is a user setting,
> so it can't provide guaranteed protection against a WAL application
> denial of service from a long running query. A user that overrides the
> system setting and kicks off a long query puts you right back into
> needing a timeout to ensure forward progress of standby replay.

The nice thing about query cancel is that it gives predictable behavior.
We could make statement_timeout impossible to change once it is set in
postgresql.conf. Again, let's think of that for 9.1.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
