max_standby_delay considered harmful [PgSql]


From: Simon Riggs on 5 May 2010 03:16

On Tue, 2010-05-04 at 23:06 -0400, Bruce Momjian wrote:

> Should I be concerned that we are redesigning HS features at this stage
> in the release?

We knew we had to have one final discussion on HS snapshots. This is it.

Tom has raised valid issues, all of which were already known. If we can
address them, we should.

I have posted a straightforward patch [walrcv_timestamp.patch] to address
all of those points. (It was posted 13 hours prior to your post. That it
was ignored by all while debate continued is one point of concern for me,
though there seems to have been confusion as to what that patch actually
was.)

Tom has also raised a separate proposal, though that hasn't yet been
properly explained and there has been much debate about what he actually
meant. It is possible there is something worthwhile there, if that
involves adding a new capability. Myself, Stephen, Josh and Greg say
that changing max_standby_delay so there is no bounded startup time
would be a bad thing, if that is its only behaviour in 9.0.

I will tidy up walrcv_timestamp.patch and apply on Thu evening unless
there are concise, rational objections to that patch, which I consider
to be a bug fix and not blocked by beta.

Tom raised 7 other main points which, following detailed investigation,
have resulted in 2 minor bugs, 2 unresolved questions on the patch and 1
further request for code comments. The 2 bugs affect corner cases only
and so are minor. They will be fixed over the next few days, since they
are not instant fixes. The open items list has been updated with the items
mentioned here, plus the performance query discussed on the other thread.
Nothing much here is likely to cause a problem if we need to go beta
immediately, IMO.

I am mostly unavailable for the next few days. (Repairing bikeshed.)

Expect at least 3 commits from me over the next few days.

--
Simon Riggs www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 5 May 2010 06:23

On Wed, May 5, 2010 at 3:16 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> Expect at least 3 commits from me over next few days.

I think you need to rethink the way that you decide when it's time to
commit things. There is certainly no consensus on any of the things
you are proposing to commit, nor have they been adequately (or, uh, at
all) reviewed. Saying that your proposal addresses all of Tom's
objections doesn't make it so. I am planning to read that patch and
offer an opinion on it, but I haven't done so yet and I imagine Tom
will weigh in at some point as well. Racing to commit a pile of code
that nobody else has tested is not going to improve anything.

....Robert


From: Heikki Linnakangas on 5 May 2010 09:46

Simon Riggs wrote:
> The attached patch redefines "standby delay" to be the amount of time
> elapsed from point of receipt to point of application. The "point of
> receipt" is reset every chunk of data when streaming, or every file when
> reading file by file. In all cases this new time is later than the
> latest log time we would have used previously.

This seems completely wrong to me. If the WAL receiver keeps receiving
stuff, (last receive timestamp) - (current timestamp) would never be
more than a few seconds. Regardless of how much applying the WAL has
fallen behind.

To accomplish what you're trying to accomplish, you would need to label
each received WAL record with the timestamp when it was received, and
compare the reception timestamp of the record you're applying against
current timestamp.
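
The difference between the two measurements can be sketched as follows.
This is an illustration only, in Python rather than PostgreSQL code; the
function names, the queue of (reception time, record) pairs and the
one-record-per-second simulation are all assumptions made up for the
example.

```python
from collections import deque


def delay_from_last_receipt(last_receipt_time, now):
    """Delay measured from the most recent receipt (hypothetical helper)."""
    return now - last_receipt_time


def delay_from_record_label(pending, now):
    """Delay measured from the reception time of the oldest unapplied record."""
    if not pending:
        return 0.0
    received_at, _record = pending[0]
    return now - received_at


def simulate(seconds, apply_every):
    """Receive one record per second, apply one only every `apply_every` seconds."""
    pending = deque()
    last_receipt = 0.0
    for t in range(seconds):
        pending.append((float(t), f"record-{t}"))  # label record on receipt
        last_receipt = float(t)
        if t % apply_every == 0:
            pending.popleft()  # apply the oldest pending record
    now = float(seconds)
    return (
        delay_from_last_receipt(last_receipt, now),
        delay_from_record_label(pending, now),
    )
```

With a steady stream of incoming records and slow apply, the first
measurement stays small however large the backlog grows, while the
per-record label reflects how far apply has actually fallen behind.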

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas on 5 May 2010 09:58

Tom Lane wrote:
> Comments?

There's currently three ways to set max_standby_delay:

max_standby_delay = -1 # Query wins
max_standby_delay = 0 # Recovery wins
max_standby_delay > X # Query wins until lag > X.

As Tom points out, the 3rd option has all sorts of problems. I very much
like the behavior that max_standby_delay tries to accomplish, but I have
to agree that it's not very reliable as it is. I don't like Tom's
proposal either; the standby can fall behind indefinitely, and queries
get a varying grace period.
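
Read as a decision rule, the three settings listed above look roughly like
this. It is a sketch in Python for illustration, not the server's logic;
the function name is hypothetical, and `standby_lag` stands for whatever
lag measurement is in use, which is exactly the part under dispute in this
thread.

```python
def recovery_may_cancel_queries(max_standby_delay, standby_lag):
    """Return True when recovery is allowed to cancel conflicting queries.

    max_standby_delay: -1, 0, or a positive limit X
    standby_lag: current lag, in the same unit as the limit (assumed input)
    """
    if max_standby_delay == -1:
        return False  # query wins: recovery always waits
    if max_standby_delay == 0:
        return True  # recovery wins: conflicting queries are cancelled
    # Positive limit: query wins until the lag exceeds X.
    return standby_lag > max_standby_delay
```

Only the last branch depends on a lag measurement, which is why the
reliability concerns apply to the third option alone.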

Let's rip out the concept of a delay altogether, and make it a boolean.
If you really want your query to finish, set it to -1 (using the current
max_standby_delay nomenclature). If recovery is important to you, set it
to 0.

If you have the monitoring in place to sensibly monitor the delay
between primary and standby, and you want a limit on that, you can put
together a script to flip the switch in postgresql.conf if the standby
falls too much behind.
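
A minimal sketch of such a script's core, in Python for illustration. The
function names and the lag threshold are hypothetical, the lag value is
assumed to come from your own monitoring, and the sketch only rewrites the
configuration text; it does not touch a running server.

```python
import re

# Matches an active or commented-out max_standby_delay line.
SETTING = re.compile(r"^[ \t]*#?[ \t]*max_standby_delay[ \t]*=.*$", re.MULTILINE)


def choose_delay(lag_seconds, lag_limit_seconds):
    """Pick 0 (recovery wins) when too far behind, else -1 (query wins)."""
    return 0 if lag_seconds > lag_limit_seconds else -1


def rewrite_conf(conf_text, new_value):
    """Return conf_text with max_standby_delay set to new_value."""
    line = f"max_standby_delay = {new_value}"
    if SETTING.search(conf_text):
        # Replace the first matching line, dropping any trailing comment.
        return SETTING.sub(line, conf_text, count=1)
    # Setting not present: append it, adding a newline separator if needed.
    sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return conf_text + sep + line + "\n"
```

After writing the new text back to postgresql.conf, the server still has to
re-read the file (for example with a reload), which assumes the setting can
be changed without a restart.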

It would be nice to make that settable per-session, BTW. Though as soon
as you have one session using -1, the standby could fall behind. Still,
it might be useful if you run both kinds of queries on the same standby.

Ok, now that we've gotten over that, here's another proposal for what a
delay setting could look like. Let's have a setting similar to
statement_timeout, that specifies how long a statement is allowed to run
until it becomes subject to killing if it conflicts with recovery
(actually, it would have to be a per-transaction setting, at least in
serializable mode). This would be similar to Tom's proposal, and it
would have the same drawback that it would give no guarantee on how much
the standby can fall behind. However, it would be easier to understand:
a query gets to run for X seconds, and after that it will be killed if
it gets in the way.
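
The rule described above can be written down in a few lines. This is a
sketch in Python for illustration; the function and parameter names are
hypothetical and no such setting exists in the current code.

```python
def may_be_cancelled(started_at, now, grace_seconds, conflicts_with_recovery):
    """Return True if a statement is subject to cancellation.

    A statement is protected for its first `grace_seconds` of run time.
    After that it is cancelled only if it actually conflicts with recovery.
    """
    if not conflicts_with_recovery:
        return False  # never in the way, so never cancelled
    return (now - started_at) >= grace_seconds
```

Every query gets the same fixed grace period, which is the point of the
proposal. Because each newly started query gets its own grace period,
nothing in this rule bounds how far the standby can fall behind.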

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Dimitri Fontaine on 5 May 2010 10:19

Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> Tom Lane wrote:
>> Comments?
>
> There's currently three ways to set max_standby_delay:
>
> max_standby_delay = -1 # Query wins
> max_standby_delay = 0 # Recovery wins
> max_standby_delay > X # Query wins until lag > X.
>
> As Tom points out, the 3rd option has all sorts of problems. I very much
> like the behavior that max_standby_delay tries to accomplish, but I have
> to agree that it's not very reliable as it is. I don't like Tom's
> proposal either; the standby can fall behind indefinitely, and queries
> get a varying grace period.
>
> Let's rip out the concept of a delay altogether, and make it a boolean.
> If you really want your query to finish, set it to -1 (using the current
> max_standby_delay nomenclature). If recovery is important to you, set it
> to 0.

I can't help but insist on it, sorry. But.

The obvious solution to this problem, for me, is to either make the
boolean reload friendly or to have pause/resume recovery. Ideally, both.

Then the default setting would be recovery wins, and you pause the standby
replay to ensure your query runs to completion. It is a very crude setting,
but 9.0 would offer an easy-to-set-up slave for *either* HA *or* off-load,
and a way to mitigate the conflict somehow.

Automated, educated conflict resolution based on some sort of timeout
running for one or all of the current queries seems much harder to agree
upon than applying existing code we thought we wouldn't yet need. Let's
revisit that decision: it seems to me we need it for 9.0.

Regards,
--
dim

