max_standby_delay considered harmful [PgSql]

From: Simon Riggs on 3 May 2010 14:22

On Mon, 2010-05-03 at 13:13 -0400, Stephen Frost wrote:
> * Simon Riggs (simon(a)2ndQuadrant.com) wrote:
> > I guarantee that if that proposal goes in, people will complain about
> > that also. Last minute behaviour changes are bad news. I don't object to
> > adding something, just don't take anything away. It's not like the code
> > for it is pages long or anything.
>
> I have to disagree with this. If it goes into 9.0 this way then we're
> signing up to support it for *years*. With something as fragile as the
> existing setup (as outlined by Tom), that's probably not a good idea.
> We've not signed up to support the existing behaviour at all yet;
> alphas aren't a guarantee of what we're going to release.

That's a great argument, either way. We will have to live with 9.0 for
many years and that's why I mention having both. Make a choice either
way and we take a risk. Why take that risk?

> > The trade-off is HA or queries, and two modes make sense for user choice.
>
> The option isn't being thrown out, it's just being made to depend on
> something which is a lot easier to measure while still being very useful
> for the trade-off you're talking about. I don't really see a downside
> to this, to be honest. Perhaps you could speak to the specific user
> experience difference that you think there would be from this change?
>
> +1 from me on Tom's proposal.

--
Simon Riggs www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on 3 May 2010 14:37

On Mon, 2010-05-03 at 13:21 -0400, Robert Haas wrote:
> On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> > I'm inclined to think that we should throw away all this logic and just
> > have the slave cancel competing queries if the replay process waits
> > more than max_standby_delay seconds to acquire a lock.
>
> What if we somehow get into a situation where the replay process is
> waiting for a lock over and over and over again, because it keeps
> killing conflicting processes but something restarts them and they
> take locks over again? It seems hard to ensure that replay will make
> adequate progress with any substantially non-zero value of
> max_standby_delay under this definition.

That is one argument against, and a reason why just one route is bad.

We already have more than one way, so another option is useful.

--
Simon Riggs www.2ndQuadrant.com

From: Simon Riggs on 3 May 2010 15:12

On Mon, 2010-05-03 at 13:13 -0400, Stephen Frost wrote:

> Perhaps you could speak to the specific user
> experience difference that you think there would be from this change?

The difference is really to do with the weight you give to two different
considerations:

* avoid query cancellations
* avoid having recovery fall behind, so that failover time is minimised

Some people recognise the trade-offs and are planning multiple standby
servers dedicated to different roles/objectives.

Some people envisage Hot Standby as a platform for running very fast
SELECTs, where retrying a cancelled query is a reasonable option and
keeping the standby as up-to-date as possible is an important
consideration from a data freshness perspective. Others view HS as a
weapon against long-running queries, somewhere to offload them from
the master.

My initial view was that the High Availability goal/role should be the
default or most likely mode of operation. I would say that the current
max_standby_delay favours the HA route, since it specifically limits the
amount by which the server can fall behind.

Tom's proposed behaviour (which has also been proposed before) favours
the avoid-query-cancellation route, though it could lead to huge amounts of lag.

I'm happy to have both options because I know this is a trade-off that
solution engineers want to have control of, not something we as
developers can choose ahead of time.

--
Simon Riggs www.2ndQuadrant.com

From: Stephen Frost on 3 May 2010 15:27

* Robert Haas (robertmhaas(a)gmail.com) wrote:
> On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> > I'm inclined to think that we should throw away all this logic and just
> > have the slave cancel competing queries if the replay process waits
> > more than max_standby_delay seconds to acquire a lock.
>
> What if we somehow get into a situation where the replay process is
> waiting for a lock over and over and over again, because it keeps
> killing conflicting processes but something restarts them and they
> take locks over again? It seems hard to ensure that replay will make
> adequate progress with any substantially non-zero value of
> max_standby_delay under this definition.

That was my first question too, but I reread what Tom wrote and came to
a different conclusion: if the replay process waits more than
max_standby_delay to acquire a lock, then it will kill off *everything*
it runs into from that point forward, until it's done with whatever WAL is
currently available. At that point, the 'timer' would reset back to
zero.

When/how that timer gets reset was a question I had, but I feel like
"until nothing is available" makes sense and is what I assumed Tom was
thinking.
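To make that reading concrete, here is a small hypothetical simulation. It is not PostgreSQL code: `Conflict`, `replay_batch` and `hold_time` are invented for illustration, and it assumes the 'timer' accumulates lock-wait time while one batch of available WAL is replayed and resets once that batch is done.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Conflict:
    """Hypothetical: one query blocking a lock that replay needs."""
    query: str
    hold_time: float  # seconds the query would keep its lock if left alone


def replay_batch(conflicts: list[Conflict], max_standby_delay: float) -> tuple[list[str], float]:
    """Replay one batch of available WAL under the proposed rule (as read above).

    Returns the queries cancelled and the total seconds replay spent waiting.
    """
    waited = 0.0  # the 'timer': lock-wait time accumulated in this batch
    cancelled: list[str] = []
    for c in conflicts:
        budget = max_standby_delay - waited
        if c.hold_time <= budget:
            # The blocker finishes within the remaining budget; just wait for it.
            waited += c.hold_time
        else:
            # Wait out whatever budget is left, then cancel the blocker.
            # Once the budget hits zero, every later conflict is cancelled at once.
            waited += max(budget, 0.0)
            cancelled.append(c.query)
    # The batch is fully applied, so the next batch starts with a fresh timer.
    return cancelled, waited


if __name__ == "__main__":
    batch = [Conflict("a", 10), Conflict("b", 25), Conflict("c", 5)]
    print(replay_batch(batch, 30))  # (['b', 'c'], 30.0)
```

Under this interpretation the lag that lock waits add to any one batch is capped at max_standby_delay, because after the budget runs out conflicting queries are cancelled without further waiting. That is what would keep replay progressing even if something keeps restarting the conflicting queries.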

Thanks,

Stephen

From: Stephen Frost on 3 May 2010 15:32

Simon,

* Simon Riggs (simon(a)2ndQuadrant.com) wrote:
> Tom's proposed behaviour (which has also been proposed before) favours
> the avoid-query-cancellation route, though it could lead to huge amounts of lag.

My impression of Tom's suggestion was that it would also be a maximum
amount of delay allowed before killing off queries, not that it would
be able to wait indefinitely until no one is blocking. Based on that, I
don't know that there's really much difference in user-visible behaviour
between the two, except in 'oddball' situations where there's a time
skew between the servers, or a large lag, etc. In those cases I think
Tom's proposal would be more likely to do what's 'expected', whereas what
you would get with the existing implementation (zero time delay, or far
too much) would be a 'gotcha'.
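The skew case can be sketched with two hypothetical functions. The sketch assumes, as the "zero time delay, or far too much" remark suggests, that the existing rule anchors its deadline to the timestamp the master wrote into the WAL record but compares it against the standby's clock. The lock-wait rule instead measures only on the standby's clock. The function names are invented for illustration.

```python
def existing_grace(record_ts_master: float, now_standby: float,
                   max_standby_delay: float) -> float:
    """Seconds a conflicting query may still run under the lag-based rule (assumed).

    The deadline mixes a master-clock timestamp with the standby's clock,
    so any clock skew (or a large replay lag) shifts it directly.
    """
    return max(0.0, record_ts_master + max_standby_delay - now_standby)


def proposed_grace(wait_start_standby: float, now_standby: float,
                   max_standby_delay: float) -> float:
    """Seconds left under the lock-wait rule: both times come from the standby."""
    return max(0.0, wait_start_standby + max_standby_delay - now_standby)


if __name__ == "__main__":
    delay = 30.0
    # A record written at master time 1000 is replayed immediately;
    # the standby's clock is ahead, in sync, or behind the master's.
    for skew in (40.0, 0.0, -60.0):
        now = 1000.0 + skew
        print(skew, existing_grace(1000.0, now, delay), proposed_grace(now, now, delay))
```

With the standby's clock 40 s ahead the lag-based rule grants 0 s, and with it 60 s behind it grants 90 s, while the lock-wait rule grants 30 s in every case. Replaying old WAL (a large lag) pushes the lag-based grace to zero in the same way.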

Thanks,

Stephen