max_standby_delay considered harmful [PgSql]


From: Simon Riggs on 3 May 2010 12:48

On Mon, 2010-05-03 at 11:37 -0400, Tom Lane wrote:

> I've finally wrapped my head around exactly what the max_standby_delay
> code is doing, and I'm not happy with it.

Yes, I don't think I'd call it perfect yet.

> have the slave cancel competing queries if the replay process waits
> more than max_standby_delay seconds to acquire a lock. This is simple,
> understandable, and behaves the same whether we're reading live data or
> not.

I have no objection to, and would welcome, adding another behaviour, since
that just gives us a better chance of having this feature do something
useful.

> I'm inclined to think that we should throw away all this logic

HS has been through 2 Alphas with the current behaviour and it will go
through 0 Alphas with the newly proposed behaviour. At this stage of
proceedings, that is extremely dangerous and I don't wish to do that.
The likelihood that we replace it with something worse seems fairly
high/certain: snap decision making never quite considers all angles.
Phrases like "throw away all this logic" don't give me confidence that
people that agree with that perspective would understand what they are
signing up to.

> Putting in something that tries to maintain a closed-loop maximum
> delay between master and slave seems like a topic for future research
> rather than a feature we have to have in 9.0. And in any case we'd
> still want the plain max delay for non-SR cases, AFAICS, because there's
> no sane way to use closed-loop logic in other cases.

I will be looking for ways to improve this over time.

--
Simon Riggs www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Stefan Kaltenbrunner on 3 May 2010 12:54

Simon Riggs wrote:
> On Mon, 2010-05-03 at 11:37 -0400, Tom Lane wrote:
>
>> I've finally wrapped my head around exactly what the max_standby_delay
>> code is doing, and I'm not happy with it.
>
> Yes, I don't think I'd call it perfect yet.
>
>> have the slave cancel competing queries if the replay process waits
>> more than max_standby_delay seconds to acquire a lock. This is simple,
>> understandable, and behaves the same whether we're reading live data or
>> not.
>
> I have no objection to, and would welcome, adding another behaviour, since
> that just gives us a better chance of having this feature do something
> useful.
>
>> I'm inclined to think that we should throw away all this logic
>
> HS has been through 2 Alphas with the current behaviour and it will go
> through 0 Alphas with the newly proposed behaviour. At this stage of
> proceedings, that is extremely dangerous and I don't wish to do that.
> The likelihood that we replace it with something worse seems fairly
> high/certain: snap decision making never quite considers all angles.
> Phrases like "throw away all this logic" don't give me confidence that
> people that agree with that perspective would understand what they are
> signing up to.

To be honest, I'm not really sure how much serious testing has actually
been done with the alphas outside of the small set of people mostly
interested in one or another specific aspect of HS/SR.
I just started testing HS yesterday and I have already run twice into the
general issue Tom is complaining about with max_standby_delay...

Stefan


From: Simon Riggs on 3 May 2010 13:02

On Mon, 2010-05-03 at 18:54 +0200, Stefan Kaltenbrunner wrote:

> I'm not really sure how much serious testing outside of the small set of
> people mostly interested in one or another specific aspect of HS/SR has
> been actually done with the alphas to be honest.
> I just started testing HS yesterday and I already ran twice into the
> general issue Tom is complaining about with max_standby_delay...

I guarantee that if that proposal goes in, people will complain about
that also. Last minute behaviour changes are bad news. I don't object to
adding something, just don't take anything away. It's not like the code
for it is pages long or anything.

The trade-off is HA or queries, and two modes make sense for user choice.

--
Simon Riggs www.2ndQuadrant.com


From: Stephen Frost on 3 May 2010 13:13

* Simon Riggs (simon(a)2ndQuadrant.com) wrote:
> I guarantee that if that proposal goes in, people will complain about
> that also. Last minute behaviour changes are bad news. I don't object to
> adding something, just don't take anything away. It's not like the code
> for it is pages long or anything.

I have to disagree with this. If it goes into 9.0 this way then we're
signing up to support it for *years*. With something as fragile as the
existing setup (as outlined by Tom), that's probably not a good idea.
We've not signed up to support the existing behaviour at all yet;
alphas aren't a guarantee of what we're going to release.

> The trade-off is HA or queries, and two modes make sense for user choice.

The option isn't being thrown out, it's just being made to depend on
something which is a lot easier to measure while still being very useful
for the trade-off you're talking about. I don't really see a downside
to this, to be honest. Perhaps you could speak to the specific user
experience difference that you think there would be from this change?

+1 from me on Tom's proposal.

Thanks,

Stephen

From: Robert Haas on 3 May 2010 13:21

On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> I'm inclined to think that we should throw away all this logic and just
> have the slave cancel competing queries if the replay process waits
> more than max_standby_delay seconds to acquire a lock.

What if we somehow get into a situation where the replay process is
waiting for a lock over and over and over again, because it keeps
killing conflicting processes but something restarts them and they
take locks over again? It seems hard to ensure that replay will make
adequate progress with any substantially non-zero value of
max_standby_delay under this definition.

....Robert
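
[Editorial sketch, not part of the original message.] The concern above can be made concrete with a little arithmetic: if the timeout applies to each lock wait separately, the total time replay spends waiting grows with the number of conflicts it meets, so it is not bounded by max_standby_delay itself. The Python below is only a toy model of that reading of the proposal, not PostgreSQL code; the function name and the workload list are hypothetical.

```python
def replay_wait_per_lock_timeout(conflict_hold_times, max_standby_delay):
    """Toy model: replay meets one conflicting query after another.

    conflict_hold_times: seconds each conflicting query would keep its
        lock if never cancelled (hypothetical workload, one per conflict).
    max_standby_delay: per-lock-wait timeout in seconds, as in the
        proposal quoted above (assumed to apply to each wait separately).

    Returns (total seconds replay spent waiting, queries cancelled).
    """
    total_wait = 0.0
    cancelled = 0
    for hold in conflict_hold_times:
        if hold > max_standby_delay:
            # Replay waits the full timeout, then cancels the query.
            total_wait += max_standby_delay
            cancelled += 1
        else:
            # The query releases its lock before the timeout expires.
            total_wait += hold
    return total_wait, cancelled


if __name__ == "__main__":
    # Ten back-to-back conflicts, each from a query that would run 60 s.
    wait, cancelled = replay_wait_per_lock_timeout([60.0] * 10, 30.0)
    print(wait, cancelled)  # 300.0 10
```

With a 30-second setting and ten successive long-running conflicts, replay in this model waits 300 seconds in total, which is the "over and over again" case described above; only a setting of zero keeps the accumulated wait at zero.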

