max_standby_delay considered harmful [PgSql]


From: Yeb Havinga on 6 May 2010 06:08

Rob Wultsch wrote:
> I manage a bunch of different environments and I am pretty sure that
> in any of them if the db started seemingly randomly killing queries I
> would have application teams followed quickly by executives coming
> after me with torches and pitchforks.
>
> I can not imagine setting this value to anything other than a bool and
> most of the time that bool would be -1. I would only be unleashing a
> kill storm in utter desperation and I would probably need to explain
> myself in detail after. Utter desperation means I am sure I am going
> to have to do an impactful failover at any moment and need a slave
> completely up to date NOW.
>
That's funny because when I was reading this thread, I was thinking the
exact opposite: having max_standby_delay always set to 0 so I know the
standby server is as up-to-date as possible. The application that
accesses the hot standby has to be 'special' anyway because it might
deliver not-up-to-date data. If the information about the special
considerations for querying the standby server includes a warning that
queries might get cancelled, applications can opt for a retry themselves
(is there a special return code to catch that case? like
PGRES_RETRY_LATER) or show a message to the user that their report is
currently unavailable and they should retry in a few minutes.
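
The retry idea above can be sketched roughly as follows. This is only an illustration under assumptions: QueryCancelled is a hypothetical stand-in for whatever error a database driver raises, run_with_retry is an invented helper name, and the SQLSTATE value "40001" is assumed here to be the code reported for a cancellation caused by a recovery conflict (there is no PGRES_RETRY_LATER result status in the thread's sources).

```python
import time

# Assumption: recovery-conflict cancellations surface with this SQLSTATE.
RETRYABLE_SQLSTATE = "40001"


class QueryCancelled(Exception):
    """Hypothetical stand-in for a driver error that carries an SQLSTATE."""

    def __init__(self, sqlstate):
        super().__init__(f"query failed with SQLSTATE {sqlstate}")
        self.sqlstate = sqlstate


def run_with_retry(run_query, attempts=3, delay_s=0.0, sleep=time.sleep):
    """Call run_query(), retrying only when it was cancelled with the
    retryable SQLSTATE. Any other error, or the final failed attempt,
    is re-raised so the caller can tell the user to try again later."""
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except QueryCancelled as exc:
            if exc.sqlstate != RETRYABLE_SQLSTATE or attempt == attempts:
                raise
            sleep(delay_s)  # give recovery a moment before retrying
```

An application would wrap its standby report query in run_with_retry and fall back to the "currently unavailable" message when the exception finally propagates.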

regards,
Yeb Havinga

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 6 May 2010 06:18

On Thu, May 6, 2010 at 1:35 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Robert Haas wrote:
>> On Wed, May 5, 2010 at 11:52 PM, Bruce Momjian <bruce(a)momjian.us> wrote:
>>> I am afraid the current setting is tempting for users to enable, but
>>> will be so unpredictable that it will tarnish the reputation of HS and
>>> Postgres. We don't want to be thinking in 9 months, "Wow, we shouldn't
>>> have shipped that feature. It is causing all kinds of problems." We
>>> have done that before (rarely), and it isn't a good feeling.
>>
>> I am not convinced it will be unpredictable. The only caveats that
>> I've seen so far are:
>>
>> - You need to run ntpd.
>> - Queries will get cancelled like crazy if you're not using streaming
>> replication.
>
> And also in situations where the master is idle for a while and then
> starts doing stuff. That's the most significant source of confusion,
> IMHO, I wouldn't mind the requirement of ntpd so much.

Oh. Ouch. OK, sorry, I missed that part. Wow, that's awful. OK, I
agree: we can't ship that as-is.

/me feels embarrassed for completely failing to understand the root of
the issue until 84 emails into the thread.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Andres Freund on 6 May 2010 06:23

Hi,

On Thursday 06 May 2010 07:35:49 Heikki Linnakangas wrote:
> Robert Haas wrote:
> > On Wed, May 5, 2010 at 11:52 PM, Bruce Momjian <bruce(a)momjian.us> wrote:
> >> I am afraid the current setting is tempting for users to enable, but
> >> will be so unpredictable that it will tarnish the reputation of HS and
> >> Postgres. We don't want to be thinking in 9 months, "Wow, we shouldn't
> >> have shipped that feature. It is causing all kinds of problems." We
> >> have done that before (rarely), and it isn't a good feeling.
> >
> > I am not convinced it will be unpredictable. The only caveats that
> > I've seen so far are:
> >
> > - You need to run ntpd.
> > - Queries will get cancelled like crazy if you're not using streaming
> > replication.
>
> And also in situations where the master is idle for a while and then
> starts doing stuff. That's the most significant source of confusion,
> IMHO, I wouldn't mind the requirement of ntpd so much.
Personally I would much rather keep that configurability and manually
generate a record every second. Or possibly do something akin to
archive_timeout...
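
A rough sketch of the "generate a record every second" idea, under assumptions: write_record is a hypothetical callable that performs some small write on the master (for example, updating a one-row heartbeat table) so that a fresh WAL record with a current timestamp reaches the standby. The function name and parameters are invented for illustration.

```python
import time
from typing import Callable


def run_heartbeat(
    write_record: Callable[[], None],
    interval_s: float = 1.0,
    beats: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call write_record() `beats` times, pausing interval_s between calls.

    Returns the number of records written. A real daemon would loop
    forever; the bounded loop keeps the sketch easy to run and test.
    """
    sent = 0
    for _ in range(beats):
        write_record()  # hypothetical: any cheap write on the master
        sent += 1
        sleep(interval_s)
    return sent
```

With a steady trickle of records, an idle master no longer looks like a standby that has fallen far behind.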

That may not be as important once there are fewer sources of conflict
resolution, but that's something that is *definitely* not going to happen for 9.0...

Andres


From: Simon Riggs on 6 May 2010 06:48

On Thu, 2010-05-06 at 11:36 +0200, Florian Pflug wrote:

> If there was an additional SQL-callable function that returned the backends the recovery process is currently waiting for, plus one that reported the last timestamp seen in the WAL, then all those different cancellation policies could be implemented as daemons that monitor recovery and kill backends as needed, no?
>
> That would allow people to experiment with different cancellation policies, and maybe shed some light on what the useful policies are in practice.

It would be easier to implement a conflict resolution plugin that is
called when a conflict occurs, allowing users to have a customisable
mechanism. Again, I have no objection to that proposal.

--
Simon Riggs www.2ndQuadrant.com


From: Florian Pflug on 6 May 2010 07:46

On May 6, 2010, at 12:48 , Simon Riggs wrote:
> On Thu, 2010-05-06 at 11:36 +0200, Florian Pflug wrote:
>> If there was an additional SQL-callable function that returned the backends the recovery process is currently waiting for, plus one that reported the last timestamp seen in the WAL, then all those different cancellation policies could be implemented as daemons that monitor recovery and kill backends as needed, no?
>>
>> That would allow people to experiment with different cancellation policies, and maybe shed some light on what the useful policies are in practice.
>
> It would be easier to implement a conflict resolution plugin that is
> called when a conflict occurs, allowing users to have a customisable
> mechanism. Again, I have no objection to that proposal.

True, providing a plugin API would be even better, since no SQL-callable API would have to be devised, and possible algorithms wouldn't be constrained by such an API's limitations.

The existing max_standby_delay logic could be moved to such a plugin, living in contrib. Since it was already established (I believe) that the existing max_standby_delay logic is sufficiently fragile to require significant knowledge on the user's side about potential pitfalls, asking those users to install the plugin from contrib shouldn't be too much to ask for.

This way, users who really need something more sophisticated than "recovery always wins" or "standby always wins" are given the tools they need *if* they're willing to put in the extra effort. For those who don't, offering max_standby_delay probably does more harm than good anyway, so nothing is lost by not offering it in the first place.
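
To make the idea of a pluggable cancellation policy a little more concrete, here is a minimal sketch of one possible policy as a pure function. Everything in it is hypothetical: the WaitingBackend type, the backends_to_cancel name, and the inputs (the backends recovery is waiting for, and the standby's lag derived from the last WAL timestamp) stand in for whatever a plugin API or SQL-callable functions would actually provide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WaitingBackend:
    """Hypothetical description of a backend that is blocking recovery."""
    pid: int
    query_age_s: float  # how long its current query has been running


def backends_to_cancel(
    waiting: List[WaitingBackend],
    standby_lag_s: float,
    max_lag_s: float,
) -> List[int]:
    """One example policy: leave queries alone while the standby is within
    max_lag_s of the master; once it falls further behind, cancel every
    blocking backend, longest-running query first."""
    if standby_lag_s <= max_lag_s:
        return []
    ordered = sorted(waiting, key=lambda b: b.query_age_s, reverse=True)
    return [b.pid for b in ordered]
```

Setting max_lag_s to 0 approximates "recovery always wins", while a very large value approximates "standby always wins"; other policies would simply be different functions over the same inputs.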

best regards,
Florian Pflug
