From: Josh Berkus on

> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead. If some
> people prefer queries over failover even when they create the
> trigger file, we can make the trigger behavior selectable in
> response to the content of the trigger file like pg_standby
> does.

Well, the question is: are there users who would prefer not to have
slave queries cancelled and are willing to wait for failover? If so,
behavior of failover should really be slaved to max_standby_delay. If
not, there should be new behavior (i.e. "when the trigger file is found,
cancel all running queries"). One could argue that there are no users
of the first case.

The fact that failover currently does *not* terminate existing queries and
transactions was regarded as a feature by the audience, rather than a
bug, when I did demos of HS/SR. Of course, they might not have been
thinking of the delay for writes.

If there were an easy way to make the trigger file cancel all running
queries, apply remaining logs and come up, then I'd vote for that for
9.0. I think it's the more desired behavior by most users. However,
I'm opposed to any complex solutions which might delay 9.0 release.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Wed, Jun 9, 2010 at 3:22 PM, Josh Berkus <josh(a)agliodbs.com> wrote:
>
>> To fix the problem, when the trigger file is found, I think
>> that we should cancel all the running read only queries
>> immediately (or forcibly use -1 as the max_standby_delay
>> since that point) and make the recovery go ahead. If some
>> people prefer queries over failover even when they create the
>> trigger file, we can make the trigger behavior selectable in
>> response to the content of the trigger file like pg_standby
>> does.
>
> Well, the question is: are there users who would prefer not to have
> slave queries cancelled and are willing to wait for failover? If so,
> behavior of failover should really be slaved to max_standby_delay. If
> not, there should be new behavior (i.e. "when the trigger file is found,
> cancel all running queries"). One could argue that there are no users
> of the first case.
>
> The fact that failover currently does *not* terminate existing queries and
> transactions was regarded as a feature by the audience, rather than a
> bug, when I did demos of HS/SR. Of course, they might not have been
> thinking of the delay for writes.
>
> If there were an easy way to make the trigger file cancel all running
> queries, apply remaining logs and come up, then I'd vote for that for
> 9.0. I think it's the more desired behavior by most users. However,
> I'm opposed to any complex solutions which might delay 9.0 release.

One complication here is that, at least as I understand it, Tom is
planning to overhaul max_standby_delay. So it might be premature to
try to figure out how this should work until the dust settles. But my
intuition is similar to yours, overall.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane on
Josh Berkus <josh(a)agliodbs.com> writes:
> The fact that failover currently does *not* terminate existing queries and
> transactions was regarded as a feature by the audience, rather than a
> bug, when I did demos of HS/SR. Of course, they might not have been
> thinking of the delay for writes.

> If there were an easy way to make the trigger file cancel all running
> queries, apply remaining logs and come up, then I'd vote for that for
> 9.0. I think it's the more desired behavior by most users. However,
> I'm opposed to any complex solutions which might delay 9.0 release.

My feeling about it is that if you want fast failover you should not
have your failover target server configured as hot standby at all, let
alone hot standby with a long max_standby_delay. Such a slave could be
very far behind on applying WAL when the crunch comes, and no amount of
query killing will save you from that. Put your long-running standby
queries on a different slave instead.

We should consider whether we can improve the situation in 9.1, but it
is not a must-fix for 9.0; especially when the correct behavior isn't
immediately obvious.

regards, tom lane


From: Simon Riggs on
On Wed, 2010-06-09 at 12:22 -0700, Josh Berkus wrote:
> > To fix the problem, when the trigger file is found, I think
> > that we should cancel all the running read only queries
> > immediately (or forcibly use -1 as the max_standby_delay
> > since that point) and make the recovery go ahead. If some
> > people prefer queries over failover even when they create the
> > trigger file, we can make the trigger behavior selectable in
> > response to the content of the trigger file like pg_standby
> > does.
>
> Well, the question is: are there users who would prefer not to have
> slave queries cancelled and are willing to wait for failover? If so,
> behavior of failover should really be slaved to max_standby_delay. If
> not, there should be new behavior (i.e. "when the trigger file is found,
> cancel all running queries"). One could argue that there are no users
> of the first case.
>
> The fact that failover currently does *not* terminate existing queries and
> transactions was regarded as a feature by the audience, rather than a
> bug, when I did demos of HS/SR. Of course, they might not have been
> thinking of the delay for writes.

+1

Just to add: there is only a delay in triggering *if* the standby is
waiting on a query at or after triggering. If there is a wait, it is
never more than max_standby_delay, which is what the user said they
would be happy to accept.
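
As a toy model of that bound (not the server's actual logic), the wait can
be written as a small function. The function name and both parameters are
hypothetical, both values are assumed to be in the same time unit, and the
delay is assumed to be a non-negative setting:

```python
from typing import Optional


def max_trigger_wait(max_standby_delay: float,
                     conflicting_query_remaining: Optional[float]) -> float:
    """Upper bound on how long a conflicting query can hold up the trigger.

    conflicting_query_remaining is None when recovery is not waiting on
    any query, in which case the trigger is not delayed at all.
    """
    if max_standby_delay < 0:
        # Assumption: this sketch only models a finite, non-negative delay.
        raise ValueError("this sketch assumes a non-negative max_standby_delay")
    if conflicting_query_remaining is None:
        return 0.0
    # The query either finishes on its own or is cancelled once the
    # delay budget is used up, whichever comes first.
    return min(conflicting_query_remaining, max_standby_delay)
```

With a delay of 30 and a query that needs another 120, the bound is 30;
with no conflicting query it is 0.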

> If there were an easy way to make the trigger file cancel all running
> queries, apply remaining logs and come up, then I'd vote for that for
> 9.0. I think it's the more desired behavior by most users. However,
> I'm opposed to any complex solutions which might delay 9.0 release.

In 8.4 you could specify "fast" failover or "smart" failover. In 9.0,
AFAICS we have only implemented "smart" failover, which means it will
continue until the end of the WAL stream before triggering. So under
heavy streaming load or with considerable lag the trigger won't cause
failover for some time. So there is less function in 9.0 than was
available in 8.4. If that removal was intended, it wasn't discussed.
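
For readers who have not used the 8.4 behavior, here is a minimal sketch of
the distinction, modeled on how pg_standby reads the trigger file: an empty
file or the word "smart" means apply all available WAL first, while "fast"
means come up immediately. The names below are hypothetical, and falling
back to smart for unrecognized content is an assumption of this sketch, not
a claim about pg_standby itself:

```python
from enum import Enum
from typing import List


class FailoverMode(Enum):
    SMART = "smart"  # replay all available WAL, then come up
    FAST = "fast"    # come up immediately, ignoring unapplied WAL


def parse_trigger_file(content: str) -> FailoverMode:
    """Map the trigger file's content to a failover mode."""
    word = content.strip()
    if word == "fast":
        return FailoverMode.FAST
    # Empty content or "smart" selects smart failover. Anything else
    # also falls back to smart here (assumption made for this sketch).
    return FailoverMode.SMART


def wal_to_replay(mode: FailoverMode, pending: List[str]) -> List[str]:
    """Return the pending WAL segments that are replayed before promotion."""
    return [] if mode is FailoverMode.FAST else list(pending)
```

Under this model, a standby with many pending segments is promoted without
replaying them only when the trigger file says "fast".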

--
Simon Riggs www.2ndQuadrant.com



From: Takahiro Itagaki on

Fujii Masao <masao.fujii(a)gmail.com> wrote:

> > 1. Reset max_standby_delay = 0 in postgresql.conf
> > 2. pg_ctl reload
> > 3. Create a trigger file
>
> As far as I read the HS code, SIGHUP is not checked while a recovery
> is waiting for queries :( So pg_ctl reload would have no effect on
> the conflicting queries.
>
> Independently from the problem I raised, I think that we should call
> HandleStartupProcInterrupts() in that sleep loop.

Hmmm, if reload doesn't work, can we write a query like below?

SELECT pg_terminate_backend(pid)
FROM pg_locks
WHERE conflicted-with-recovery-process;

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


