From: Tatsuo Ishii on
> When the trigger file is created while the recovery keeps
> waiting for the release of the lock by read only queries,
> it might take a very long time for the standby to become
> the master. The recovery cannot go ahead until those read
> only queries have gone away. This would increase downtime
> at failover and degrade high availability.
>
> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead. If some
> people prefer queries over failover even when they create the
> trigger file, we can make the trigger behavior selectable in
> response to the content of the trigger file like pg_standby
> does.
>
> This problem looks like a bug, so I'd like to fix that for
> 9.0. But the amount of code change might not be small.
> Thoughts?

+1. Downtime of an HA system is really important for HA users.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Takahiro Itagaki on

Fujii Masao <masao.fujii(a)gmail.com> wrote:

> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead.

Hmmm, would the following sequence work as you expect, instead of the change?
It requires editing a text file in step 1, but seems more flexible.

1. Reset max_standby_delay = 0 in postgresql.conf
2. pg_ctl reload
3. Create a trigger file

BTW, I hope we will have "pg_ctl failover --timeout=N" in 9.1
instead of the trigger file based management.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

From: Fujii Masao on
On Wed, Jun 9, 2010 at 5:47 PM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead.

Oops, I made a mistake: I meant 0, not -1, for max_standby_delay.
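
A minimal sketch of the proposed rule, written in C because that is what the server is written in. It uses 0 as the forced delay, per the correction above: -1 means "wait forever", so forcing -1 would make failover wait even longer. Every name here (FailoverMode, parse_trigger_contents, effective_standby_delay) is hypothetical and not taken from the PostgreSQL source. The "smart"/"fast" spellings borrow the convention pg_standby uses for its trigger file; treating an empty or unrecognized file as fast is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical failover modes, modeled loosely on pg_standby's
 * "smart"/"fast" trigger-file contents. */
typedef enum { FAILOVER_FAST, FAILOVER_SMART } FailoverMode;

/* Decide the failover mode from the trigger file's contents.
 * Assumption: a missing/empty file, or anything other than "smart",
 * selects fast failover. */
static FailoverMode parse_trigger_contents(const char *contents)
{
    size_t len;

    if (contents == NULL)
        return FAILOVER_FAST;
    len = strlen(contents);
    /* Ignore trailing newlines, e.g. from "echo smart > trigger". */
    while (len > 0 && (contents[len - 1] == '\n' || contents[len - 1] == '\r'))
        len--;
    if (len == 5 && strncmp(contents, "smart", 5) == 0)
        return FAILOVER_SMART;
    return FAILOVER_FAST;
}

/* Effective max_standby_delay recovery should use when resolving a
 * conflict with read-only queries (-1 = wait forever). After a fast
 * failover is triggered, conflicts are resolved immediately, as if
 * max_standby_delay were 0; a smart failover keeps the configured value. */
static int effective_standby_delay(int configured, bool triggered, FailoverMode mode)
{
    assert(configured >= -1);
    if (triggered && mode == FAILOVER_FAST)
        return 0;
    return configured;
}
```

Keeping "smart" as an opt-in preserves the behavior for people who prefer queries over failover, which is the selectable-trigger idea from the original proposal.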

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Fujii Masao on
On Wed, Jun 9, 2010 at 6:13 PM, Takahiro Itagaki
<itagaki.takahiro(a)oss.ntt.co.jp> wrote:
>> To fix the problem, when the trigger file is found, I think
>> that we should cancel all the running read only queries
>> immediately (or forcibly use -1 as the max_standby_delay
>> since that point) and make the recovery go ahead.
>
> Hmmm, would the following sequence work as you expect, instead of the change?
> It requires editing a text file in step 1, but seems more flexible.
>
> 1. Reset max_standby_delay = 0 in postgresql.conf
> 2. pg_ctl reload
> 3. Create a trigger file

As far as I can tell from the Hot Standby (HS) code, SIGHUP is not
checked while recovery is waiting for queries :( So pg_ctl reload
would have no effect on the conflicting queries.

Independently of the problem I raised, I think we should call
HandleStartupProcInterrupts() in that sleep loop.
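
A minimal sketch of the shape of that fix, assuming a simple poll-and-sleep wait. handle_interrupts() is a hypothetical stand-in for HandleStartupProcInterrupts() and does not reproduce it; got_sighup, conf_file_delay_ms, max_standby_delay_ms, reload_config() and wait_for_conflicts() are likewise illustrative names, not PostgreSQL source. The point is only that servicing interrupts on every pass lets a reload that lowers the delay take effect during the wait instead of after it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Set asynchronously by a SIGHUP handler in a real process. */
static volatile sig_atomic_t got_sighup = 0;

/* Hypothetical: the value currently written in postgresql.conf, and the
 * value the startup process is actually using (-1 = wait forever). */
static int conf_file_delay_ms = -1;
static int max_standby_delay_ms = -1;

/* Stand-in for re-reading the configuration file. */
static void reload_config(void)
{
    max_standby_delay_ms = conf_file_delay_ms;
}

/* Stand-in for HandleStartupProcInterrupts(): service pending signals. */
static void handle_interrupts(void)
{
    if (got_sighup) {
        got_sighup = 0;
        reload_config();
    }
}

/* Wait until the conflicting queries go away or the delay is used up.
 * Returns true if the queries should be cancelled, false if they
 * finished on their own. */
static bool wait_for_conflicts(const volatile int *n_conflicting, long poll_ms)
{
    struct timespec nap = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };
    long waited_ms = 0;

    assert(poll_ms > 0);
    for (;;) {
        handle_interrupts();              /* picks up a pg_ctl reload */
        if (*n_conflicting == 0)
            return false;
        if (max_standby_delay_ms >= 0 && waited_ms >= max_standby_delay_ms)
            return true;
        nanosleep(&nap, NULL);
        waited_ms += poll_ms;
    }
}
```

Without the handle_interrupts() call, a wait that started with the delay at -1 would never notice the new value of 0, which is exactly why the reload-then-trigger sequence does not help today.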

> BTW, I hope we will have "pg_ctl failover --timeout=N" in 9.1
> instead of the trigger file based management.

Please feel free to try that ;)

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Tom Lane on
Fujii Masao <masao.fujii(a)gmail.com> writes:
> When the trigger file is created while the recovery keeps
> waiting for the release of the lock by read only queries,
> it might take a very long time for the standby to become
> the master. The recovery cannot go ahead until those read
> only queries have gone away. This would increase downtime
> at failover and degrade high availability.

> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead. If some
> people prefer queries over failover even when they create the
> trigger file, we can make the trigger behavior selectable in
> response to the content of the trigger file like pg_standby
> does.

> This problem looks like a bug, so I'd like to fix that for
> 9.0. But the amount of code change might not be small.
> Thoughts?

-1. This looks like 9.1 material to me, and besides I'm not even
convinced that what you propose is a good solution.

regards, tom lane
