From: Fujii Masao on
On Fri, Jun 11, 2010 at 1:48 AM, Josh Berkus <josh(a)agliodbs.com> wrote:
> On 06/09/2010 07:36 PM, Mark Kirkwood wrote:
>>
>> On 10/06/10 14:07, Tatsuo Ishii wrote:
>>>
>>> One of the top 3 questions I get
>>> when we propose our HA solution to them is, "how long will it take to
>>> do failover when the master DB crashes?"
>>>
>>
>> Same here +1
>
> In that case, wouldn't they set max_standby_delay to 0? In which case the
> failover problem goes away, no?

Yes, but I guess they'd also like to run read-only queries on the standby.
Setting max_standby_delay to 0 would prevent them from doing that, because
those queries would often be canceled by conflicts with the replay of
VACUUM or HOT records. vacuum_defer_cleanup_age would help in that case,
but it seems hard to tune.
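
To make the trade-off concrete, here is a minimal Python sketch of the
cancel decision as I understand it. It is a simplified model, not
PostgreSQL code: the function name and the millisecond arguments are
hypothetical, and treating a negative value as "wait forever" is an
assumption about the setting.

```python
def should_cancel_conflicting_query(waited_ms, max_standby_delay_ms):
    """Simplified model (not PostgreSQL source) of the standby's choice.

    waited_ms: how long WAL replay has already waited on a conflicting query.
    max_standby_delay_ms: configured limit; a negative value is assumed
    to mean "wait forever".
    """
    if max_standby_delay_ms < 0:
        # Assumed semantics: never cancel, replay waits indefinitely.
        return False
    # With a limit of 0 this is true at once, so every conflicting
    # query is canceled as soon as the conflict is detected.
    return waited_ms >= max_standby_delay_ms
```

With a limit of 0, replay never waits, which is good for failover time but
means any read-only query that conflicts is canceled right away.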

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Bruce Momjian on
Fujii Masao wrote:
> On Thu, Jun 10, 2010 at 5:06 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> > Josh Berkus <josh(a)agliodbs.com> writes:
> >> The fact that failover currently does *not* terminate existing queries and
> >> transactions was regarded as a feature by the audience, rather than a
> >> bug, when I did demos of HS/SR. Of course, they might not have been
> >> thinking of the delay for writes.
> >
> >> If there were an easy way to make the trigger file cancel all running
> >> queries, apply remaining logs and come up, then I'd vote for that for
> >> 9.0. I think it's the more desired behavior by most users. However,
> >> I'm opposed to any complex solutions which might delay 9.0 release.
> >
> > My feeling about it is that if you want fast failover you should not
> > have your failover target server configured as hot standby at all, let
> > alone hot standby with a long max_standby_delay. Such a slave could be
> > very far behind on applying WAL when the crunch comes, and no amount of
> > query killing will save you from that. Put your long-running standby
> > queries on a different slave instead.
> >
> > We should consider whether we can improve the situation in 9.1, but it
> > is not a must-fix for 9.0; especially when the correct behavior isn't
> > immediately obvious.
>
> OK. Let's revisit in 9.1.
>
> I attached the proposed patch for 9.1. The patch treats max_standby_delay
> as zero (i.e., cancels all the conflicting queries immediately) once
> the trigger file is created. So we can cause a recovery to end without
> waiting for any lock held by queries, and minimize the failover time.
> OTOH, queries which don't conflict with a recovery survive the failover.
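
As a rough illustration of the behavior described in the quoted proposal
(not the patch itself, which is not reproduced here), the rule could be
sketched in Python as below. The function name and arguments are
hypothetical; the only idea taken from the proposal is that the delay is
treated as zero once the trigger file exists.

```python
import os


def effective_max_standby_delay(configured_ms, trigger_file):
    """Sketch of the proposed rule; names here are illustrative only.

    Before the trigger file exists, the configured delay applies.
    Once it exists, the delay is treated as zero, so conflicting
    queries are canceled immediately and recovery can finish sooner.
    """
    if os.path.exists(trigger_file):
        return 0
    return configured_ms
```

Queries that never conflict with replay are not affected by this value,
which matches the point that they survive the failover.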

Should this be added to the first 9.1 commitfest?

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +


From: Tom Lane on
Bruce Momjian <bruce(a)momjian.us> writes:
> Fujii Masao wrote:
>> On Thu, Jun 10, 2010 at 5:06 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> My feeling about it is that if you want fast failover you should not
>>> have your failover target server configured as hot standby at all, let
>>> alone hot standby with a long max_standby_delay. Such a slave could be
>>> very far behind on applying WAL when the crunch comes, and no amount of
>>> query killing will save you from that. Put your long-running standby
>>> queries on a different slave instead.
>>>
>>> We should consider whether we can improve the situation in 9.1, but it
>>> is not a must-fix for 9.0; especially when the correct behavior isn't
>>> immediately obvious.

>> OK. Let's revisit in 9.1.
>>
>> I attached the proposed patch for 9.1. The patch treats max_standby_delay
>> as zero (i.e., cancels all the conflicting queries immediately) once
>> the trigger file is created. So we can cause a recovery to end without
>> waiting for any lock held by queries, and minimize the failover time.
>> OTOH, queries which don't conflict with a recovery survive the failover.

> Should this be added to the first 9.1 commitfest?

Not sure ... it seems like proof of concept for a pretty dubious
concept. If you want a slave to be ready for fast failover then you
should not be letting it get far behind the master in the first place.
I think there's some missing piece here, but I'm not quite sure what
to propose.

regards, tom lane
