failover vs. read only queries [PgSql]

Prev: [HACKERS] failover vs. read only queries
Next: [HACKERS] walwriter not closing old files

From: Tatsuo Ishii on 9 Jun 2010 22:07

> The fact that failover currently does *not* terminate existing queries and
> transactions was regarded as a feature by the audience, rather than a
> bug, when I did demos of HS/SR. Of course, they might not have been
> thinking of the delay for writes.

You would probably hear a different response from serious users who
want usable HA systems. I have a number of customers who are
using our HA systems (they use several technologies such as commercial
HA solutions, pgpool-II and Slony-I). One of the top 3 questions I get
when we propose our HA solution to them is, "how long will it take to
fail over when the master DB crashes?"
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Mark Kirkwood on 9 Jun 2010 22:36

On 10/06/10 14:07, Tatsuo Ishii wrote:
>
> One of the top 3 questions I got
> when we propose our HA solution to them is, "how long will it take to
> fail over when the master DB crashes?"
>
>

Same here +1

From: Fujii Masao on 10 Jun 2010 06:21

On Thu, Jun 10, 2010 at 5:06 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Josh Berkus <josh(a)agliodbs.com> writes:
>> The fact that failover currently does *not* terminate existing queries and
>> transactions was regarded as a feature by the audience, rather than a
>> bug, when I did demos of HS/SR. Of course, they might not have been
>> thinking of the delay for writes.
>
>> If there were an easy way to make the trigger file cancel all running
>> queries, apply remaining logs and come up, then I'd vote for that for
>> 9.0. I think it's the behavior most users want. However,
>> I'm opposed to any complex solutions which might delay the 9.0 release.
>
> My feeling about it is that if you want fast failover you should not
> have your failover target server configured as hot standby at all, let
> alone hot standby with a long max_standby_delay. Such a slave could be
> very far behind on applying WAL when the crunch comes, and no amount of
> query killing will save you from that. Put your long-running standby
> queries on a different slave instead.
>
> We should consider whether we can improve the situation in 9.1, but it
> is not a must-fix for 9.0; especially when the correct behavior isn't
> immediately obvious.

OK. Let's revisit in 9.1.

I've attached a proposed patch for 9.1. The patch treats max_standby_delay
as zero (i.e., cancels all conflicting queries immediately) once the
trigger file has been created. This lets recovery finish without waiting
for any locks held by queries, which minimizes the failover time.
On the other hand, queries that don't conflict with recovery survive the failover.
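To make the proposed policy concrete, here is a hypothetical Python model of the cancellation decision; it is not the patch's code. Once the trigger file exists, the effective delay drops to zero, so any conflicting query is cancelled at once while non-conflicting queries are left alone. Delays are in seconds purely for simplicity, and treating a negative setting as "wait forever" is an assumption.

```python
# Hypothetical model of the proposed 9.1 policy; this is NOT PostgreSQL code.
# Delays are in seconds here purely for simplicity.


def effective_max_standby_delay(configured_delay_s, trigger_file_exists):
    """Delay recovery tolerates before cancelling a conflicting query.

    Once the trigger file exists, the configured value is ignored and
    treated as zero, as described for the patch.
    """
    if trigger_file_exists:
        return 0
    return configured_delay_s


def should_cancel(conflicts_with_recovery, waited_s, configured_delay_s,
                  trigger_file_exists):
    """Return True if recovery should cancel this standby query now."""
    if not conflicts_with_recovery:
        # Queries that don't conflict with recovery survive the failover.
        return False
    delay = effective_max_standby_delay(configured_delay_s, trigger_file_exists)
    if delay < 0:
        # Assumption: a negative setting means "wait forever".
        return False
    return waited_s >= delay


if __name__ == "__main__":
    # Before the trigger file: a fresh conflict waits out the 30 s delay.
    print(should_cancel(True, 5, 30, trigger_file_exists=False))  # False
    # After the trigger file: the same conflict is cancelled immediately.
    print(should_cancel(True, 0, 30, trigger_file_exists=True))   # True
```

Note that in this model the trigger file overrides even a "wait forever" setting, which is the point of the proposal: recovery can never be held up by a conflicting query once failover has been requested.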

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Fujii Masao on 10 Jun 2010 06:36

On Thu, Jun 10, 2010 at 9:58 AM, Takahiro Itagaki
<itagaki.takahiro(a)oss.ntt.co.jp> wrote:
>
> Fujii Masao <masao.fujii(a)gmail.com> wrote:
>
>> > 1. Reset max_standby_delay = 0 in postgresql.conf
>> > 2. pg_ctl reload
>> > 3. Create a trigger file
>>
>> As far as I can tell from the HS code, SIGHUP is not checked while
>> recovery is waiting for queries :( So pg_ctl reload would have no effect on
>> the conflicting queries.
>>
>> Independently from the problem I raised, I think that we should call
>> HandleStartupProcInterrupts() in that sleep loop.
>
> Hmmm, if reload doesn't work, can we write a query like the one below?
>
>   SELECT pg_terminate_backend(pid)
>     FROM pg_locks
>    WHERE conflicted-with-recovery-process;

I'm not sure about that, but as you suggested, we can minimize the failover
time even in 9.0 by using the following procedure:

1. Reset max_standby_delay = 0 in postgresql.conf
2. pg_ctl reload
3. Cancel all the queries or all the conflicting ones
4. Create a trigger file

For now, I'll use this procedure when building HA systems with 9.0
and clusterware.
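The four steps above could be scripted as a clusterware failover hook. Below is a minimal Python sketch under several assumptions: postgresql.conf lives in the data directory, psql can connect to the standby as a superuser with default connection settings, and `trigger_file` is the path configured in recovery.conf. The function names are hypothetical, the query targets the 9.0 `pg_stat_activity` layout (PID column `procpid`), and this is not a tested failover tool.

```python
import re
import subprocess
from pathlib import Path

# Assumption: cancel every other backend, as in step 3. In 9.0 the PID column
# of pg_stat_activity is named procpid. pg_terminate_backend could be used
# instead if idle transactions must be ended too.
CANCEL_SQL = (
    "SELECT pg_cancel_backend(procpid) FROM pg_stat_activity "
    "WHERE procpid <> pg_backend_pid();"
)

# Matches an active or commented-out max_standby_delay line.
_DELAY_LINE = re.compile(r"^[ \t]*#?[ \t]*max_standby_delay[ \t]*=.*$",
                         re.MULTILINE)


def set_max_standby_delay_zero(conf_text):
    """Return postgresql.conf text with max_standby_delay set to 0."""
    if _DELAY_LINE.search(conf_text):
        return _DELAY_LINE.sub("max_standby_delay = 0", conf_text)
    return conf_text.rstrip("\n") + "\nmax_standby_delay = 0\n"


def run_failover(data_dir, trigger_file, run=subprocess.run):
    """Run steps 1-4 in order; `run` is injectable so the sketch can be dry-run."""
    data_dir = Path(data_dir)
    conf = data_dir / "postgresql.conf"  # assumes the config lives in data_dir
    # 1. Set max_standby_delay = 0 in postgresql.conf.
    conf.write_text(set_max_standby_delay_zero(conf.read_text()))
    # 2. pg_ctl reload (sends SIGHUP to the server).
    run(["pg_ctl", "reload", "-D", str(data_dir)], check=True)
    # 3. Cancel queries, since recovery that is already waiting may not
    #    notice the SIGHUP from step 2.
    run(["psql", "-X", "-c", CANCEL_SQL], check=True)
    # 4. Create the trigger file so recovery ends and the standby comes up.
    Path(trigger_file).touch()
```

Step 3 is kept separate from step 2 because, as noted earlier in the thread, a reload alone may not reach a recovery process that is already blocked on a conflicting query.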

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Josh Berkus on 10 Jun 2010 12:48

On 06/09/2010 07:36 PM, Mark Kirkwood wrote:
> On 10/06/10 14:07, Tatsuo Ishii wrote:
>>
>> One of the top 3 questions I got
>> when we propose our HA solution to them is, "how long will it take to
>> fail over when the master DB crashes?"
>>
>
> Same here +1

In that case, wouldn't they set max_standby_delay to 0? Then the
failover problem goes away, no?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
