From: Tatsuo Ishii on 9 Jun 2010 22:07

> The fact that failover currently does *not* terminate existing queries and
> transactions was regarded as a feature by the audience, rather than a
> bug, when I did demos of HS/SR. Of course, they might not have been
> thinking of the delay for writes.

You would probably hear a different response from serious users who
want usable HA systems. I have a number of customers who use our HA
systems (built on several technologies, such as commercial HA
solutions, pgpool-II and Slony-I). One of the top three questions I get
when we propose our HA solution to them is, "How long will it take to
do failover when the master DB crashes?"

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Mark Kirkwood on 9 Jun 2010 22:36

On 10/06/10 14:07, Tatsuo Ishii wrote:
> One of the top three questions I get when we propose our HA solution
> to them is, "How long will it take to do failover when the master DB
> crashes?"

Same here. +1

From: Fujii Masao on 10 Jun 2010 06:21

On Thu, Jun 10, 2010 at 5:06 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Josh Berkus <josh(a)agliodbs.com> writes:
>> The fact that failover currently does *not* terminate existing queries and
>> transactions was regarded as a feature by the audience, rather than a
>> bug, when I did demos of HS/SR. Of course, they might not have been
>> thinking of the delay for writes.
>
>> If there were an easy way to make the trigger file cancel all running
>> queries, apply remaining logs and come up, then I'd vote for that for
>> 9.0. I think it's the more desired behavior by most users. However,
>> I'm opposed to any complex solutions which might delay 9.0 release.
>
> My feeling about it is that if you want fast failover you should not
> have your failover target server configured as hot standby at all, let
> alone hot standby with a long max_standby_delay. Such a slave could be
> very far behind on applying WAL when the crunch comes, and no amount of
> query killing will save you from that. Put your long-running standby
> queries on a different slave instead.
>
> We should consider whether we can improve the situation in 9.1, but it
> is not a must-fix for 9.0; especially when the correct behavior isn't
> immediately obvious.

OK. Let's revisit this in 9.1.

I have attached a proposed patch for 9.1. The patch treats
max_standby_delay as zero (i.e., it cancels all conflicting queries
immediately) once the trigger file has been created. Recovery can then
finish without waiting for any lock held by a query, which minimizes
the failover time. On the other hand, queries that do not conflict with
recovery survive the failover.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
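
The behavior described for the proposed patch can be modeled roughly as below. This is only an illustrative Python sketch, not the patch itself (the attachment is not reproduced here, and the real change would live in the server's C code). The function name `effective_standby_delay` and its arguments are hypothetical.

```python
import os


def effective_standby_delay(max_standby_delay_ms: int, trigger_file: str) -> int:
    """Return the delay recovery grants to conflicting queries.

    Hypothetical model of the proposal: once the trigger file exists,
    behave as if max_standby_delay were 0, so conflicting queries are
    cancelled immediately. Until then the configured value applies.
    """
    if os.path.exists(trigger_file):
        return 0
    return max_standby_delay_ms
```

Queries that never conflict with recovery are not affected by this value at all, which is why they would survive the failover.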

From: Fujii Masao on 10 Jun 2010 06:36

On Thu, Jun 10, 2010 at 9:58 AM, Takahiro Itagaki
<itagaki.takahiro(a)oss.ntt.co.jp> wrote:
>
> Fujii Masao <masao.fujii(a)gmail.com> wrote:
>
>> > 1. Reset max_standby_delay = 0 in postgresql.conf
>> > 2. pg_ctl reload
>> > 3. Create a trigger file
>>
>> As far as I read the HS code, SIGHUP is not checked while a recovery
>> is waiting for queries :( So pg_ctl reload would have no effect on
>> the conflicting queries.
>>
>> Independently from the problem I raised, I think that we should call
>> HandleStartupProcInterrupts() in that sleep loop.
>
> Hmmm, if reload doesn't work, can we write a query like below?
>
>   SELECT pg_terminate_backend(pid)
>     FROM pg_locks
>    WHERE conflicted-with-recovery-process;

I'm not sure about that query, but as you suggested, we can minimize
the failover time with the following procedure, even in 9.0:

1. Reset max_standby_delay = 0 in postgresql.conf
2. pg_ctl reload
3. Cancel all the queries, or all the conflicting ones
4. Create a trigger file

For now, I'll use the above when building an HA system with 9.0 and
clusterware.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
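
The four-step procedure above could be scripted along the following lines. This is a minimal Python sketch under stated assumptions, not a tested failover tool: the helper names, the `postgres` database name, and the choice to terminate every other backend (rather than only the conflicting ones) are all illustrative. It uses `procpid`, the `pg_stat_activity` column name in 9.0, and relies on the fact that the last occurrence of a parameter in postgresql.conf wins.

```python
# Step 3 query: terminate every backend except our own session.
# Assumption: we do not try to single out only the conflicting backends.
TERMINATE_SQL = (
    "SELECT pg_terminate_backend(procpid) "
    "FROM pg_stat_activity "
    "WHERE procpid <> pg_backend_pid()"
)


def force_zero_standby_delay(conf_path):
    """Step 1: append max_standby_delay = 0 to postgresql.conf.

    Appending is enough because, when a parameter appears more than
    once in the file, the last entry is the one that takes effect.
    """
    with open(conf_path, "a") as f:
        f.write("\nmax_standby_delay = 0\n")


def failover_commands(data_dir, trigger_file):
    """Steps 2-4 as argv lists, in the order they should be run."""
    return [
        ["pg_ctl", "reload", "-D", data_dir],             # step 2
        ["psql", "-d", "postgres", "-c", TERMINATE_SQL],  # step 3
        ["touch", trigger_file],                          # step 4
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on the standby host. The trigger file path must match the `trigger_file` setting in recovery.conf. As noted earlier in the thread, the reload may not reach a startup process that is already waiting on a conflict, which is why step 3 is part of the sequence.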

From: Josh Berkus on 10 Jun 2010 12:48

On 06/09/2010 07:36 PM, Mark Kirkwood wrote:
> On 10/06/10 14:07, Tatsuo Ishii wrote:
>>
>> The one of top 3 questions I got
>> when we propose them our HA solution is, "how long will it take to
>> do failover when the master DB crashes?"
>>
>
> Same here +1

In that case, wouldn't they set max_standby_delay to 0? In which case
the failover problem goes away, no?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com