From: Robert Haas on 13 Apr 2010 09:27

On Tue, Apr 13, 2010 at 9:18 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Thu, Apr 1, 2010 at 8:24 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> On Thu, Apr 1, 2010 at 7:18 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>>> I'm not willing to investigate this further myself at this stage. This
>>> looks like risk for little benefit.
>>
>> That's kind of what I figured. I'll see about fixing up Fujii-san's
>> patch and documenting the behavior; but it won't happen before the
>> weekend because I'm going to be out of town.
>
> I found the bug which makes smart shutdown get stuck for a while:
>
> If there is no WAL file available in the standby, walreceiver might
> be invoked before we have reached the PM_RECOVERY state. We switch
> to the PM_RECOVERY state after reading the checkpoint record pointed
> out in the pg_control file. If it's not found, we are in the PM_INIT
> or PM_START state and start walreceiver to read it from the primary.
>
> If smart shutdown is requested at that point, we cannot switch to
> the PM_WAIT_READONLY state because pmdie() doesn't allow that. So
> the SIGTERM is never sent to walreceiver, and smart shutdown would
> get stuck.
>
> If the current state is either PM_INIT or PM_START, it's guaranteed
> that there is no regular backend, so we should kill walreceiver as
> soon as smart shutdown is requested, I think. The attached patch
> does that.

Can you explain how to recreate the problem that this patch fixes?

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Fujii Masao on 13 Apr 2010 10:16

On Tue, Apr 13, 2010 at 10:27 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> Can you explain how to recreate the problem that this patch fixes?

1. Configure and start the primary server.
2. Configure the standby server.
3. Remove all of the WAL files in pg_xlog of the standby.
4. Start the standby.
5. Request smart shutdown against the standby before walreceiver
   receives any WAL records.

You would need to emulate the time-consuming authentication which
usually requires the setting of authentication_timeout. I used the
attached patch for the emulation. New GUC parameter "wal_sender_sleep"
which the patch provides for the test specifies the sleep time during
walsender's handshake processing. If you set it to 10s, walsender sleeps
10 secs before it sends the WAL records.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center