From: Alvaro Herrera on 12 May 2010 22:36 Excerpts from Robert Haas's message of mié may 12 20:48:41 -0400 2010: > On Wed, May 12, 2010 at 3:55 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: > > I am wondering if we are not correctly handling the case where we get > > a shutdown request while we are still in the PM_STARTUP state. It > > looks like we might go ahead and switch to PM_RECOVERY and then > > PM_RECOVERY_CONSISTENT without noticing the shutdown. There is some > > logic to handle the shutdown when the startup process exits, but if > > the startup process never exits it looks like we might get stuck. > > I can reproduce the behavior Stefan is seeing consistently using the > attached patch. > > W1: postgres -D ~/pgslave > W2: pg_ctl -D ~/pgslave stop If there's anything to learn from this patch, is that sleep is uninterruptible on some platforms. This is why sleeps elsewhere are broken down in loops, sleeping in small increments and checking interrupts each time. Maybe some of the sleeps in the new HS code need to be handled this way? -- -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 12 May 2010 22:41 On Wed, May 12, 2010 at 10:36 PM, Alvaro Herrera <alvherre(a)alvh.no-ip.org> wrote: > Excerpts from Robert Haas's message of mié may 12 20:48:41 -0400 2010: >> On Wed, May 12, 2010 at 3:55 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: >> > I am wondering if we are not correctly handling the case where we get >> > a shutdown request while we are still in the PM_STARTUP state. It >> > looks like we might go ahead and switch to PM_RECOVERY and then >> > PM_RECOVERY_CONSISTENT without noticing the shutdown. There is some >> > logic to handle the shutdown when the startup process exits, but if >> > the startup process never exits it looks like we might get stuck. >> >> I can reproduce the behavior Stefan is seeing consistently using the >> attached patch. >> >> W1: postgres -D ~/pgslave >> W2: pg_ctl -D ~/pgslave stop > > If there's anything to learn from this patch, is that sleep is > uninterruptible on some platforms. This is why sleeps elsewhere are > broken down in loops, sleeping in small increments and checking > interrupts each time. Maybe some of the sleeps in the new HS code need > to be handled this way? I don't think the problem is that the sleep is uninterruptible. I think the problem is that a smart shutdown request received while in the PM_STARTUP state does not acted upon until we enter the PM_RUN state. That is, there's a race condition between the SIGUSR1 that the startup process sends to the postmaster to signal that recovery has begun and the SIGTERM being sent by pg_ctl. However, since I haven't succeeded in producing a fix yet, take that with a grain of salt... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 16 May 2010 21:25 On Wed, May 12, 2010 at 10:41 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: > On Wed, May 12, 2010 at 10:36 PM, Alvaro Herrera > <alvherre(a)alvh.no-ip.org> wrote: >> Excerpts from Robert Haas's message of mié may 12 20:48:41 -0400 2010: >>> On Wed, May 12, 2010 at 3:55 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: >>> > I am wondering if we are not correctly handling the case where we get >>> > a shutdown request while we are still in the PM_STARTUP state. It >>> > looks like we might go ahead and switch to PM_RECOVERY and then >>> > PM_RECOVERY_CONSISTENT without noticing the shutdown. There is some >>> > logic to handle the shutdown when the startup process exits, but if >>> > the startup process never exits it looks like we might get stuck. >>> >>> I can reproduce the behavior Stefan is seeing consistently using the >>> attached patch. >>> >>> W1: postgres -D ~/pgslave >>> W2: pg_ctl -D ~/pgslave stop >> >> If there's anything to learn from this patch, is that sleep is >> uninterruptible on some platforms. This is why sleeps elsewhere are >> broken down in loops, sleeping in small increments and checking >> interrupts each time. Maybe some of the sleeps in the new HS code need >> to be handled this way? > > I don't think the problem is that the sleep is uninterruptible. I > think the problem is that a smart shutdown request received while in > the PM_STARTUP state does not acted upon until we enter the PM_RUN > state. That is, there's a race condition between the SIGUSR1 that the > startup process sends to the postmaster to signal that recovery has > begun and the SIGTERM being sent by pg_ctl. > > However, since I haven't succeeded in producing a fix yet, take that > with a grain of salt... I'm replying to this thread rather than the max_standby_delay thread because this line of discussion is not actually related to max_standby_delay. Fujii Masao previously reported this problem and proposed this fix: http://archives.postgresql.org/pgsql-hackers/2010-04/msg00592.php I responded by proposing that we instead simply adjust things so that a smart shutdown is always handled at the end of recovery, just prior to entering normal running. On further reflection, though, I've concluded that was a dumb idea. Currently, when a smart shutdown occurs during recovery, we handle it immediately UNLESS it happens before we receive the signal that tells us it's OK to start the background writer, in which case we get confused and lock everyone out of the database until a fast or immediate shutdown request arrives. So this is just a race condition, plain and simple. Therefore I think Fujii Masao's original idea was the best, but I have what I believe is an equivalent but simpler implementation, which is attached. Thoughts? Should we try to fix this in 8.4 also, or just in HEAD? 8.3 and 8.2 never handle a smart shutdown prior to entering normal running, and while that seems pretty useless, doing something different would be a behavior change, so that seems like a non-starter. 8.4 has the same behavior as HEAD, though it's not documented in the release notes, so it's not clear how intentional the change was. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
From: Simon Riggs on 17 May 2010 03:13 On Sun, 2010-05-16 at 21:25 -0400, Robert Haas wrote: > I have what I believe is > an equivalent but simpler implementation, which is attached. There's no code comments to explain this, so without in-depth analysis of the problem, Masao's patch and this one its not possible to say anything. Please explain in detail why its the right approach and put that in a comment, so we'll understand now and in the future. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Fujii Masao on 17 May 2010 03:38
On Mon, May 17, 2010 at 10:25 AM, Robert Haas <robertmhaas(a)gmail.com> wrote: > Therefore I think > Fujii Masao's original idea was the best, but I have what I believe is > an equivalent but simpler implementation, which is attached. Seems good. I found another two problems related to shutdown in PM_STARTUP state: (1) Smart or fast shutdown requested in PM_STARTUP state always removes the backup_label file if it exists. But it might be still required for subsequent recovery. I changed your patch so that additionally the postmaster skips deleting the backup_label in that case. (2) pg_ctl -ms stop emits the following warning whenever there is the backup_label file in $PGDATA. WARNING: online backup mode is active Shutdown will not complete until pg_stop_backup() is called. This warning doesn't fit in with the shutdown during recovery case. Since smart shutdown might be requested by other than pg_ctl, the warning should be emitted in server side rather than client, I think. How about moving the warning to the server side? > Thoughts? Should we try to fix this in 8.4 also, or just in HEAD? > 8.3 and 8.2 never handle a smart shutdown prior to entering normal > running, and while that seems pretty useless, doing something > different would be a behavior change, so that seems like a > non-starter. 8.4 has the same behavior as HEAD, though it's not > documented in the release notes, so it's not clear how intentional the > change was. In 8.4, smart shutdown during recovery waits until the startup process has exited. So the backporting to 8.4 doesn't improve any situation, I think. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center |