From: Simon Riggs on 12 May 2010 10:40

On Wed, 2010-05-12 at 16:03 +0200, Stefan Kaltenbrunner wrote:
> Simon Riggs wrote:
> > On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
> >> On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> >>> On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:
> >>>
> >>>> I'm not sure what to make of this. Sometimes not shutting down
> >>>> doesn't sound like a feature to me.
> >>> It acts exactly the same in recovery as in normal running. It is not a
> >>> special feature of recovery at all, bug or otherwise.
> >> Simon, that doesn't make any sense. We are talking about a backend
> >> getting stuck forever on an exclusive lock that is held by the startup
> >> process and which will never be released (for example, because the
> >> master has shut down and no more WAL can be obtained for replay). The
> >> startup process does not hold locks in normal operation.
> >
> > When I test it, startup process holding a lock does not prevent shutdown
> > of a standby.
> >
> > I'd be happy to see your test case showing a bug exists and that the
> > behaviour differs from normal running.
>
> In my testing the postmaster simply does not shut down even with no
> clients connected any more once in a while - most of the time it works
> just fine but in like 1 out of 10 cases it gets stuck - my testcase (as
> detailed in the related thread) is simply doing an interval load on the
> master (pgbench -T 120 && sleep 30 && pgbench -T 120 - rinse and repeat
> as needed) and pgbench -S && pg_ctl restart && pgbench -S in a loop on
> the standby. Once in a while the standby will simply not shut down
> (forever - not only by exceeding the default timeout of pg_ctl which seems
> to get triggered much more often on the standby than on the master -
> have not looked into that yet in detail)

If you could recreate that on a server in debug mode we can see what's
happening. If you can attach to the server and get a back trace that
would help. I've not seen that behaviour at all during testing and if
the issue is sporadic it's not likely to help much trying to recreate
myself.

This could be an issue with SR, or an issue with the shutdown code
itself.

--
Simon Riggs www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 12 May 2010 11:28

On Wed, 2010-05-12 at 14:18 +0100, Simon Riggs wrote:
> On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
> > On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> > > On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:
> > >
> > >> I'm not sure what to make of this. Sometimes not shutting down
> > >> doesn't sound like a feature to me.
> > >
> > > It acts exactly the same in recovery as in normal running. It is not a
> > > special feature of recovery at all, bug or otherwise.
> >
> > Simon, that doesn't make any sense. We are talking about a backend
> > getting stuck forever on an exclusive lock that is held by the startup
> > process and which will never be released (for example, because the
> > master has shut down and no more WAL can be obtained for replay). The
> > startup process does not hold locks in normal operation.
>
> When I test it, startup process holding a lock does not prevent shutdown
> of a standby.
>
> I'd be happy to see your test case showing a bug exists and that the
> behaviour differs from normal running.

Let me put this differently: I accept that Stefan has reported a
problem. Neither Tom nor myself can reproduce the problem. I've re-run
Stefan's test case and restarted the server more than 400 times now
without any issue.

I re-read your post where you gave what you yourself called "uninformed
speculation". There's no real polite way to say it, but yes your
speculation does appear to be uninformed, since it is incorrect. Reasons
would be not least that Stefan's tests don't actually send any locks to
the standby anyway (!), but even if they did your speculation as to the
cause is still all wrong, as explained.

There is no evidence to link this behaviour with HS, as yet, and you
should be considering the possibility the problem lies elsewhere,
especially since it could be code you committed that is at fault.
--
Simon Riggs www.2ndQuadrant.com
From: Robert Haas on 12 May 2010 12:04

On Wed, May 12, 2010 at 11:28 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Wed, 2010-05-12 at 14:18 +0100, Simon Riggs wrote:
>> On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
>> > On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>> > > On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:
>> > >
>> > >> I'm not sure what to make of this. Sometimes not shutting down
>> > >> doesn't sound like a feature to me.
>> > >
>> > > It acts exactly the same in recovery as in normal running. It is not a
>> > > special feature of recovery at all, bug or otherwise.
>> >
>> > Simon, that doesn't make any sense. We are talking about a backend
>> > getting stuck forever on an exclusive lock that is held by the startup
>> > process and which will never be released (for example, because the
>> > master has shut down and no more WAL can be obtained for replay). The
>> > startup process does not hold locks in normal operation.
>>
>> When I test it, startup process holding a lock does not prevent shutdown
>> of a standby.
>>
>> I'd be happy to see your test case showing a bug exists and that the
>> behaviour differs from normal running.
>
> Let me put this differently: I accept that Stefan has reported a
> problem. Neither Tom nor myself can reproduce the problem. I've re-run
> Stefan's test case and restarted the server more than 400 times now
> without any issue.

OK, I'm glad to hear you've been testing this. I wasn't aware of that.

> I re-read your post where you gave what you yourself called "uninformed
> speculation". There's no real polite way to say it, but yes your
> speculation does appear to be uninformed, since it is incorrect. Reasons
> would be not least that Stefan's tests don't actually send any locks to
> the standby anyway (!),

Hmm. Well, assuming you're correct, that does seem to be a, uh,
slight problem with my theory.

> but even if they did your speculation as to the
> cause is still all wrong, as explained.

You lost me. I don't understand why the problem that I'm referring to
couldn't happen, even if it's not what's happening here.

> There is no evidence to link this behaviour with HS, as yet, and you
> should be considering the possibility the problem lies elsewhere,
> especially since it could be code you committed that is at fault.

Huh?? The evidence that this bug is linked with HS is that it occurs
on a server running in HS mode, and not otherwise. As for whether the
bug is code I committed, that's certainly possible, but keep in mind
it didn't work at all before IN HOT STANDBY MODE - and that will be
code you committed.

I'm going to go test this and see if I can figure out what's going on.
I hope you will keep at it also - as you point out, your knowledge of
this code far exceeds mine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
From: Simon Riggs on 12 May 2010 12:49

On Wed, 2010-05-12 at 12:04 -0400, Robert Haas wrote:

> Huh?? The evidence that this bug is linked with HS is that it occurs
> on a server running in HS mode, and not otherwise. As for whether the
> bug is code I committed, that's certainly possible, but keep in mind
> it didn't work at all before IN HOT STANDBY MODE - and that will be
> code you committed.

I'll say it now, so it's plain. I'm not going to investigate every bug
that occurs on Postgres, just because someone was in HS when they found
it. Any more than all bugs on Postgres in normal running are MVCC bugs.
There needs to be reasonable evidence or a conjecture by someone that
knows something about the code. If HS were the only thing changed in
recovery in this release, that might not seem reasonable, but since we
have much new code and I am not the only developer, it is.

Normal shutdown didn't work on a standby before HS was committed and it
didn't work afterwards either. Use all the capitals you like but if you
use poor arguments and combine that with no evidence then we'll not get
very far, either in working together or in solving the actual bugs.

Please don't continue to make wild speculations about things related to
HS and recovery, so that issues do not become confused; there is no need
to comment on every thread.

--
Simon Riggs www.2ndQuadrant.com
From: Greg Stark on 12 May 2010 13:05

On Wed, May 12, 2010 at 5:49 PM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Wed, 2010-05-12 at 12:04 -0400, Robert Haas wrote:
>
>> Huh?? The evidence that this bug is linked with HS is that it occurs
>> on a server running in HS mode, and not otherwise. As for whether the
>> bug is code I committed, that's certainly possible, but keep in mind
>> it didn't work at all before IN HOT STANDBY MODE - and that will be
>> code you committed.
>
> I'll say it now, so it's plain. I'm not going to investigate every bug
> that occurs on Postgres, just because someone was in HS when they found
> it.

Fair enough, though your help debugging is always appreciated
regardless of whether a problem is HS related or not. Nobody's
obligated to work on anything in Postgres after all.

I'm not sure who to blame for the shouting match over whose commit
introduced the bug -- it doesn't seem like a relevant or useful thing
to argue about, please both stop.

> there is no need
> to comment on every thread.

This is out of line.

--
greg