Prev: [HACKERS] pg_stop_backup does not complete
From: Josh Berkus on 23 Feb 2010 12:49
> This issue is 100% reproducible.

Oh, btw, this is on Alpha4.

--Josh Berkus

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: "Joshua D. Drake" on 23 Feb 2010 12:52
On Tue, 2010-02-23 at 09:45 -0800, Josh Berkus wrote:
> Simon, Fujii, All:
>
> While demoing HS/SR at SCALE, I ran into a problem which is likely to be
> a commonly encountered bug when people first set up HS/SR. Here's the
> sequence:
>
> 1) Set up a brand new master with an archive_command and archive=on.
>
> 2) Start the master
>
> 3) Do a pg_start_backup()
>
> 4) Realize, based on log error messages, that I've misconfigured the
> archive_command.
>
> 5) Attempt to shut down the master. Master tells me that pg_stop_backup
> must be run in order to shut down.

If I issue a shutdown, PostgreSQL should do whatever it needs to do to
shutdown; including issuing a pg_stop_backup.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
From: "Kevin Grittner" on 23 Feb 2010 12:56
"Joshua D. Drake" <jd(a)commandprompt.com> wrote:
> If I issue a shutdown, PostgreSQL should do whatever it needs to
> do to shutdown; including issuing a pg_stop_backup.

Should we have a pg_fail_backup function, so that it doesn't put out
a file which suggests that we have a complete backup?

-Kevin
From: Simon Riggs on 23 Feb 2010 13:58
On Tue, 2010-02-23 at 09:45 -0800, Josh Berkus wrote:
> 1) Set up a brand new master with an archive_command and archive=on.
>
> 2) Start the master
>
> 3) Do a pg_start_backup()
>
> 4) Realize, based on log error messages, that I've misconfigured the
> archive_command.
>
> 5) Attempt to shut down the master. Master tells me that pg_stop_backup
> must be run in order to shut down.
>
> 6) Execute pg_stop_backup.
>
> 7) pg_stop_backup waits forever without ever stopping backup. Every 60
> seconds, it gives me a helpful "still waiting" message, but at least in
> the amount of time I was willing to wait (5 minutes), it never completed.
>
> 8) do an immediate shutdown, as it's the only way I can get the database
> unstuck.
>
> With some experimentation, the problem seems to occur when you have a
> failing archive_command and a master which currently has no database
> traffic; for example, if I did some database write activity (a createdb)
> then pg_stop_backup would complete after about 60 seconds (which, btw,
> is extremely annoying, but at least tolerable).
>
> This issue is 100% reproducible.

IMHO there is no problem in that behaviour. If somebody requests a
backup then we should wait for it to complete. Kevin's suggestion of
pg_fail_backup() is the only sensible conclusion there because it gives
an explicit way out of deadlock.

ISTM the problem is that you didn't test. Steps 3 and 4 should have been
reversed. Perhaps we should put something in the docs to say "and test".
The correct resolution is to put in an archive_command that works.

We can put in an extra step to prevent a pg_start_backup() if there are
a significant number of outstanding files to be archived. Doing that
seems like closing the door after the horse has bolted, since we just
introduced streaming replication that doesn't rely on archived files.
In any case, I don't see many people working on a production system
hitting a problem on an archive_command and then deciding to shut down.

So I don't see this as something that needs fixing for 9.0. There is
already too much non-essential code there, all of which needs to be
tested. I don't think adding in new corner cases to "help" people makes
any sense until we have automated testing that allows us to rerun the
regression tests to check all this stuff still works.

--
Simon Riggs www.2ndQuadrant.com
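The "and test" suggestion above could be sketched as a pre-flight check
run before pg_start_backup(). This is only an illustration, not anything
shipped with PostgreSQL: the function name archive_command_works and the
dummy segment name are hypothetical, and the sketch assumes only the
documented conventions that %p expands to the path of the file to
archive, %f to its file name, and that a zero exit status means success.
It does not handle the %% escape.

```python
import os
import subprocess
import tempfile


def archive_command_works(template: str) -> bool:
    """Run an archive_command-style template against a throwaway file.

    Hypothetical helper: substitutes %p and %f the way the documentation
    describes, runs the result through the shell, and reports whether
    the command exited with status zero.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Dummy file standing in for a WAL segment (name is illustrative).
        name = "000000010000000000000001"
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(b"\0" * 16)

        command = template.replace("%p", path).replace("%f", name)
        result = subprocess.run(command, shell=True)
        return result.returncode == 0


if __name__ == "__main__":
    # A command that always fails stands in for a misconfigured setting.
    print(archive_command_works("exit 1"))
```

Running such a check first would surface the misconfiguration from
step 4 before step 3 is ever attempted.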
From: "Joshua D. Drake" on 23 Feb 2010 14:24
On Tue, 2010-02-23 at 18:58 +0000, Simon Riggs wrote:
> On Tue, 2010-02-23 at 09:45 -0800, Josh Berkus wrote:
> > This issue is 100% reproducible.
>
> IMHO there is no problem in that behaviour. If somebody requests a
> backup then we should wait for it to complete. Kevin's suggestion of
> pg_fail_backup() is the only sensible conclusion there because it gives
> an explicit way out of deadlock.
>
> ISTM the problem is that you didn't test. Steps 3 and 4 should have been
> reversed. Perhaps we should put something in the docs to say "and test".
> The correct resolution is to put in an archive_command that works.

The problem isn't that it is a bad archive_command, it is that
PostgreSQL has no way to deal with this gracefully. Yes people should
test but are we dealing with the real world or not?

>
> So I don't see this as something that needs fixing for 9.0. There is
> already too much non-essential code there, all of which needs to be
> tested. I don't think adding in new corner cases to "help" people makes
> any sense until we have automated testing that allows us to rerun the
> regression tests to check all this stuff still works.

This will bite us if we release like this.

Joshua D. Drake
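The hang reported in steps 6 and 7 of the original message can be
pictured with a toy polling loop. This is a minimal sketch and not
PostgreSQL's code: wait_for_archive, succeeds_on_attempt, max_wait_s and
warn_every_s are hypothetical names, time is simulated rather than
slept, the 60-second message interval is taken from the report, and the
give-up limit is an assumption standing in for the kind of explicit way
out that pg_fail_backup() was proposed to provide.

```python
from typing import Callable, List


def wait_for_archive(archive_once: Callable[[], bool],
                     max_wait_s: int = 300,
                     warn_every_s: int = 60) -> List[str]:
    """Poll once per simulated second until archiving succeeds.

    Returns the messages that would have been logged. Gives up after
    max_wait_s simulated seconds (an assumed escape hatch).
    """
    log: List[str] = []
    for elapsed in range(1, max_wait_s + 1):
        if archive_once():
            log.append(f"archived after {elapsed}s")
            return log
        if elapsed % warn_every_s == 0:
            log.append(f"still waiting ({elapsed}s elapsed)")
    log.append(f"gave up after {max_wait_s}s")
    return log


def succeeds_on_attempt(n: int) -> Callable[[], bool]:
    """Build a fake archiver that fails until its n-th invocation."""
    calls = {"count": 0}

    def attempt() -> bool:
        calls["count"] += 1
        return calls["count"] >= n

    return attempt


if __name__ == "__main__":
    # A permanently failing archive_command never lets the wait finish.
    for line in wait_for_archive(lambda: False):
        print(line)
```

With an archiver that never succeeds, the loop only ever emits the
periodic message until the assumed limit is reached, which mirrors the
five minutes of "still waiting" described in the report.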