From: Josh Berkus on 25 Feb 2010 12:29

> In an ideal world it would be best if pg_stop_backup could actually
> print the error status of the archiving command.

Agreed.

> And do we have closure on whether a "fast" shutdown is hanging? Or was
> that actually a smart shutdown?

No, I need to retest and verify 100% that the issue wasn't something
other than stop_backup.

> Perhaps "smart" shutdown needs to print out what it's waiting on
> periodically as well, and suggest a fast shutdown to abort those
> transactions.

That would be a good thing to have for PostgreSQL in general. Given
that any number of things can stop a smart shutdown, it's more than a
little baffling to users why one hangs forever.

BUT ... since most users run smart shutdown via a services script,
output on what shutdown is waiting on would need to be written to the
log rather than given interactively.

--Josh Berkus

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Smith on 25 Feb 2010 12:31

Greg Stark wrote:
> In an ideal world it would be best if pg_stop_backup could actually
> print the error status of the archiving command. Is there any way for
> it to get ahold of the fact that the archiving is failing?

This is in the area I mentioned I'd proposed a patch to improve not too
long ago. The archiver doesn't tell anyone anything about what it's
doing right now, or even save its state information. I made a proposal
for making the bit it's currently working on (or just finished, or
both) visible:

http://archives.postgresql.org/message-id/4B4FEA18.5080705(a)2ndquadrant.com

The main content for that was tracking disk space, which wandered into
a separate discussion, but it would be easy enough to use the
information that patch intends to export ("what archive file is
currently being processed?") and print that in the error message too.
That makes it easy enough for people to infer the command is failing if
the same segment number shows up every time in that message.

I didn't finish that only because the CF kicked off and I switched out
of new development to review. Since this class of error keeps popping
up, I could easily finish that patch off by next week and see if it
helps here. I thought it was a long overdue bit of monitoring to add to
the database anyway, just never had the time to work on it before.

> And do we have closure on whether a "fast" shutdown is hanging? Or was
> that actually a smart shutdown?

When I tested this myself, a smart shutdown hung every time, while a
fast one blew right through the problem, matching what's described in
the manual. Josh suggested at one point he might have seen a situation
where fast shutdown wasn't sufficient to work around this and an
immediate one was required. It's certainly possible that happened for
an as yet unknown reason (I've seen plenty of situations where fast
shutdown didn't work), but I haven't been able to replicate it.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us
From: Bruce Momjian on 25 Feb 2010 12:47

Joshua D. Drake wrote:
> On Wed, 2010-02-24 at 12:32 -0800, Josh Berkus wrote:
> > > pg_stop_backup() doesn't complete until all the WAL segments needed to
> > > restore from the backup are archived. If archive_command is failing,
> > > that never happens.
> >
> > OK, so we need a way out of that cycle if the user is issuing
> > pg_stop_backup because they *already know* that archive_command is
> > failing. Right now, there's no way out other than a fast shutdown,
> > which is a bit user-hostile.
>
> Hmmm well... changing the archive_command to /bin/true and issuing a HUP
> would cause the command to succeed, but I still think that is over the
> top. I prefer Kevin's solution or some variant thereof:
>
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg01853.php
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg01907.php

Postgres 9.0 will be the first release to mention /bin/true as a way of
turning off archiving in extraordinary circumstances:

http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +
From: Bruce Momjian on 25 Feb 2010 13:10

Looks like we arrived at the best solution here. I don't think it was
clear to users that pg_stop_backup() was issuing an archive_command and
hence they wouldn't be likely to understand the delay or correct a
problem. This gives them the information they need at the time they
need it.

---------------------------------------------------------------------------

Tom Lane wrote:
> Greg Smith <greg(a)2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> The value of the HINT I think would be to make them (a) not afraid to
> >> hit control-C and (b) aware of the fact that their archiver has got
> >> a problem.
> >>
> > Agreed on both points. Patch attached that implements something similar
> > to Josh's wording, tweaking the original warning too.
>
> OK, everyone likes the immediate NOTICE. I did a bit of copy-editing
> and committed the attached version.
>
> regards, tom lane
>
> Index: xlog.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
> retrieving revision 1.377
> diff -c -r1.377 xlog.c
> *** xlog.c 19 Feb 2010 10:51:03 -0000 1.377
> --- xlog.c 25 Feb 2010 02:15:49 -0000
> ***************
> *** 8132,8138 ****
>        *
>        * We wait forever, since archive_command is supposed to work and we
>        * assume the admin wanted his backup to work completely. If you don't
> !      * wish to wait, you can set statement_timeout.
>        */
>       XLByteToPrevSeg(stoppoint, _logId, _logSeg);
>       XLogFileName(lastxlogfilename, ThisTimeLineID, _logId, _logSeg);
> --- 8132,8139 ----
>        *
>        * We wait forever, since archive_command is supposed to work and we
>        * assume the admin wanted his backup to work completely. If you don't
> !      * wish to wait, you can set statement_timeout. Also, some notices
> !      * are issued to clue in anyone who might be doing this interactively.
>        */
>       XLByteToPrevSeg(stoppoint, _logId, _logSeg);
>       XLogFileName(lastxlogfilename, ThisTimeLineID, _logId, _logSeg);
> ***************
> *** 8141,8146 ****
> --- 8142,8150 ----
>       BackupHistoryFileName(histfilename, ThisTimeLineID, _logId, _logSeg,
>                             startpoint.xrecoff % XLogSegSize);
>
> +     ereport(NOTICE,
> +             (errmsg("pg_stop_backup cleanup done, waiting for required WAL segments to be archived")));
> +
>       seconds_before_warning = 60;
>       waits = 0;
>
> ***************
> *** 8155,8162 ****
>           {
>               seconds_before_warning *= 2; /* This wraps in >10 years... */
>               ereport(WARNING,
> !                     (errmsg("pg_stop_backup still waiting for archive to complete (%d seconds elapsed)",
> !                             waits)));
>           }
>       }
>
> --- 8159,8169 ----
>           {
>               seconds_before_warning *= 2; /* This wraps in >10 years... */
>               ereport(WARNING,
> !                     (errmsg("pg_stop_backup still waiting for all required WAL segments to be archived (%d seconds elapsed)",
> !                             waits),
> !                      errhint("Check that your archive_command is executing properly. "
> !                              "pg_stop_backup can be cancelled safely, "
> !                              "but the database backup will not be usable without all the WAL segments.")));
>           }
>       }
>
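The committed hunks show only fragments of the wait loop: the initial 60-second threshold, the counter, and the doubling step. As a rough illustration of how often the WARNING would fire under that scheme, here is a minimal stand-alone sketch. It is not PostgreSQL code: the function names simulate_warning_schedule and print_schedule are hypothetical, and the assumption that the loop advances the counter once per second is inferred from the "(%d seconds elapsed)" message rather than shown in the diff.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model of the warning schedule (not PostgreSQL code).
 * Steps through `total_seconds` one second at a time and records the
 * elapsed-seconds value at which each WARNING would be emitted.
 * Returns the number of warnings; at most `max_out` times are stored.
 * As in the original, the doubling is not protected against overflow
 * for extremely long waits.
 */
int
simulate_warning_schedule(int total_seconds, int *warn_at, int max_out)
{
    int seconds_before_warning = 60;    /* first warning after one minute */
    int waits = 0;                      /* seconds waited so far */
    int nwarnings = 0;

    assert(total_seconds >= 0 && max_out >= 0);

    while (waits < total_seconds)
    {
        /* Assumption: the real loop sleeps about one second per pass. */
        if (++waits >= seconds_before_warning)
        {
            seconds_before_warning *= 2;    /* back off: 60, 120, 240, ... */
            if (nwarnings < max_out)
                warn_at[nwarnings] = waits;
            nwarnings++;
        }
    }
    return nwarnings;
}

/* Print the elapsed time of each warning for a wait of total_seconds. */
void
print_schedule(int total_seconds)
{
    int warn_at[32];
    int n = simulate_warning_schedule(total_seconds, warn_at, 32);

    for (int i = 0; i < n && i < 32; i++)
        printf("WARNING after %d seconds\n", warn_at[i]);
}
```

Calling print_schedule(3600) would list the warning times for an hour-long wait. Because the interval doubles, the warnings thin out the longer archiving stays stuck, which is why the immediate NOTICE matters for someone running pg_stop_backup interactively.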
From: Bernd Helmle on 26 Feb 2010 04:41

--On 24 February 2010 16:01:02 -0500 Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> One objection to this is that it's not very clear to the user when
> pg_stop_backup has finished with actual work and is just waiting for the
> archiver, ie when is it safe to hit control-C? Maybe we should emit a
> "backup done, waiting for archiver to complete" notice before entering
> the sleep loop.

+1 for this. This hint would certainly help to recognize the issue
immediately (or at least point to a possible cause).

--
Thanks

Bernd