From: Josh Berkus on

> In an ideal world it would be best if pg_stop_backup could actually
> print the error status of the archiving command.

Agreed.

> And do we have closure on whether a "fast" shutdown is hanging? Or was
> that actually a smart shutdown?

No; I need to retest and verify 100% that the issue wasn't something
other than pg_stop_backup.

> Perhaps "smart" shutdown needs to print out what it's waiting on
> periodically as well, and suggest a fast shutdown to abort those
> transactions.

That would be a good thing to have for PostgreSQL in general. Given
that any number of things can stop a smart shutdown, it's more than a
little baffling to users why one hangs forever.

BUT ... since most users run a smart shutdown via a service script,
information on what the shutdown is waiting for would need to be
written to the log rather than shown interactively.

--Josh Berkus

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Smith on
Greg Stark wrote:
> In an ideal world it would be best if pg_stop_backup could actually
> print the error status of the archiving command. Is there any way for
> it to get ahold of the fact that the archiving is failing?
>

This is the area I mentioned I'd recently proposed a patch to improve.
Right now the archiver doesn't tell anyone anything about what it's
doing, or even save its state information. My proposal was to make the
file it's currently working on (or has just finished, or both)
visible:
http://archives.postgresql.org/message-id/4B4FEA18.5080705(a)2ndquadrant.com

The main content of that proposal was tracking disk space, which
wandered into a separate discussion, but it would be easy enough to
take the information that patch intends to export ("what archive file
is currently being processed?") and print it in the error message too.
That makes it easy for people to infer the command is failing if the
same segment number shows up every time in that message.

I didn't finish that only because the CommitFest kicked off and I
switched from new development to reviewing. Since this class of error
keeps popping up, I could easily finish that patch off by next week and
see if it helps here. I thought it was a long overdue bit of monitoring
to add to the database anyway; I just never had the time to work on it
before.

> And do we have closure on whether a "fast" shutdown is hanging? Or was
> that actually a smart shutdown?
>

When I tested this myself, a smart shutdown hung every time, while a
fast one blew right through the problem, matching what's described in
the manual. Josh suggested at one point that he might have seen a
situation where a fast shutdown wasn't sufficient to work around this
and an immediate one was required. It's certainly possible that
happened for an as-yet-unknown reason (I've seen plenty of situations
where fast shutdown didn't work), but I haven't been able to replicate
it.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us



From: Bruce Momjian on
Joshua D. Drake wrote:
> On Wed, 2010-02-24 at 12:32 -0800, Josh Berkus wrote:
> > > pg_stop_backup() doesn't complete until all the WAL segments needed to
> > > restore from the backup are archived. If archive_command is failing,
> > > that never happens.
> >
> > OK, so we need a way out of that cycle if the user is issuing
> > pg_stop_backup because they *already know* that archive_command is
> > failing. Right now, there's no way out other than a fast shutdown,
> > which is a bit user-hostile.
>
> Hmmm well... changing the archive_command to /bin/true and issuing a HUP
> would cause the command to succeed, but I still think that is over the
> top. I prefer Kevin's solution or some variant thereof:
>
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg01853.php
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg01907.php

Postgres 9.0 will be the first release to mention /bin/true as a way of
turning off archiving in extraordinary circumstances:

http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian on

Looks like we arrived at the best solution here. I don't think it was
clear to users that pg_stop_backup() was waiting for archive_command to
archive the required WAL segments, so they wouldn't be likely to
understand the delay or correct a problem. This gives them the
information they need at the time they need it.

---------------------------------------------------------------------------

Tom Lane wrote:
> Greg Smith <greg(a)2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> The value of the HINT I think would be to make them (a) not afraid to
> >> hit control-C and (b) aware of the fact that their archiver has got
> >> a problem.
> >>
> > Agreed on both points. Patch attached that implements something similar
> > to Josh's wording, tweaking the original warning too.
>
> OK, everyone likes the immediate NOTICE. I did a bit of copy-editing
> and committed the attached version.
>
> regards, tom lane
>
> Index: xlog.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
> retrieving revision 1.377
> diff -c -r1.377 xlog.c
> *** xlog.c 19 Feb 2010 10:51:03 -0000 1.377
> --- xlog.c 25 Feb 2010 02:15:49 -0000
> ***************
> *** 8132,8138 ****
> *
> * We wait forever, since archive_command is supposed to work and we
> * assume the admin wanted his backup to work completely. If you don't
> ! * wish to wait, you can set statement_timeout.
> */
> XLByteToPrevSeg(stoppoint, _logId, _logSeg);
> XLogFileName(lastxlogfilename, ThisTimeLineID, _logId, _logSeg);
> --- 8132,8139 ----
> *
> * We wait forever, since archive_command is supposed to work and we
> * assume the admin wanted his backup to work completely. If you don't
> ! * wish to wait, you can set statement_timeout. Also, some notices
> ! * are issued to clue in anyone who might be doing this interactively.
> */
> XLByteToPrevSeg(stoppoint, _logId, _logSeg);
> XLogFileName(lastxlogfilename, ThisTimeLineID, _logId, _logSeg);
> ***************
> *** 8141,8146 ****
> --- 8142,8150 ----
> BackupHistoryFileName(histfilename, ThisTimeLineID, _logId, _logSeg,
> startpoint.xrecoff % XLogSegSize);
>
> + ereport(NOTICE,
> + (errmsg("pg_stop_backup cleanup done, waiting for required WAL segments to be archived")));
> +
> seconds_before_warning = 60;
> waits = 0;
>
> ***************
> *** 8155,8162 ****
> {
> seconds_before_warning *= 2; /* This wraps in >10 years... */
> ereport(WARNING,
> ! (errmsg("pg_stop_backup still waiting for archive to complete (%d seconds elapsed)",
> ! waits)));
> }
> }
>
> --- 8159,8169 ----
> {
> seconds_before_warning *= 2; /* This wraps in >10 years... */
> ereport(WARNING,
> ! (errmsg("pg_stop_backup still waiting for all required WAL segments to be archived (%d seconds elapsed)",
> ! waits),
> ! errhint("Check that your archive_command is executing properly. "
> ! "pg_stop_backup can be cancelled safely, "
> ! "but the database backup will not be usable without all the WAL segments.")));
> }
> }
>
>



From: Bernd Helmle on


--On 24 February 2010 16:01:02 -0500 Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> One objection to this is that it's not very clear to the user when
> pg_stop_backup has finished with actual work and is just waiting for the
> archiver, ie when is it safe to hit control-C? Maybe we should emit a
> "backup done, waiting for archiver to complete" notice before entering
> the sleep loop.

+1 for this. This hint would certainly help users recognize the issue
immediately (or at least point them to a possible cause).

--
Thanks

Bernd
