Weird quirk of HS/SR, probably not fixable [PgSql]

From: Heikki Linnakangas on 27 Apr 2010 03:19

Josh Berkus wrote:
> Here's a way to trap yourself:
>
> (1) Set up an HS/SR master
> (2) pg_start_backup on the master
> (3) clone the master to 1 or more slaves
> (4) Fast shutdown the master (without pg_stop_backup)
> (5) Restart the master
> (6) Bring up the slaves
>
> Result: the slaves will come up fine in recovery mode. However, they
> will never switch over to HS mode or start SR. You will not be able to
> pg_stop_backup() on the master. At this point, you have no option but
> to shut down the slaves and re-clone.
>
> The only reason why this is somewhat problematic for users is that you
> will not get any messages from the master or the slaves to indicate why
> they won't switch modes. So I can imagine someone wasting a lot of time
> troubleshooting the wrong problems.
>
> Suggested resolution: I don't think there's any logical "fix" for this
> case; it should just be added to the docs as a failure/troubleshooting
> condition.

Hmm, we could throw an error in the standby, when we see a shutdown
checkpoint while we're waiting for an end-backup record. If the database
was shut down before pg_stop_backup(), we know that the backup was
cancelled and the end-backup record we're waiting for will never arrive.
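
To make the proposed check concrete, here is a minimal sketch of the idea as a toy
replay loop. It is not PostgreSQL's recovery code: the record kinds, the exception,
and the replay() function are hypothetical names invented for illustration.

```python
from enum import Enum, auto


class Rec(Enum):
    """Hypothetical WAL record kinds; not PostgreSQL's real identifiers."""
    DATA = auto()
    BACKUP_END = auto()
    SHUTDOWN_CHECKPOINT = auto()


class BackupCancelled(Exception):
    """Raised when the awaited end-backup record can never arrive."""


def replay(records):
    """Replay records in order on a standby started from a base backup.

    Returns True once an end-backup record has been seen, and False if the
    stream ran out while the standby was still waiting for one. Raises
    BackupCancelled if a shutdown checkpoint shows up while still waiting,
    because the master was shut down before pg_stop_backup().
    """
    waiting_for_backup_end = True
    for rec in records:
        if rec is Rec.BACKUP_END:
            waiting_for_backup_end = False
        elif rec is Rec.SHUTDOWN_CHECKPOINT and waiting_for_backup_end:
            raise BackupCancelled(
                "shutdown checkpoint seen before end-backup record; "
                "the base backup was cancelled"
            )
    return not waiting_for_backup_end
```

With this sketch, a stream holding only DATA and SHUTDOWN_CHECKPOINT fails loudly
instead of leaving the standby waiting forever, which is the symptom Josh described.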

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Fujii Masao on 27 Apr 2010 04:11

On Tue, Apr 27, 2010 at 4:19 PM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Hmm, we could throw an error in the standby, when we see a shutdown
> checkpoint while we're waiting for an end-backup record. If the database
> was shut down before pg_stop_backup(), we know that the backup was
> cancelled and the end-backup record we're waiting for will never arrive.

Sounds good. This would work fine even if an immediate shutdown is done
instead since the primary ends up generating a shutdown checkpoint record
when restarting.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Heikki Linnakangas on 27 Apr 2010 05:25

Fujii Masao wrote:
> On Tue, Apr 27, 2010 at 4:19 PM, Heikki Linnakangas
> <heikki.linnakangas(a)enterprisedb.com> wrote:
>> Hmm, we could throw an error in the standby, when we see a shutdown
>> checkpoint while we're waiting for an end-backup record. If the database
>> was shut down before pg_stop_backup(), we know that the backup was
>> cancelled and the end-backup record we're waiting for will never arrive.
>
> Sounds good. This would work fine even if an immediate shutdown is done
> instead since the primary ends up generating a shutdown checkpoint record
> when restarting.

Yep. I've committed a patch to do that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Robert Haas on 27 Apr 2010 07:07

On Tue, Apr 27, 2010 at 5:25 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Yep. I've committed a patch to do that.

Is there no way for the slave to recover from this situation?

....Robert

From: Fujii Masao on 27 Apr 2010 07:35

On Tue, Apr 27, 2010 at 8:07 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Tue, Apr 27, 2010 at 5:25 AM, Heikki Linnakangas
> <heikki.linnakangas(a)enterprisedb.com> wrote:
>> Yep. I've committed a patch to do that.
>
> Is there no way for the slave to recover from this situation?

Probably yes: the slave cannot recover on its own. You would need to
take a fresh base backup and restart the slave from it.

On second thought, seeing a shutdown checkpoint while waiting for the
end-backup record mostly means that the database has already reached
a consistent state. We might be able to relax the error check.
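
A rough sketch of what such a relaxed check could look like, purely as an
illustration of the idea and not as a claim that it is safe or that it matches
any committed code. The record names and the consistency_point() function are
hypothetical, and the "mostly" caveat above is not modelled.

```python
def consistency_point(records):
    """Return the index of the first record after which a standby might be
    treated as consistent, or None if no such record is present.

    Assumption for illustration: both an end-backup record and a shutdown
    checkpoint count, instead of the shutdown checkpoint being an error.
    Record kinds are plain strings with made-up names.
    """
    for index, kind in enumerate(records):
        if kind in ("backup_end", "shutdown_checkpoint"):
            return index
    return None
```

Under this assumption, a stream such as ["data", "shutdown_checkpoint", "data"]
would let the standby proceed from the shutdown checkpoint onward instead of
requiring a fresh base backup.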

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
