Subject: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby
From: Fujii Masao on 27 May 2010 10:09

On Thu, May 27, 2010 at 10:13 PM, Sander, Ingo (NSN - DE/Munich)
<ingo.sander(a)nsn.com> wrote:
>
> With the parameters checkpoint_segments and wal_keep_segments, the max number
> of WAL segments is set. When the max number is reached,
>
> (1) the segments are deleted/recycled,
> or (2) if the time set by checkpoint_timeout has elapsed, a checkpoint is
> performed and, if possible, a deletion/recycling is done.
>
> This is the mechanism on the active side of a DB server. On the standby side,
> however, unused transferred segments are only deleted when the
> checkpoint_timeout mechanism (2) is executed.
>
> Is this correct behaviour, or is it an error?
>
> I have observed (checkpoint_segments set to 3, wal_keep_segments set to 10,
> and checkpoint_timeout set to 30min) that in my stress test the disk usage
> on the standby side grows up to 2GB of xlog segments, whereas on the
> active side only ~60MB of xlog files are present (we have patched the xlog
> file size to 4MB). One way to prevent this is to decrease checkpoint_timeout
> to a low value (30sec); however, this has the disadvantage that checkpoints
> are then executed frequently on the active side, which can hurt performance.
> Another possibility is to use a different postgresql.conf on the active and
> standby sides, but this is not our preferred solution.

I guess this happens because checkpoints on the standby are much less frequent
than on the master. On the master, a checkpoint occurs for every consumption of
three segments because of "checkpoint_segments = 3". On the standby, on the
other hand, only checkpoint_timeout has effect, so a checkpoint occurs every 30
minutes because of "checkpoint_timeout = 30min".

Should the walreceiver signal the bgwriter to start a checkpoint once it has
received more than checkpoint_segments WAL files, as in normal processing?
Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
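[Editor's illustration] The ~2GB vs ~60MB gap Ingo reports follows directly from the two triggers in play. A back-of-the-envelope sketch, assuming a WAL generation rate (the 68 MB/min figure is invented to match the reported numbers; only the 4MB segment size and 30min timeout come from the thread):

```python
# Back-of-the-envelope model of WAL accumulation on the standby.
# Illustrative only: WAL_RATE_MB_PER_MIN is an assumed workload figure,
# not something stated in the thread.

SEGMENT_SIZE_MB = 4          # the thread's patched xlog segment size
CHECKPOINT_TIMEOUT_MIN = 30  # the only trigger active on the standby
WAL_RATE_MB_PER_MIN = 68     # assumed stress-test WAL generation rate

# On the master, checkpoint_segments = 3 bounds pg_xlog to a handful of
# segments. On the standby, nothing is recycled until the timed
# restartpoint fires, so up to a full timeout interval of WAL piles up:
standby_peak_mb = CHECKPOINT_TIMEOUT_MIN * WAL_RATE_MB_PER_MIN
standby_peak_segments = standby_peak_mb / SEGMENT_SIZE_MB

print(standby_peak_mb)        # 2040 MB, i.e. roughly the observed ~2GB
print(standby_peak_segments)  # 510.0 segments
```

Under this assumed rate the standby's peak usage scales linearly with checkpoint_timeout, which is why shrinking the timeout to 30sec "fixes" it at the cost of frequent checkpoints on the master.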
From: Robert Haas on 27 May 2010 10:13

On Thu, May 27, 2010 at 10:09 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Thu, May 27, 2010 at 10:13 PM, Sander, Ingo (NSN - DE/Munich)
> <ingo.sander(a)nsn.com> wrote:
>> [...]
>
> I guess this happens because checkpoints on the standby are much less
> frequent than on the master. On the master, a checkpoint occurs for every
> consumption of three segments because of "checkpoint_segments = 3". On the
> standby, on the other hand, only checkpoint_timeout has effect, so a
> checkpoint occurs every 30 minutes because of "checkpoint_timeout = 30min".
>
> Should the walreceiver signal the bgwriter to start a checkpoint once it has
> received more than checkpoint_segments WAL files, as in normal processing?

Is this also an issue when using log shipping, or just with SR?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
From: Fujii Masao on 29 May 2010 23:04

On Fri, May 28, 2010 at 11:12 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Thu, May 27, 2010 at 11:13 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>>> [...]
>>>
>>> Should the walreceiver signal the bgwriter to start a checkpoint once it
>>> has received more than checkpoint_segments WAL files, as in normal
>>> processing?
>>
>> Is this also an issue when using log shipping, or just with SR?
>
> When using log shipping, checkpoint_segments never triggers a checkpoint.
> So recovery after a standby crash might take unexpectedly long, since the
> redo starting point might be old.
>
> But in file-based log shipping, since WAL files don't accumulate in the
> pg_xlog directory on the standby, pg_xlog will not fill up with WAL files
> even if checkpoints are very infrequent. That accumulation occurs only
> when using SR.
>
> If what we should avoid is infrequent checkpoints themselves, rather than
> just the accumulation of WAL files, then the bgwriter rather than the
> walreceiver should check whether we've consumed too much WAL, I think.
> Thoughts?

I've attached a patch which changes the startup process so that it signals the
bgwriter to perform a restartpoint if we've already replayed too many WAL
files. This allows checkpoint_segments to trigger a restartpoint.

Is this patch worth applying for 9.0? If not, I'll add it to the next CF.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
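[Editor's illustration] The mechanism the patch describes can be reduced to a toy sketch: the startup process counts segments replayed since the last restartpoint and, once checkpoint_segments worth have gone by, asks the bgwriter for a restartpoint. All names here are invented for illustration; the real patch works in terms of WAL positions inside xlog.c, not a simple counter:

```python
# Toy model of the proposed behaviour: trigger a restartpoint every
# checkpoint_segments replayed segments. Class and method names are
# hypothetical, not PostgreSQL internals.

CHECKPOINT_SEGMENTS = 3

class StartupProcess:
    def __init__(self):
        self.segments_since_restartpoint = 0
        self.restartpoints_requested = 0

    def replay_segment(self):
        """Called once per WAL segment replayed from the stream."""
        self.segments_since_restartpoint += 1
        if self.segments_since_restartpoint >= CHECKPOINT_SEGMENTS:
            self.request_restartpoint()

    def request_restartpoint(self):
        # Stands in for signaling the bgwriter; in the real system the
        # bgwriter then performs the restartpoint and recycles old WAL.
        self.restartpoints_requested += 1
        self.segments_since_restartpoint = 0

sp = StartupProcess()
for _ in range(10):                # replay 10 segments
    sp.replay_segment()
print(sp.restartpoints_requested)  # 3: triggered after segments 3, 6, 9
```

With this policy the standby's pg_xlog stays bounded near checkpoint_segments (plus wal_keep_segments) instead of growing for a full checkpoint_timeout interval.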
From: Fujii Masao on 31 May 2010 06:17

On Mon, May 31, 2010 at 6:37 PM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> The central question is whether checkpoint_segments should trigger
> restartpoints or not. When PITR and restartpoints were introduced, the
> answer was "no", on the grounds that when you're doing recovery you're
> presumably replaying the logs much faster than they were generated, and you
> don't want to slow down the recovery by checkpointing too often.

Right.

> Now that we have the bgwriter active during recovery, and streaming
> replication, which retains the streamed WAL so that we now risk running out
> of disk space with a long checkpoint_timeout, it's time to reconsider that.
>
> I think we have three options:
>
> 1) Leave it as it is: checkpoint_segments doesn't do anything during
> recovery/standby mode
>
> 2) Change it so that checkpoint_segments does take effect during
> recovery/standby
>
> 3) Change it so that checkpoint_segments takes effect during streaming
> replication, but not during recovery otherwise
>
> I'm leaning towards 3); it still seems reasonable not to slow down recovery
> when recovering from archive, but the potential for running out of disk
> space warrants doing 3).

3) makes sense. But how about 4)?

4) Change it so that checkpoint_segments takes effect in standby mode, but
not during recovery otherwise.

This would also lessen the time required to restart the standby in the
file-based log shipping case. Of course, there is a tradeoff between the
speed of replay and the standby's crash-recovery time.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From: Tom Lane on 31 May 2010 11:14

Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> The central question is whether checkpoint_segments should trigger
> restartpoints or not. When PITR and restartpoints were introduced, the
> answer was "no", on the grounds that when you're doing recovery you're
> presumably replaying the logs much faster than they were generated, and
> you don't want to slow down the recovery by checkpointing too often.
>
> [...]
>
> I think we have three options:

What about (4): pay some attention to the actual elapsed time since the last
restartpoint? All the others seem like kluges relying on hard-wired rules
that are hoped to achieve something like a time-based checkpoint.

			regards, tom lane
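[Editor's illustration] Tom's suggestion amounts to gating restartpoints on wall-clock time as well as (or instead of) a hard-wired segment count. A minimal sketch of such a combined policy, with all names and the exact rule invented for illustration:

```python
# Hypothetical combined trigger: a restartpoint is due when EITHER
# enough WAL has been replayed OR enough real time has elapsed since
# the previous restartpoint. Not actual PostgreSQL code.

import time

CHECKPOINT_TIMEOUT_SEC = 1800   # checkpoint_timeout = 30min
CHECKPOINT_SEGMENTS = 3

def restartpoint_due(segments_replayed, last_restartpoint_ts, now=None):
    """True when either trigger fires: enough WAL replayed since the
    last restartpoint, or enough wall-clock time elapsed."""
    now = time.time() if now is None else now
    if segments_replayed >= CHECKPOINT_SEGMENTS:
        return True
    return (now - last_restartpoint_ts) >= CHECKPOINT_TIMEOUT_SEC

# Two segments replayed but 31 minutes elapsed -> the time trigger fires.
print(restartpoint_due(2, last_restartpoint_ts=0, now=1860))  # True
# Two segments and only 10 minutes -> neither trigger fires yet.
print(restartpoint_due(2, last_restartpoint_ts=0, now=600))   # False
```

The time-based leg addresses Tom's objection (the segment count alone is only a proxy for elapsed time under some assumed WAL rate), while the segment leg still bounds disk usage under streaming replication.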