From: Josh Berkus on
On 1/8/10 1:16 PM, Heikki Linnakangas wrote:
> * A standby that connects to master, initiates streaming, and then sits
> idle without stalls recycling of old WAL files in the master. That will
> eventually lead to a full disk in master. Do we need some kind of a
> emergency valve on that?

WARNING: I haven't thought about how this would work together with HS yet.

I think this needs to be administrator-configurable.

I'd suggest a GUC approach:

archiving_lag_action = { ignore, shutdown, stop }

"Ignore" would be the default. Some users would rather have the master
shut down if the slave has stopped taking segments; that's "shutdown".
Otherwise, it's "stop", which simply stops archiving and starts recycling
once we reach the configured number of segments.
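
Roughly, I'm imagining behaviour along these lines (just a toy sketch; the
companion "max_lag_segments" limit and the function names are made up for
illustration and are not part of the patch):

/*
 * Toy sketch of the proposed archiving_lag_action behaviour.  The GUC
 * name, the companion "max_lag_segments" limit and the function names
 * are all hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>

typedef enum
{
    LAG_ACTION_IGNORE,      /* default: carry on, disk may eventually fill */
    LAG_ACTION_SHUTDOWN,    /* shut the master down if the slave stops taking segments */
    LAG_ACTION_STOP         /* stop archiving and start recycling old segments */
} ArchivingLagAction;

/* Called when the standby has fallen lag_segments behind the master. */
static void
handle_lagging_standby(ArchivingLagAction action,
                       int lag_segments, int max_lag_segments)
{
    if (lag_segments <= max_lag_segments)
        return;             /* still within the configured limit */

    switch (action)
    {
        case LAG_ACTION_IGNORE:
            break;          /* do nothing */
        case LAG_ACTION_SHUTDOWN:
            fprintf(stderr, "standby is %d segments behind, shutting down\n",
                    lag_segments);
            exit(1);
        case LAG_ACTION_STOP:
            printf("stopping archiving, recycling segments beyond the limit\n");
            break;
    }
}

int
main(void)
{
    handle_lagging_standby(LAG_ACTION_STOP, 128, 64);
    return 0;
}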

Better name for the GUC very welcome ...

--Josh Berkus


From: Greg Stark on
On Fri, Jan 8, 2010 at 9:16 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> * We still have a related issue, though: if standby is configured to
> archive to the same location as master (as it always is on my laptop,
> where I use the postgresql.conf of the master unmodified in the server),
> right after failover the standby server will try to archive all the old
> WAL files that were streamed from the master; but they exist already in
> the archive, as the master archived them already. I'm not sure if this
> is a pilot error, or if we should do something in the server to tell
> apart WAL segments streamed from master and those generated in the
> standby server after failover. Maybe we should immediately create a
> .done file for every file received from master?

How do we know the master has finished archiving them? If the master
crashes suddenly and you fail over, couldn't it have failed to archive
segments that the standby had already received via streaming
replication?


> * Need to add comments somewhere to note that ReadRecord depends on the
> fact that a WAL record is always sent as a whole, never split across two
> messages.

What happens in the case of the very large records Tom was describing
recently? If the entire record doesn't fit in a WAL segment, is it the
whole record or the partial record with the continuation bit that
needs to fit in a message?



--
greg


From: Fujii Masao on
On Sat, Jan 9, 2010 at 10:38 AM, Greg Stark <gsstark@mit.edu> wrote:
>> * Need to add comments somewhere to note that ReadRecord depends on the
>> fact that a WAL record is always sent as a whole, never split across two
>> messages.
>
> What happens in the case of the very large records Tom was describing
> recently? If the entire record doesn't fit in a WAL segment, is it the
> whole record or the partial record with the continuation bit that
> needs to fit in a message?

It's the partial record with the continuation bit.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs on
On Fri, 2010-01-08 at 23:16 +0200, Heikki Linnakangas wrote:

> * I removed the feature that archiver was started during recovery. The
> idea of that was to enable archiving from a standby server, to relieve
> the master server of that duty, but I found it annoying because it
> causes trouble if the standby and master are configured to archive to
> the same location; they will fight over which copies the file to the
> archive first. Frankly the feature doesn't seem very useful as the patch
> stands, because you still have to configure archiving in the master in
> practice; you can't take an online base backup otherwise, and you have
> the risk of standby falling too much behind and having to restore from
> base backup whenever the standby is disconnected for any reason. Let's
> revisit this later when it's truly useful.

Agreed

> * We still have a related issue, though: if standby is configured to
> archive to the same location as master (as it always is on my laptop,
> where I use the postgresql.conf of the master unmodified in the server),
> right after failover the standby server will try to archive all the old
> WAL files that were streamed from the master; but they exist already in
> the archive, as the master archived them already. I'm not sure if this
> is a pilot error, or if we should do something in the server to tell
> apart WAL segments streamed from master and those generated in the
> standby server after failover. Maybe we should immediately create a
> .done file for every file received from master?

That sounds like the right thing to do.
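
For what it's worth, a minimal sketch of what "create a .done file" could
mean on disk, assuming the usual pg_xlog/archive_status/<segment>.done
convention; the real walreceiver code would of course go through the proper
path-handling and durability machinery rather than this:

/*
 * Minimal sketch: mark a segment streamed from the master as already
 * archived by touching pg_xlog/archive_status/<segment>.done, so the
 * archiver skips it after failover.  Not the actual walreceiver code.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
mark_segment_done(const char *datadir, const char *walfile)
{
    char    path[1024];
    int     fd;

    snprintf(path, sizeof(path), "%s/pg_xlog/archive_status/%s.done",
             datadir, walfile);

    /* An empty file is enough; its mere existence is the marker. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

int
main(void)
{
    /* Example segment name; the real name comes from the streamed WAL. */
    return mark_segment_done("/var/lib/pgsql/data",
                             "000000010000000000000001") == 0 ? 0 : 1;
}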

> * I don't think we should require superuser rights for replication.
> Although you see all WAL and potentially all data in the system through
> that, a standby doesn't need any write access to the master, so it would
> be good practice to create a dedicated account with limited privileges
> for replication.

Agreed. I think we should have a predefined user, called "replication",
that has only the rights it needs.

> * A standby that connects to master, initiates streaming, and then sits
> idle without stalls recycling of old WAL files in the master. That will
> eventually lead to a full disk in master. Do we need some kind of a
> emergency valve on that?

Can you explain how this could occur? My understanding was that the
walreceiver and startup processes were capable of independent action
specifically to avoid this kind of effect.

> * Documentation. The patch used to move around some sections, but I
> think that has been partially reverted so that it now just duplicates
> them. It probably needs other work too, I haven't looked at the docs in
> any detail.

I believe the docs need urgent attention. We need more people to read
them and understand the implications so that they can comment; it is
extremely non-obvious from the patch how things work at a behavioural
level.

I am very concerned that there is no thought given to monitoring
replication. This will make the feature difficult to use in practice.

--
Simon Riggs www.2ndQuadrant.com



From: Simon Riggs on
On Fri, 2010-01-08 at 14:20 -0800, Josh Berkus wrote:
> On 1/8/10 1:16 PM, Heikki Linnakangas wrote:
> > * A standby that connects to master, initiates streaming, and then sits
> > idle without stalls recycling of old WAL files in the master. That will
> > eventually lead to a full disk in master. Do we need some kind of a
> > emergency valve on that?
>
> WARNING: I haven't thought about how this would work together with HS yet.

I've been reviewing things as we go along, so I'm not that tense
overall. Having said that, I don't understand why the problem above
would occur, and the sentence seems to be missing a verb between
"without" and "stalls". More explanation, please.

What could happen is that the standby slowly lags behind the master. We
don't have any way of monitoring that yet; setting the ps display is not
enough here.
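
To make the point concrete, the number an administrator wants to see is
simply the distance between the master's write position and the standby's
replay position. A toy sketch, with WAL positions simplified to plain
64-bit byte offsets rather than the server's actual XLogRecPtr
representation:

/*
 * Sketch of the missing monitoring: how far behind the standby is,
 * in bytes and in 16MB segments.  WAL positions are simplified to
 * plain 64-bit byte offsets for illustration.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define WAL_SEGMENT_SIZE  (16 * 1024 * 1024)    /* default 16MB segments */

static uint64_t
replication_lag_bytes(uint64_t master_write_pos, uint64_t standby_replay_pos)
{
    if (standby_replay_pos >= master_write_pos)
        return 0;               /* standby is caught up */
    return master_write_pos - standby_replay_pos;
}

int
main(void)
{
    uint64_t lag = replication_lag_bytes(UINT64_C(123456789),
                                         UINT64_C(100000000));

    printf("lag: %" PRIu64 " bytes (~%" PRIu64 " segments)\n",
           lag, lag / WAL_SEGMENT_SIZE);
    return 0;
}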

--
Simon Riggs www.2ndQuadrant.com

