From: Simon Riggs on
On Sat, 2010-05-15 at 20:05 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:
> >> Doesn't feel right to me either. If you want to expose the
> >> keepalive-time to queries, it should be a separate field, something like
> >> lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
> >> read it.
> >
> > That wouldn't be good because then you couldn't easily monitor the
> > delay? You'd have to run two different functions depending on the state
> > of replication (for which we would need yet another function). Users
> > would just wrap that back up into a single function.
>
> What exactly is the user trying to monitor? If it's "how far behind is
> the standby", the difference between pg_current_xlog_insert_location()
> in the master and pg_last_xlog_replay_location() in the standby seems
> more robust and well-defined to me. It's a measure of XLOG location (ie.
> bytes) instead of time, but time is a complicated concept.

Maybe, but it's meaningful to users and that is the point.

> Also note that as the patch stands, if you receive a keep-alive from the
> master at point X, it doesn't mean that the standby is fully up-to-date.
> It's possible that the walsender just finished sending a huge batch of
> accumulated WAL, say 1 GB, and it took 1 hour for the batch to be sent.
> During that time, a lot more WAL has accumulated, yet walsender sends a
> keep-alive with the current timestamp.

Not at all. The timestamp for the keepalive is calculated after the
pq_flush for the main WAL data. So even if it takes 10 years to send the
next blob of WAL data, the timestamp will be current.
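
To be concrete, the ordering looks like this (a minimal sketch, not the
actual patch code; the wrapper function and its name are illustrative):

#include "postgres.h"
#include "libpq/libpq.h"
#include "utils/timestamp.h"

static TimestampTz
send_wal_then_stamp(const char *msgbuf, size_t msglen)
{
    /* push the pending WAL data out to the client first */
    pq_putmessage('d', msgbuf, msglen);
    pq_flush();

    /* read the clock only after the flush, so the keepalive
     * carries the time the send completed, not when it began */
    return GetCurrentTimestamp();
}

However long the preceding WAL batch took to send, the timestamp
attached to the keepalive that follows it is current.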

However, a point you made in an earlier thread is still true here. It
sounds like it would be best if the startup process didn't re-access
shared memory for the next location until it had fully replayed all the
WAL up to the point it last read. That way the changing value of the
shared timestamp would have no effect on the calculated value at any
point in time. I will recode using that concept.
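
Roughly the shape I have in mind (sketch only; lastChunkTime and
ReplayNextRecord() are placeholder names, not actual code):

#include "postgres.h"
#include "access/xlogdefs.h"
#include "replication/walreceiver.h"
#include "storage/spin.h"
#include "utils/timestamp.h"

static XLogRecPtr replayPtr;

static void
replay_one_batch(void)
{
    /* use a volatile pointer to prevent code rearrangement */
    volatile WalRcvData *walrcv = WalRcv;
    XLogRecPtr   recvPtr;
    TimestampTz  stamp;

    /* snapshot the shared values once ... */
    SpinLockAcquire(&walrcv->mutex);
    recvPtr = walrcv->receivedUpto;
    stamp   = walrcv->lastChunkTime;     /* placeholder field */
    SpinLockRelease(&walrcv->mutex);

    /* ... then replay everything up to that point before
     * touching shared memory again */
    while (XLByteLT(replayPtr, recvPtr))
        replayPtr = ReplayNextRecord();  /* placeholder helper */

    /* 'stamp' stayed fixed throughout, so a delay calculated
     * from it is stable for the whole batch */
    (void) stamp;
}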

> In general, the purpose of a keep-alive is to keep the connection alive,
> but you're trying to accomplish something else too, and I don't fully
> understand what it is.

That surprises me. If nothing else, it's in the title of the thread,
and since you personally added this to the Hot Standby todo more than
6 months ago I'd hope you of all people would understand the purpose.

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs on
On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:

> I will recode using that concept.

There are some behaviours that aren't helpful here:

Startup gets a new pointer when it runs out of data to replay. That
might or might not include an updated keepalive timestamp, since there's
no exact relationship between chunks sent and chunks received. Startup
might ask for a new chunk when half a chunk has been received, or when
multiple chunks have been received.

WALSender doesn't chunk up what it sends, though libpq does at a lower
level. So there's no way to make a keepalive happen every X bytes
without doing this from within libpq.

WALSender sleeps even when it might have more WAL to send: it doesn't
check, it just unconditionally sleeps. At least WALReceiver loops until
it has no more to receive. I just can't imagine why that's useful
behaviour.

All in all, I think we should be calling this "burst replication", not
streaming replication. That does cause an issue in trying to monitor
what's going on, because there's so much jitter.

--
Simon Riggs www.2ndQuadrant.com


From: Heikki Linnakangas on
Simon Riggs wrote:
> WALSender sleeps even when it might have more WAL to send: it doesn't
> check, it just unconditionally sleeps. At least WALReceiver loops until
> it has no more to receive. I just can't imagine why that's useful
> behaviour.

Good catch. That should be fixed.

I also note that walsender doesn't respond to signals while it's
sending a large batch. That's analogous to the issue that was addressed
recently in the archiver process.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Heikki Linnakangas on
Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> WALSender sleeps even when it might have more WAL to send: it doesn't
>> check, it just unconditionally sleeps. At least WALReceiver loops until
>> it has no more to receive. I just can't imagine why that's useful
>> behaviour.
>
> Good catch. That should be fixed.
>
> I also note that walsender doesn't respond to signals while it's
> sending a large batch. That's analogous to the issue that was addressed
> recently in the archiver process.

Attached patch rearranges the walsender loops slightly to fix the above.
XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
2) in one round and returns to the main loop after that even if there's
unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
That way the main loop gets to respond to signals quickly, and we also
get to update the shared memory status and PS display more often when
there's a lot of catching up to do.
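
In outline, the new main loop looks like this (a simplified sketch of
the idea, not the patch itself; the flags and XLogSendChunk() stand in
for the real walsender state and XLogSend()):

#include "postgres.h"
#include "utils/guc.h"
#include <signal.h>

/* stand-ins for walsender state; names simplified from the patch */
static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t shutdown_requested = false;

static bool XLogSendChunk(void);    /* sends at most MAX_SEND_SIZE bytes,
                                     * returns true once caught up */

static void
WalSndLoopSketch(void)
{
    for (;;)
    {
        /* signals are serviced between batches, not only when
         * there is nothing left to send */
        if (got_SIGHUP)
            ProcessConfigFile(PGC_SIGHUP);
        if (shutdown_requested)
            break;

        /* sleep only when fully caught up; otherwise loop straight
         * back round and send the next batch */
        if (XLogSendChunk())
            pg_usleep(200000L);     /* stand-in for wal_sender_delay */
    }
}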

Comments, have I screwed up anything?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Fujii Masao on
On Mon, May 17, 2010 at 1:11 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote:
>> On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:
>>
>> > I will recode using that concept.
>
>> Startup gets new pointer when it runs out of data to replay. That might
>> or might not include an updated keepalive timestamp, since there's no
>> exact relationship between chunks sent and chunks received. Startup
>> might ask for a new chunk when half a chunk has been received, or when
>> multiple chunks have been received.
>
> New version, with some other cleanup of wait processing.
>
> The new logic is that when Startup asks for the next applychunk of WAL
> it saves lastChunkTimestamp. That is then the base time used by
> WaitExceedsMaxStandbyDelay(), except when latestXLogTime is later.
> Since multiple receivechunks can arrive from the primary before Startup
> asks for the next applychunk, we use the oldest receivechunk timestamp,
> not the latest. Doing it this way means lastChunkTimestamp doesn't
> change when new keepalives arrive, so we have a stable and reasonably
> accurate recordSendTimestamp for each WAL record.
>
> The keepalive is sent as the first part of a new message, if any. So
> partial chunks of data always have an accurate timestamp, even if that
> is slightly older as a result. It doesn't make much difference except
> with very large chunks.
>
> I think that addresses the points raised on this thread and others.

Is it OK that this keepalive message cannot be used by HS in file-based
log-shipping? Even in SR, the startup process cannot use the keepalive
until walreceiver has been started up.

WalSndKeepAlive() always calls initStringInfo(), which seems to cause a
memory leak.
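
One way to plug it would be to free the buffer once the message has been
handed off, along these lines (a sketch under assumed message framing,
not the actual patch code):

#include "postgres.h"
#include "lib/stringinfo.h"
#include "libpq/libpq.h"
#include "utils/timestamp.h"

static void
WalSndKeepAliveSketch(void)
{
    StringInfoData buf;
    TimestampTz    now = GetCurrentTimestamp();

    initStringInfo(&buf);               /* pallocs a fresh buffer */
    pq_sendbyte(&buf, 'k');             /* illustrative message tag */
    pq_sendbytes(&buf, (char *) &now, sizeof(now));
    pq_putmessage('d', buf.data, buf.len);

    pfree(buf.data);                    /* without this, every keepalive
                                         * leaks one palloc'd buffer */
}

Alternatively, keep a static StringInfoData and resetStringInfo() it on
each call instead of calling initStringInfo() every time.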

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
