Subject: [HACKERS] Keepalive for max_standby_delay
From: Simon Riggs on 15 May 2010 13:24

On Sat, 2010-05-15 at 20:05 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:
> >> Doesn't feel right to me either. If you want to expose the
> >> keepalive-time to queries, it should be a separate field, something like
> >> lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
> >> read it.
> >
> > That wouldn't be good because then you couldn't easily monitor the
> > delay. You'd have to run two different functions depending on the state
> > of replication (for which we would need yet another function). Users
> > would just wrap that back up into a single function.
>
> What exactly is the user trying to monitor? If it's "how far behind is
> the standby", the difference between pg_current_xlog_insert_location()
> in the master and pg_last_xlog_replay_location() in the standby seems
> more robust and well-defined to me. It's a measure of XLOG location (ie.
> bytes) instead of time, but time is a complicated concept.

Maybe, but it's meaningful to users and that is the point.

> Also note that as the patch stands, if you receive a keep-alive from the
> master at point X, it doesn't mean that the standby is fully up-to-date.
> It's possible that the walsender just finished sending a huge batch of
> accumulated WAL, say 1 GB, and it took 1 hour for the batch to be sent.
> During that time, a lot more WAL has accumulated, yet walsender sends a
> keep-alive with the current timestamp.

Not at all. The timestamp for the keepalive is calculated after the
pq_flush for the main WAL data, so even if it takes 10 years to send the
next blob of WAL data the timestamp will be current.

However, a point you made in an earlier thread is still true here. It
sounds like it would be best if the startup process didn't re-access
shared memory for the next location until it had fully replayed all the
WAL up to the point it last accessed. That way the changing value of the
shared timestamp would have no effect on the calculated value at any
point in time. I will recode using that concept.

> In general, the purpose of a keep-alive is to keep the connection alive,
> but you're trying to accomplish something else too, and I don't fully
> understand what it is.

That surprises me. If nothing else, it's in the title of the thread,
though since you personally added this to the Hot Standby todo more than
6 months ago I'd hope you of all people would understand the purpose.

--
Simon Riggs
www.2ndQuadrant.com
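[A minimal sketch of the idea Simon says he will recode, under assumed
names: the WalRcv fields, GetNextApplyChunk(), WaitExceedsMaxStandbyDelay()
and MaxStandbyDelayMs below are illustrative, not taken from the posted
patch. The point is that the startup process copies the sender's timestamp
into a local variable once per apply chunk and measures max_standby_delay
against that stable copy, so later keepalives cannot shift the apparent age
of the record currently being replayed.]

/* All names below are illustrative assumptions, not the actual patch. */
extern WalRcvData *WalRcv;              /* shared state written by walreceiver */
extern int         MaxStandbyDelayMs;   /* max_standby_delay, in milliseconds  */

static TimestampTz lastChunkTimestamp;  /* snapshot taken once per chunk */

static XLogRecPtr
GetNextApplyChunk(void)
{
    XLogRecPtr  receivedUpto;

    SpinLockAcquire(&WalRcv->mutex);
    receivedUpto = WalRcv->receivedUpto;
    /* Copy the send timestamp once; don't re-read it while replaying. */
    lastChunkTimestamp = WalRcv->lastMsgSendTime;
    SpinLockRelease(&WalRcv->mutex);

    return receivedUpto;
}

static bool
WaitExceedsMaxStandbyDelay(void)
{
    /* Compare elapsed time since the per-chunk snapshot to the limit. */
    return TimestampDifferenceExceeds(lastChunkTimestamp,
                                      GetCurrentTimestamp(),
                                      MaxStandbyDelayMs);
}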
From: Simon Riggs on 15 May 2010 14:50

On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:
> I will recode using that concept.

There are some behaviours here that aren't helpful:

Startup gets a new pointer when it runs out of data to replay. That might
or might not include an updated keepalive timestamp, since there's no
exact relationship between chunks sent and chunks received. Startup might
ask for a new chunk when half a chunk has been received, or when multiple
chunks have been received.

WALSender doesn't chunk up what it sends, though libpq does at a lower
level, so there's no way to make a keepalive happen every X bytes without
doing this from within libpq.

WALSender sleeps even when it might have more WAL to send, it doesn't
check, it just unconditionally sleeps. At least WALReceiver loops until it
has no more to receive. I just can't imagine why that's useful behaviour.

All in all, I think we should be calling this "burst replication", not
streaming replication. That does cause an issue in trying to monitor
what's going on, because there's so much jitter.

--
Simon Riggs
www.2ndQuadrant.com
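[To make the complaint concrete, a rough sketch of the two loop policies
being contrasted; the helper names are stand-ins, not the real walsender or
walreceiver code. The sender-style loop sleeps unconditionally after each
send, which produces the "burst" behaviour, while the receiver-style loop
drains everything available before waiting.]

#include <stdbool.h>
#include <unistd.h>

/* Stand-in primitives; the real code works on the WAL stream. */
extern void SendAccumulatedWAL(void);   /* push one (possibly huge) batch */
extern void ReceiveAvailableWAL(void);  /* read what has arrived          */
extern bool MoreWALReady(void);         /* is more already waiting?       */

/* Sender-style: send, then sleep regardless of pending WAL ("bursts"). */
static void
burst_style_loop(useconds_t naptime)
{
    for (;;)
    {
        SendAccumulatedWAL();
        usleep(naptime);                /* sleeps even if MoreWALReady() */
    }
}

/* Receiver-style: keep going until drained, only then sleep. */
static void
drain_style_loop(useconds_t naptime)
{
    for (;;)
    {
        do
            ReceiveAvailableWAL();
        while (MoreWALReady());         /* loop until nothing is left */
        usleep(naptime);
    }
}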
From: Heikki Linnakangas on 15 May 2010 16:08

Simon Riggs wrote:
> WALSender sleeps even when it might have more WAL to send, it doesn't
> check, it just unconditionally sleeps. At least WALReceiver loops until
> it has no more to receive. I just can't imagine why that's useful
> behaviour.

Good catch. That should be fixed.

I also note that walsender doesn't respond to signals while it's sending
a large batch. That's analogous to the issue that was addressed recently
in the archiver process.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
From: Heikki Linnakangas on 15 May 2010 17:05

Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> WALSender sleeps even when it might have more WAL to send, it doesn't
>> check, it just unconditionally sleeps. At least WALReceiver loops until
>> it has no more to receive. I just can't imagine why that's useful
>> behaviour.
>
> Good catch. That should be fixed.
>
> I also note that walsender doesn't respond to signals while it's
> sending a large batch. That's analogous to the issue that was addressed
> recently in the archiver process.

Attached patch rearranges the walsender loops slightly to fix the above.
XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE / 2)
in one round and returns to the main loop after that even if there's
unsent WAL, and the main loop no longer sleeps if there's unsent WAL.

That way the main loop gets to respond to signals quickly, and we also
get to update the shared memory status and PS display more often when
there's a lot of catching up to do.

Comments, have I screwed up anything?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
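[A hedged sketch of the control flow the patch description implies,
abridged and with assumed declarations; the attached patch itself is
authoritative. XLogSend() is taken to send at most MAX_SEND_SIZE bytes per
call and report whether it caught up, so the main loop can handle signals
between slices and only sleeps once nothing is left to send.]

#define MAX_SEND_SIZE   (XLOG_SEG_SIZE / 2)

/* Assumed to exist elsewhere in walsender.c; signatures illustrative. */
extern volatile sig_atomic_t got_SIGHUP;
extern volatile sig_atomic_t shutdown_requested;
extern int  WalSndDelay;                /* wal_sender_delay, in milliseconds */
extern bool XLogSend(bool *caughtup);   /* sends at most MAX_SEND_SIZE bytes */

static bool
WalSndLoopSketch(void)
{
    bool    caughtup = false;

    for (;;)
    {
        /* Signals are now seen between slices, not after a whole batch. */
        if (got_SIGHUP)
        {
            got_SIGHUP = false;
            ProcessConfigFile(PGC_SIGHUP);
        }
        if (shutdown_requested)
            break;

        /* Send one slice of at most MAX_SEND_SIZE bytes, then loop. */
        if (!XLogSend(&caughtup))
            return false;               /* connection lost */

        /* Sleep only when there is no unsent WAL left. */
        if (caughtup)
            pg_usleep(WalSndDelay * 1000L);
    }
    return true;
}

[Bounding each slice also gives the loop a natural point at which to
refresh the shared-memory status and PS display, the other effect Heikki
mentions.]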
From: Fujii Masao on 16 May 2010 22:51
On Mon, May 17, 2010 at 1:11 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote:
>> On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:
>>
>> > I will recode using that concept.
>
>> Startup gets new pointer when it runs out of data to replay. That might
>> or might not include an updated keepalive timestamp, since there's no
>> exact relationship between chunks sent and chunks received. Startup
>> might ask for a new chunk when half a chunk has been received, or when
>> multiple chunks have been received.
>
> New version, with some other cleanup of wait processing.
>
> New logic is that when Startup asks for the next applychunk of WAL it
> saves the lastChunkTimestamp. That is then the base time used by
> WaitExceedsMaxStandbyDelay(), except when latestXLogTime is later.
> Since multiple receivechunks can arrive from the primary before Startup
> asks for the next applychunk, we use the oldest receivechunk timestamp,
> not the latest. Doing it this way means the lastChunkTimestamp doesn't
> change when new keepalives arrive, so we have a stable and reasonably
> accurate recordSendTimestamp for each WAL record.
>
> The keepalive is sent as the first part of a new message, if any. So
> partial chunks of data always have an accurate timestamp, even if that
> is slightly older as a result. Doesn't make much difference except with
> very large chunks.
>
> I think that addresses the points raised on this thread and others.

Is it OK that this keepalive message cannot be used by HS in file-based
log-shipping? Even in SR, the startup process cannot use the keepalive
until walreceiver has been started up.

WalSndKeepAlive() always calls initStringInfo(), which seems to cause a
memory leak.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
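[For reference, the kind of leak Fujii is pointing at, with an assumed body
for WalSndKeepAlive(); the message layout and names here are illustrative,
not quoted from the patch. initStringInfo() palloc()s a fresh buffer on
every call, so unless that buffer is released each keepalive leaks it in
the walsender's long-lived memory context.]

static void
WalSndKeepAlive(void)       /* assumed shape, for illustration only */
{
    StringInfoData buf;
    TimestampTz    now = GetCurrentTimestamp();

    initStringInfo(&buf);                           /* palloc()s buf.data    */
    pq_sendbyte(&buf, 'k');                         /* hypothetical msg type */
    pq_sendbytes(&buf, (char *) &now, sizeof(now)); /* timestamp payload     */
    pq_putmessage('d', buf.data, buf.len);

    pfree(buf.data);                                /* the missing release   */
}

[An alternative is to keep a static StringInfoData and resetStringInfo()
it on each call, avoiding the repeated palloc()/pfree() entirely.]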