From: Heikki Linnakangas on
On 02/06/10 21:44, Tom Lane wrote:
> In conjunction with that, I think there's a logic bug in XLogSend;
> it ought to be changed like so:
>
> /* if we went beyond SendRqstPtr, back off */
> if (XLByteLT(SendRqstPtr, endptr))
> + {
> endptr = SendRqstPtr;
> + *caughtup = true;
> + }
>
> In the current coding, the effect of not setting *caughtup here is just
> that we uselessly call XLogSend an extra time for each transmission
> (because the main loop won't ever delay immediately after a
> transmission). But without this, we'd never send caughtup = true
> to the slave.

That's intentional. It could take some time for the WAL to be sent, if
the network is busy, so by the time XLogSend returns you might well not
be caught up anymore.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> On 02/06/10 21:44, Tom Lane wrote:
>> In the current coding, the effect of not setting *caughtup here is just
>> that we uselessly call XLogSend an extra time for each transmission
>> (because the main loop won't ever delay immediately after a
>> transmission). But without this, we'd never send caughtup = true
>> to the slave.

> That's intentional. It could take some time for the WAL to be sent, if
> the network is busy, so by the time XLogSend returns you might well not
> be caught up anymore.

It may have been intentional, but it's still wrong. If you were able to
pull all of WAL into the record-to-be-sent, you should sleep afterwards,
not send an extra record containing a few more bytes.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Dimitri Fontaine on
Tom Lane <tgl(a)sss.pgh.pa.us> writes:

> I wrote:
>> I'm still inclined to apply the part of Simon's patch that adds a
>> transmit timestamp to each SR send chunk. That would actually be
>> completely unused by the slave given my proposal above, but I think that
>> it is an important step to take to future-proof the SR protocol against
>> possible changes in the slave-side timing logic.
>
> On further contemplation, it seems like the protocol needs another field
> besides that: each record should also carry a boolean indicating whether
> walsender.c thinks it is currently "caught up", ie the record carries
> all WAL data up to the current end of WAL. If the sender is not caught
> up, then the receiver should apply max_archive_delay not
> max_streaming_delay. In this way, WAL chunks that are a little bit
> behind current time will be treated the same way whether they come
> across the SR link or via the archive.

Then we'd have max_catchup_delay and max_streaming_delay, wouldn't we?

I'm still trying to understand the implications of the proposal, which
sounds good but… given just enough load on the slave, wouldn't it be
playing catchup all the time? Wouldn't enough load mean one read query
per write commit on the master?

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Dimitri Fontaine <dfontaine(a)hi-media.com> writes:
> I'm still trying to understand the implications of the proposal, which
> sounds good but… given just enough load on the slave, wouldn't it be
> playing catchup all the time?

Uh, if the slave is overloaded, *any* implementation will be playing
catchup all the time. Not sure about your point here.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Dimitri Fontaine on
Tom Lane <tgl(a)sss.pgh.pa.us> writes:
> Uh, if the slave is overloaded, *any* implementation will be playing
> catchup all the time. Not sure about your point here.

Well, sorry, I missed the part where the catchup is measured on
walsender side of things. Now that means that a non interrupted flow of
queries quicker than the wal sender delay (100ms, right?) on the slave
would certainly leave enough room for it to stay current.

Well that depends also on the time it takes to replay the wals.

I'm trying to decide if exposing this 100ms magic setup (or something
else) would allow for a lot more control with respect to what means
being overloaded.

Say you set the 100ms delay to any value that you know (from testing and
log_min_duration_statement, say) just a little higher than the slowest
query you typically run on the slave. If that reduces the chances to
ever playing cathing-up (as soon as there's no unexpected slow query
there), that would be great.

If you can manage to make sense of this, I'm interested into hearing how
far it is from reality.

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers