"caught_up" status in walsender [PgSql]


From: Robert Haas on 2 Jun 2010 16:53

On Wed, Jun 2, 2010 at 2:44 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> I wrote:
>> I'm still inclined to apply the part of Simon's patch that adds a
>> transmit timestamp to each SR send chunk. That would actually be
>> completely unused by the slave given my proposal above, but I think that
>> it is an important step to take to future-proof the SR protocol against
>> possible changes in the slave-side timing logic.
>
> On further contemplation, it seems like the protocol needs another field
> besides that: each record should also carry a boolean indicating whether
> walsender.c thinks it is currently "caught up", ie the record carries
> all WAL data up to the current end of WAL. If the sender is not caught
> up, then the receiver should apply max_archive_delay not
> max_streaming_delay. In this way, WAL chunks that are a little bit
> behind current time will be treated the same way whether they come
> across the SR link or via the archive.

I'm not sure that makes sense. I thought the point of separating the
archive and streaming cases was that the same timeout wouldn't
necessarily be correct for a 16MB WAL file as it would for a 16-byte
WAL record you've just received. IOW, you might want
max_archive_delay > max_streaming_delay. The original proposal also
seems somewhat easier to understand, to me at least.
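To make the two alternatives concrete, here is a minimal sketch of how the
receiver might pick a delay under each. Only max_archive_delay,
max_streaming_delay and the "caught up" flag come from this thread; the
struct, enum and function names are hypothetical and are not taken from the
PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two settings discussed in the thread. */
typedef struct DelaySettings
{
    int max_archive_delay;    /* ms, applied to WAL restored from archive */
    int max_streaming_delay;  /* ms, applied to WAL received over SR */
} DelaySettings;

/* Hypothetical tag for where a piece of WAL came from. */
typedef enum WalSource
{
    WAL_FROM_ARCHIVE,
    WAL_FROM_STREAM
} WalSource;

/* Original proposal: the delay depends only on the source of the WAL. */
int
delay_by_source(const DelaySettings *s, WalSource src)
{
    assert(s != 0);
    return (src == WAL_FROM_STREAM) ? s->max_streaming_delay
                                    : s->max_archive_delay;
}

/*
 * Variant with the "caught up" flag: streamed WAL from a sender that is
 * not caught up is treated like archived WAL.
 */
int
delay_by_caught_up(const DelaySettings *s, WalSource src, bool sender_caught_up)
{
    assert(s != 0);
    if (src == WAL_FROM_STREAM && sender_caught_up)
        return s->max_streaming_delay;
    return s->max_archive_delay;
}
```

The two functions differ only for streamed WAL sent while the sender is
behind, which is exactly the case in dispute.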

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Fujii Masao on 3 Jun 2010 05:18

On Thu, Jun 3, 2010 at 4:21 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
>> On 02/06/10 21:44, Tom Lane wrote:
>>> In the current coding, the effect of not setting *caughtup here is just
>>> that we uselessly call XLogSend an extra time for each transmission
>>> (because the main loop won't ever delay immediately after a
>>> transmission). But without this, we'd never send caughtup = true
>>> to the slave.
>
>> That's intentional. It could take some time for the WAL to be sent, if
>> the network is busy, so by the time XLogSend returns you might well not
>> be caught up anymore.
>
> It may have been intentional, but it's still wrong. If you were able to
> pull all of WAL into the record-to-be-sent, you should sleep afterwards,
> not send an extra record containing a few more bytes.

For reducing the workload of walsender?

This seems OK in 9.0 since only asynchronous replication is supported.
But when we implement synchronous replication in the future, we
might have to revert that change. Since a transaction commit might wait
for such an extra record to be replicated, walsender should aggressively
send all sendable WAL.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane on 3 Jun 2010 09:47

Fujii Masao <masao.fujii(a)gmail.com> writes:
> On Thu, Jun 3, 2010 at 4:21 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
>>> That's intentional. It could take some time for the WAL to be sent, if
>>> the network is busy, so by the time XLogSend returns you might well not
>>> be caught up anymore.
>>
>> It may have been intentional, but it's still wrong. If you were able to
>> pull all of WAL into the record-to-be-sent, you should sleep afterwards,
>> not send an extra record containing a few more bytes.

> For reducing the workload of walsender?

> This seems OK in 9.0 since only asynchronous replication is supported.
> But when we implement synchronous replication in the future, we
> might have to revert that change. Since a transaction commit might wait
> for such an extra record to be replicated, walsender should aggressively
> send all sendable WAL.

It *is* aggressively sending all sendable WAL. The ideal steady state
behavior of this loop ought to be that once per sleep interval, we send
out one record containing all new WAL since the last time. We do not
want it sending 10000 bytes, then another record with 100 bytes, then
another record with 10 bytes, etc etc. That's inefficient and
ultimately pointless. You'll always be behind again a millisecond
later.
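A minimal sketch of that steady state, assuming a flat 64-bit WAL position
and a per-message size cap. All names here (WalPos, SendResult,
send_pending, should_sleep_after, max_chunk) are hypothetical and do not
correspond to the actual XLogSend code in walsender.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flat byte position in the WAL stream. */
typedef uint64_t WalPos;

/* What one send attempt covered, and whether it reached end of WAL. */
typedef struct SendResult
{
    WalPos start;      /* first byte sent */
    WalPos end;        /* one past the last byte sent */
    bool   caught_up;  /* true if this message reached wal_end */
} SendResult;

/*
 * Pack all WAL between sent_upto and wal_end into one message, limited
 * only by max_chunk. If everything fit, report caught_up = true.
 */
SendResult
send_pending(WalPos sent_upto, WalPos wal_end, uint64_t max_chunk)
{
    SendResult r;
    uint64_t   pending;

    assert(sent_upto <= wal_end);
    pending = wal_end - sent_upto;

    r.start = sent_upto;
    r.end = sent_upto + (pending < max_chunk ? pending : max_chunk);
    r.caught_up = (r.end == wal_end);
    return r;
}

/*
 * Main-loop decision: sleep after a message that reached end of WAL,
 * rather than immediately sending a tiny follow-up message.
 */
bool
should_sleep_after(const SendResult *r)
{
    return r->caught_up;
}
```

With this shape, 10110 pending bytes go out as one message followed by a
sleep, not as 10000 + 100 + 10.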

regards, tom lane


From: Tom Lane on 3 Jun 2010 11:26

I wrote:
> On further contemplation, it seems like the protocol needs another field
> besides that: each record should also carry a boolean indicating whether
> walsender.c thinks it is currently "caught up", ie the record carries
> all WAL data up to the current end of WAL.

Actually, there's a better way to do that: let's have the record carry
not just a boolean but the actual current end-of-WAL LSN. The receiver
could then not just determine "am I behind" but find out *how far*
behind it is, and thereby perhaps adjust its behavior in more subtle
ways than just a binary on/off fashion.

(Actually doing anything like that is material for future work, of
course, but I think we should try to get the SR protocol right now.)
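As a rough illustration, a per-chunk header carrying the end-of-WAL LSN and
the transmit timestamp might look like the sketch below. The struct layout,
field names and helper functions are hypothetical, and the flat 64-bit Lsn
type is a simplification; nothing here is the actual SR wire format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flat 64-bit LSN; a simplification for this sketch. */
typedef uint64_t Lsn;

/* Hypothetical header sent in front of each chunk of WAL data. */
typedef struct WalChunkHeader
{
    Lsn     data_start;    /* LSN of the first byte in this chunk */
    Lsn     data_end;      /* LSN just past the last byte in this chunk */
    Lsn     wal_end;       /* sender's current end of WAL at send time */
    int64_t send_time_us;  /* transmit timestamp, microseconds */
} WalChunkHeader;

/* How many bytes of WAL the sender had beyond what this chunk carries. */
uint64_t
chunk_bytes_behind(const WalChunkHeader *h)
{
    assert(h->data_start <= h->data_end);
    return (h->wal_end > h->data_end) ? h->wal_end - h->data_end : 0;
}

/* The boolean "caught up" status falls out of the same two fields. */
bool
chunk_is_caught_up(const WalChunkHeader *h)
{
    return chunk_bytes_behind(h) == 0;
}
```

The receiver can derive the boolean from the LSN but not the other way
around, which is the argument for sending the LSN.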

regards, tom lane

