From: Greg Stark on
On Thu, Jun 3, 2010 at 12:11 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Greg Stark <gsstark(a)mit.edu> writes:
>> I was assuming the walreceiver only requests more wal in relatively
>> small chunks and only when replay has caught up and needs more data. I
>> haven't actually read this code so if that assumption is wrong then
>> I'm off-base.
>
> It is off-base.  The receiver does not "request" data, the sender is
> what determines how much WAL is sent when.

Hm, so what happens if the slave blocks? Doesn't the sender block
when the kernel buffers fill up?

>> So I think this isn't necessarily such a blue moon event. As I
>> understand it, all it would take is a single long-running report and a
>> vacuum or HOT cleanup occurring on the master.
>
> I think this is mostly FUD too.  How often do you see vacuum blocked for
> an hour now?

No, that's not comparable. On the master, vacuum will just ignore
tuples that are still visible to live transactions. On the slave
there is no such choice: it sees the cleanup record and must pause
recovery until those transactions finish.

--
greg
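
A minimal, standalone C sketch of the point above (an illustration,
not PostgreSQL source; all names and values are assumptions): on the
master, VACUUM simply skips tuples a live snapshot can still see, but
standby replay of the cleanup record can only wait, and once
max_standby_delay is exceeded, cancel whatever is in the way.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    typedef uint32_t TransactionId;

    /* Placeholder for "does some standby query's snapshot still see the
     * tuples this cleanup record removes?"  The real server answers this
     * from its running-transaction bookkeeping; here it is a stub. */
    static bool snapshot_conflict_exists(TransactionId latest_removed_xid)
    {
        (void) latest_removed_xid;
        return false;                    /* no conflicting query in this demo */
    }

    static void cancel_conflicting_queries(void)
    {
        puts("standby: cancelling queries that block the cleanup record");
    }

    /* Unlike VACUUM on the master, replay cannot skip tuples someone still
     * sees; it can only wait, and after max_standby_delay, start cancelling. */
    static void replay_cleanup(TransactionId latest_removed_xid,
                               int max_standby_delay_s)
    {
        time_t deadline = time(NULL) + max_standby_delay_s;

        while (snapshot_conflict_exists(latest_removed_xid))
        {
            if (time(NULL) >= deadline)
            {
                cancel_conflicting_queries();
                break;
            }
            usleep(100 * 1000);          /* re-check every 100 ms */
        }
        puts("standby: cleanup record applied, recovery continues");
    }

    int main(void)
    {
        replay_cleanup(1234, 30);        /* arbitrary xid and 30 s delay */
        return 0;
    }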

From: Fujii Masao on
On Thu, Jun 3, 2010 at 5:00 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I stand by my suggestion from yesterday: Let's define max_standby_delay
>> as the difference between a piece of WAL becoming available in the
>> standby, and applying it.
>
> My proposal is essentially the same as yours plus allowing the DBA to
> choose different max delays for the caught-up and not-caught-up cases.
> Maybe everybody will end up setting the two delays the same, but I think
> we don't have enough experience to decide that for them now.

Applying WAL that arrives via SR doesn't necessarily mean the standby
is caught up, and applying WAL restored from the archive doesn't
necessarily mean it is not. So does it really make sense to use
separate delays to control recovery behavior in those two cases?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
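
For concreteness, a standalone sketch (an assumption for illustration,
not the patch under discussion) of the distinction being debated:
the two-delay proposal picks a kill delay based on whether the standby
looks caught up, which in practice would roughly track whether the WAL
being applied came in via streaming or from the archive -- the very
mapping Fujii is questioning above.

    #include <stdbool.h>
    #include <stdio.h>

    /* The two hypothetical knobs in the two-delay proposal. */
    static const int max_delay_caught_up_s = 30;    /* normal streaming operation */
    static const int max_delay_catching_up_s = 300; /* replaying a backlog/archive */

    /* Choose the kill delay from the standby's (approximate) state.  Whether
     * "applying streamed WAL" really equals "caught up" is exactly what is
     * in dispute in the message above. */
    static int standby_delay_s(bool looks_caught_up)
    {
        return looks_caught_up ? max_delay_caught_up_s : max_delay_catching_up_s;
    }

    int main(void)
    {
        printf("caught up:   wait up to %d s before cancelling queries\n",
               standby_delay_s(true));
        printf("catching up: wait up to %d s before cancelling queries\n",
               standby_delay_s(false));
        return 0;
    }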

From: Tom Lane on
Greg Stark <gsstark(a)mit.edu> writes:
>> Well, if the slave can't keep up, that's a separate problem.  It will
>> not fail to keep up as a result of the transmission mechanism.

> No, I mean if the slave is paused due to a conflict. Does it stop
> reading data from the master or does it buffer it up on disk? If it
> stops reading it from the master then the effect is the same as if the
> slave stopped "requesting" data even if there's no actual request.

The data keeps coming in and getting dumped into the slave's pg_xlog.
walsender/walreceiver are not at all tied to the slave's application
of WAL. In principle we could have the code around max_standby_delay
notice just how far behind it's gotten and adjust the delay tolerance
based on that; but I think designing a feedback loop for that is 9.1
material.

regards, tom lane
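
A rough, standalone sketch of the feedback idea deferred here to 9.1:
since WAL keeps accumulating in the standby's pg_xlog regardless of
replay, the standby could compare how much it has received with how
much it has applied and tighten the delay as the backlog grows.
Everything below (names, threshold, integer WAL positions) is an
illustrative assumption.

    #include <stdint.h>
    #include <stdio.h>

    /* WAL positions as plain byte offsets; the real server uses LSNs. */
    typedef uint64_t wal_pos;

    /* The further replay has fallen behind what is already sitting in
     * pg_xlog, the less extra waiting on conflicting queries we tolerate. */
    static int adjusted_delay_s(wal_pos received, wal_pos applied, int base_delay_s)
    {
        const uint64_t lag_limit = 64u * 1024 * 1024;  /* arbitrary 64 MB threshold */
        uint64_t lag = received - applied;

        return (lag > lag_limit) ? base_delay_s / 2 : base_delay_s;
    }

    int main(void)
    {
        /* 100 MB behind: halve the tolerance.  1 KB behind: leave it alone. */
        printf("%d\n", adjusted_delay_s(200u * 1024 * 1024, 100u * 1024 * 1024, 30));
        printf("%d\n", adjusted_delay_s(1024, 0, 30));
        return 0;
    }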

From: Greg Stark on
On Thu, Jun 3, 2010 at 4:34 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> The data keeps coming in and getting dumped into the slave's pg_xlog.
> walsender/walreceiver are not at all tied to the slave's application
> of WAL.  In principle we could have the code around max_standby_delay
> notice just how far behind it's gotten and adjust the delay tolerance
> based on that; but I think designing a feedback loop for that is 9.1
> material.

Er, no. In that case my first concern was misguided. I'm happy there's
no feedback loop -- my fear was that there was one, and that it would
mean the "time received" could be decoupled from the time the WAL was
generated. But as you describe it, the time received might be slightly
delayed relative to the time the WAL was generated, but only to some
constant degree -- not in a way that will be influenced by WAL
application being blocked on the slave.

--
greg

From: Tom Lane on
Simon Riggs <simon(a)2ndQuadrant.com> writes:
> On Wed, 2010-06-02 at 13:14 -0400, Tom Lane wrote:
>> This patch seems to me to be going in fundamentally the wrong direction.
>> It's adding complexity and overhead (far more than is needed), and it's
>> failing utterly to resolve the objections that I raised to start with.

> Having read your proposal, it seems changing from time-on-sender to
> time-on-receiver is a one-line change to the patch.

> What else are you thinking of removing, if anything?

Basically, we need to get rid of everything that feeds timestamps from
the WAL content into the kill-delay logic.

>> In particular, Simon seems to be basically refusing to do anything about
>> the complaint that the code fails unless master and standby clocks are
>> in close sync. I do not believe that this is acceptable, and since he
>> won't fix it, I guess I'll have to.

> Syncing two servers in replication is common practice, as has been
> explained here; I'm still surprised people think otherwise.

Doesn't affect the complaint in the least: I do not find it acceptable
to have that be *mandated* in order for our code to work sensibly.
I would be OK with having something approaching what you want as a
non-default optional behavior (with a clearly-documented dependency
on having synchronized clocks). But in any case the current behavior is
still quite broken as regards reading stale timestamps from WAL.

regards, tom lane
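
For illustration, a standalone sketch (an assumption, not the eventual
fix) of the receiver-side measurement being argued for here: clock the
delay from the moment the standby itself received the WAL, so only one
machine's clock is involved and no timestamp read out of the WAL
content -- possibly written long ago under a differently-set master
clock -- feeds into the kill-delay logic.

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    /* Has this chunk of WAL been waiting on the standby longer than the
     * configured delay?  Both timestamps come from the standby's own clock. */
    static bool should_cancel_conflicts(time_t wal_received_at, int max_standby_delay_s)
    {
        return difftime(time(NULL), wal_received_at) > max_standby_delay_s;
    }

    int main(void)
    {
        time_t received = time(NULL) - 45;   /* pretend this chunk arrived 45 s ago */
        printf("cancel conflicting queries? %s\n",
               should_cancel_conflicts(received, 30) ? "yes" : "no");
        return 0;
    }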
