From: "Kevin Grittner" on
Simon Riggs <simon(a)2ndQuadrant.com> wrote:
> On Mon, 2010-05-31 at 14:40 -0400, Bruce Momjian wrote:
>
>> Uh, we have three days before we package 9.0beta2. It would be
>> good if we could decide on the max_standby_delay issue soon.
>
> I've heard something from Heikki, not from anyone else. Those
> comments amount to "let's replace max_standby_delay with
> max_apply_delay".
>
> Got a beta idea?

Given the incessant ticking of the clock, I have a hard time
believing we have any real options besides max_standby_delay or a
boolean which corresponds to the -1 and 0 settings of
max_standby_delay. I think it's pretty clear that there's a use
case for the positive values, although there are bound to be some
who try it and are surprised by its behavior at the transition from
idle to active. The whole debate seems to boil down to how important a
middle ground is versus how damaging the surprise factor is. (I
don't really buy the argument that we won't be able to remove it
later if we replace it with something better.)

I know there were initially some technical problems, too; have
those been resolved?

-Kevin


From: Tom Lane
Simon Riggs <simon(a)2ndQuadrant.com> writes:
> OK, here's v4.

I've been trying to stay out of this thread, but with beta2 approaching
and no resolution in sight, I'm afraid I have to get involved.

This patch seems to me to be going in fundamentally the wrong direction.
It's adding complexity and overhead (far more than is needed), and it's
failing utterly to resolve the objections that I raised to start with.

In particular, Simon seems to be basically refusing to do anything about
the complaint that the code fails unless master and standby clocks are
in close sync. I do not believe that this is acceptable, and since he
won't fix it, I guess I'll have to.

The other basic problem that I had with the current code, which this
does nothing about, is that it believes that timestamps pulled from WAL
archive should be treated on the same footing as timestamps of "live"
actions. That might work in certain limited scenarios but it's going to
be a disaster in general.

I believe that the motivation for treating archived timestamps as live
is, essentially, to force rapid catchup if a slave falls behind so far
that it's reading from archive instead of SR. There are certainly
use-cases where that's appropriate (though, again, not all); but even
when you do want it, it's a pretty inflexible implementation. For
realistic values of max_standby_delay the behavior is going to pretty
much be "instant kill whenever we're reading from archive".

I have backed off my original suggestion that we should reduce
max_standby_delay to a boolean: that was based on the idea that delays
would only occur in response to DDL on the master, but since vacuum and
btree page splits can also trigger delays, it seems clear that a "kill
queries immediately" policy isn't really very useful in practice. So we
need to make max_standby_delay work rather than just get rid of it.

What I think might be a realistic compromise is this:

1. Separate max_standby_delay into two GUCs, say "max_streaming_delay"
and "max_archive_delay".

2. When applying WAL that came across SR, use max_streaming_delay and
let the time measurement be current time minus time of receipt of the
current WAL send chunk.

3. When applying WAL that came from archive, use max_archive_delay and
let the time measurement be current time minus time of acquisition of
the current WAL segment from the archive.

The current code's behavior in the latter case could effectively be
modeled by setting max_archive_delay to zero, but that isn't the only
plausible setting. More likely DBAs would set max_archive_delay to
something smaller than max_streaming_delay, but still positive so as
not to kill conflicting queries instantly.

An important property of this design is that all relevant timestamps
are taken on the slave, so clock skew isn't an issue.
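
To make that concrete, here's a rough sketch of the standby-side test
(illustrative C only --- the GUC variables and function names here are
invented for the example, not taken from the actual patch):

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    typedef enum { WAL_FROM_STREAM, WAL_FROM_ARCHIVE } WalSource;

    /* hypothetical GUCs, in seconds; -1 would mean "wait forever" */
    static int max_streaming_delay = 30;
    static int max_archive_delay = 5;

    /*
     * Has a conflicting standby query used up its grace period?
     * receipt_time is when this standby received the current WAL send
     * chunk (streaming) or fetched the current segment from archive,
     * so every timestamp involved is taken on the standby.
     */
    static bool
    grace_period_expired(WalSource source, time_t receipt_time)
    {
        int limit = (source == WAL_FROM_STREAM) ? max_streaming_delay
                                                : max_archive_delay;

        if (limit < 0)
            return false;               /* wait as long as it takes */
        return difftime(time(NULL), receipt_time) >= limit;
    }

    int
    main(void)
    {
        time_t ten_sec_ago = time(NULL) - 10;

        /* 10s of delay: past the archive limit, within the stream limit */
        printf("archive: %d, stream: %d\n",
               grace_period_expired(WAL_FROM_ARCHIVE, ten_sec_ago),
               grace_period_expired(WAL_FROM_STREAM, ten_sec_ago));
        return 0;
    }

The point to notice is that both branches compare the standby's own
clock against a timestamp the standby itself recorded.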

I'm still inclined to apply the part of Simon's patch that adds a
transmit timestamp to each SR send chunk. That would actually be
completely unused by the slave given my proposal above, but I think that
it is an important step to take to future-proof the SR protocol against
possible changes in the slave-side timing logic. I don't, however, see
the value of transmitting "keepalive" records when we'd otherwise not
have anything to send. The value of adding timestamps to the SR
protocol is really to let the slave determine the current amount of
clock skew between it and the master, which is a number that should not
change so rapidly that it has to be updated every 100ms even in an idle
system.
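
For illustration, the slave-side computation could be as simple as the
sketch below (not the real protocol code; note that the estimate
unavoidably includes network transit time, so it is an upper bound on
the true skew):

    #include <stdio.h>
    #include <time.h>

    /* Upper-bound estimate of slave-minus-master clock skew.  The send
     * timestamp travels inside the chunk; the difference between it and
     * our receipt time is the skew plus network transit time. */
    static double
    estimate_clock_skew(time_t master_send_time, time_t slave_recv_time)
    {
        return difftime(slave_recv_time, master_send_time);
    }

    int
    main(void)
    {
        /* pretend the master stamped a chunk two seconds ago */
        time_t now = time(NULL);
        printf("skew estimate: %.0f s\n",
               estimate_clock_skew(now - 2, now));
        return 0;
    }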

Comments?

regards, tom lane


From: Stephen Frost
* Tom Lane (tgl(a)sss.pgh.pa.us) wrote:
> An important property of this design is that all relevant timestamps
> are taken on the slave, so clock skew isn't an issue.

I agree that this is important, and I do run NTP on all my servers and
even monitor it using Nagios.

It's still not a cure-all for time skew issues.

> Comments?

I'm not really a huge fan of adding another GUC, to be honest. I'm more
inclined to say we treat 'max_archive_delay' as '0', and turn
max_streaming_delay into what you've described. If we fall back so far
that we have to go back to reading WALs, then we need to hurry up and
catch up and damn the torpedoes. I'd also prefer that we only wait the
delay time once until we're fully caught up again (and have gotten
back around to waiting for new data).

Thanks,

Stephen

From: Andrew Dunstan

Tom Lane wrote:
> I'm still inclined to apply the part of Simon's patch that adds a
> transmit timestamp to each SR send chunk. That would actually be
> completely unused by the slave given my proposal above, but I think that
> it is an important step to take to future-proof the SR protocol against
> possible changes in the slave-side timing logic.
>

+1.

From a radically different perspective, I had to do something similar
in the buildfarm years ago to protect us from machines reporting with
grossly inaccurate timestamps. This was part of the solution. The client
adds its current timestamp just before transmitting the data to
the server.
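
The pattern, as a sketch (C here purely for illustration, not the
buildfarm's actual code, and the ten-minute threshold is made up):

    #include <stdio.h>
    #include <time.h>

    /* Server-side sanity check: treat a report as coming from a broken
     * client clock if its stamp differs from the server's clock by more
     * than ten minutes in either direction. */
    static int
    client_clock_plausible(time_t client_stamp, time_t server_now)
    {
        double skew = difftime(server_now, client_stamp);
        return skew > -600.0 && skew < 600.0;
    }

    int
    main(void)
    {
        time_t now = time(NULL);
        printf("%d %d\n",
               client_clock_plausible(now - 30, now),     /* 1: plausible */
               client_clock_plausible(now - 3600, now));  /* 0: bogus */
        return 0;
    }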

cheers

andrew


From: Tom Lane
Stephen Frost <sfrost(a)snowman.net> writes:
> * Tom Lane (tgl(a)sss.pgh.pa.us) wrote:
>> Comments?

> I'm not really a huge fan of adding another GUC, to be honest. I'm more
> inclined to say we treat 'max_archive_delay' as '0', and turn
> max_streaming_delay into what you've described. If we fall back so far
> that we have to go back to reading WALs, then we need to hurry up and
> catch up and damn the torpedoes.

If I thought that 0 were a generally acceptable value, I'd still be
pushing the "simplify it to a boolean" agenda ;-). The problem is that
that will sometimes kill standby queries even when they are quite short
and doing nothing objectionable.

> I'd also prefer that we only wait the
> delay time once until we're fully caught up again (and have gotten
> back around to waiting for new data).

The delays will be measured from a receipt instant to current time,
which means that the longer it takes to apply a WAL segment or WAL
send chunk, the less grace period there will be. (Which is the
same as what CVS HEAD does --- I'm just arguing about where we get
the start time from.) I believe this does what you suggest and more.
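For example, with max_streaming_delay = 30s: if the current chunk was
received 25 seconds ago and we've spent that time applying it, a
conflict detected now leaves the query only 5 seconds of grace, while a
standby that is fully caught up starts each chunk with the whole 30
seconds available.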

regards, tom lane
