From: Simon Riggs on
On Wed, 2010-06-02 at 13:45 -0400, Tom Lane wrote:
> Stephen Frost <sfrost(a)snowman.net> writes:
> > * Tom Lane (tgl(a)sss.pgh.pa.us) wrote:
> >> Comments?
>
> > I'm not really a huge fan of adding another GUC, to be honest. I'm more
> > inclined to say we treat 'max_archive_delay' as '0', and turn
> > max_streaming_delay into what you've described. If we fall back so far
> > that we have to go back to reading WALs, then we need to hurry up and
> > catch-up and damn the torpedos.
>
> If I thought that 0 were a generally acceptable value, I'd still be
> pushing the "simplify it to a boolean" agenda ;-). The problem is that
> that will sometimes kill standby queries even when they are quite short
> and doing nothing objectionable.

OK, now I understand. I was just thinking the same as Stephen, but now I
agree we need a second parameter.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Wed, Jun 2, 2010 at 2:03 PM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Wed, 2010-06-02 at 13:45 -0400, Tom Lane wrote:
>> Stephen Frost <sfrost(a)snowman.net> writes:
>> > * Tom Lane (tgl(a)sss.pgh.pa.us) wrote:
>> >> Comments?
>>
>> > I'm not really a huge fan of adding another GUC, to be honest.  I'm more
>> > inclined to say we treat 'max_archive_delay' as '0', and turn
>> > max_streaming_delay into what you've described.  If we fall back so far
>> > that we have to go back to reading WALs, then we need to hurry up and
>> > catch-up and damn the torpedos.
>>
>> If I thought that 0 were a generally acceptable value, I'd still be
>> pushing the "simplify it to a boolean" agenda ;-).  The problem is that
>> that will sometimes kill standby queries even when they are quite short
>> and doing nothing objectionable.
>
> OK, now I understand. I was just thinking the same as Stephen, but now I
> agree we need a second parameter.

I too was thinking the same as Stephen, but now I also agree we need a
second parameter. I think the whole design Tom proposed seems good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Greg Stark on
On Wed, Jun 2, 2010 at 6:14 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> I believe that the motivation for treating archived timestamps as live
> is, essentially, to force rapid catchup if a slave falls behind so far
> that it's reading from archive instead of SR.


Huh, I think this is the first mention of this that I've seen. I
always assumed the motivation was just that you wanted to control how
much data loss could occur on failover and how long recovery would
take. I think separating the two delays is an interesting idea but I
don't see how it counts as a simplification.

This also still allows a slave to become arbitrarily far behind the
master. The master might take a long time to fill up a log file, and
if the slave is blocked for a few minutes, then requests the next bit
of log file and blocks for a few more minutes, then requests the next
log file and blocks for a few more minutes, etc. As long as the slave
is reading more slowly than the master is filling those segments it
will fall further and further behind but never be more than a few
minutes behind receiving the logs.
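The arithmetic behind that scenario can be made concrete. Here is a minimal
sketch (all numbers, names, and the function itself are invented for
illustration, not taken from the patch):

```python
# Hypothetical illustration of Greg's point: the slave is never more than
# a segment or two behind *receiving* WAL, yet its total replay lag grows
# without bound because it replays segments more slowly than the master
# produces them.

MASTER_SECONDS_PER_SEGMENT = 60   # master fills one WAL segment per minute
SLAVE_SECONDS_PER_SEGMENT = 90    # slave needs 90 s to replay one (queries block it)

def replay_lag_after(segments):
    """Total replay lag in seconds after the slave finishes `segments` segments."""
    produced_at = segments * MASTER_SECONDS_PER_SEGMENT
    replayed_at = segments * SLAVE_SECONDS_PER_SEGMENT
    return replayed_at - produced_at

# Each new segment arrives promptly, but the cumulative lag grows linearly:
for n in (10, 100, 1000):
    print(n, replay_lag_after(n))
```

After 1000 segments the slave is hours behind, even though it was never more
than one segment behind in *receiving* the logs.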

I propose an alternate way out of the problem of syncing two clocks.
Instead of comparing timestamps compare time intervals. So as it reads
xlog records it only ever compares the master timestamps with previous
master timestamps to determine how much time has elapsed on the
master. It compares that time elapsed with the time elapsed on the
slave to determine if it's falling behind. I'm assuming it
periodically catches up and can throw out its accumulated time elapsed
and start fresh. If not then the accumulated error might be a problem,
but hopefully one we can find a solution to.
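The bookkeeping for that interval-comparison idea might look something like
the following sketch. The class and method names are invented, and this is
only an illustration of the accounting, not the actual patch:

```python
# Hypothetical sketch: compare *elapsed* time on each side rather than
# comparing a master timestamp against the slave's clock, so the two
# clocks never need to be in sync. Only differences within one clock
# are ever computed.

class LagTracker:
    def __init__(self):
        self.prev_master_ts = None
        self.prev_slave_ts = None
        self.accumulated_lag = 0.0   # seconds the slave has fallen behind

    def observe(self, master_ts, slave_ts):
        """Called for each WAL record carrying a master timestamp.
        master_ts is read from the WAL record (master's clock);
        slave_ts is the slave's own clock at the moment of replay."""
        if self.prev_master_ts is not None:
            master_elapsed = master_ts - self.prev_master_ts
            slave_elapsed = slave_ts - self.prev_slave_ts
            # If the slave took longer than the master did, it fell behind.
            self.accumulated_lag += slave_elapsed - master_elapsed
        self.prev_master_ts = master_ts
        self.prev_slave_ts = slave_ts

    def caught_up(self):
        """When replay catches up, discard the accumulated error and start fresh."""
        self.accumulated_lag = 0.0
```

For example, if 10 s elapse on the master between two records but the slave
takes 15 s to replay them, `accumulated_lag` grows by 5 s; a call to
`caught_up()` resets it, addressing the accumulated-error worry above.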

--
greg


From: Simon Riggs on
On Wed, 2010-06-02 at 13:14 -0400, Tom Lane wrote:

> This patch seems to me to be going in fundamentally the wrong direction.
> It's adding complexity and overhead (far more than is needed), and it's
> failing utterly to resolve the objections that I raised to start with.

Having read your proposal, it seems changing from time-on-sender to
time-on-receiver is a one-line change to the patch.

What else are you thinking of removing, if anything?

Adding an extra parameter obviously adds more complexity, but it is
something I now agree with.

> In particular, Simon seems to be basically refusing to do anything about
> the complaint that the code fails unless master and standby clocks are
> in close sync. I do not believe that this is acceptable, and since he
> won't fix it, I guess I'll have to.

Syncing two servers in replication is common practice, as has been
explained here; I'm still surprised people think otherwise. Measuring
the time between two servers is the very purpose of the patch, so the
synchronisation is not a design flaw, it is its raison d'être. There
have been a few spleens emptied on that topic, not all of them mine, and
certainly no consensus reached on it. So I'm not refusing to do anything
that's been agreed...

--
Simon Riggs www.2ndQuadrant.com



From: Tom Lane on
Greg Stark <gsstark(a)mit.edu> writes:
> On Wed, Jun 2, 2010 at 6:14 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I believe that the motivation for treating archived timestamps as live
>> is, essentially, to force rapid catchup if a slave falls behind so far
>> that it's reading from archive instead of SR.

> Huh, I think this is the first mention of this that I've seen. I
> always assumed the motivation was just that you wanted to control how
> much data loss could occur on failover and how long recovery would
> take. I think separating the two delays is an interesting idea but I
> don't see how it counts as a simplification.

Well, it isn't a simplification: it's bringing it up to the minimum
complication level where it'll actually work sanely. The current
implementation doesn't work sanely because it confuses stale timestamps
read from WAL with real live time.
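As a toy illustration of that confusion (the function name here is invented
and this is not the patch's actual code):

```python
# If the standby naively computes its delay as "my clock minus the
# timestamp found in the WAL record", then stale archive timestamps
# inflate the apparent delay even when replay is running at full speed.

def apparent_delay(slave_now, wal_record_ts):
    """Naive delay measure, in seconds; both arguments on the same timescale."""
    return slave_now - wal_record_ts

# Streaming case: a record written 2 s ago yields a small, meaningful delay.
streaming = apparent_delay(10000, 9998)    # 2 s

# Archive catch-up: a record written an hour ago yields a huge "delay",
# which would kill standby queries immediately regardless of their age.
catchup = apparent_delay(10000, 6400)      # 3600 s
```

The second number says nothing about how far the slave's replay is lagging
real time; it only reflects how old the archived record happens to be.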

> This also still allows a slave to become arbitrarily far behind the
> master.

Indeed, but nothing we do can prevent that, if the slave is just plain
slower than the master. You have to assume that the slave is capable of
keeping up in the absence of query-caused delays, or you're hosed.

The real reason this is at issue is the fact that the max_standby_delay
kill mechanism applies to certain buffer-level locking operations.
On the master we just wait, and it's not a problem, because in practice
the conflicting queries almost always release these locks pretty quick.
On the slave, though, instant kill as a result of a buffer-level lock
conflict would result in a very serious degradation in standby query
reliability (while also doing practically nothing for the speed of WAL
application, most of the time). This morning on the phone Bruce and I
were seriously discussing the idea of ripping the max_standby_delay
mechanism out of the buffer-level locking paths, and just letting them
work like they do on the master, ie, wait forever. If we did that then
simplifying max_standby_delay to a boolean would be reasonable again
(because it really would only come into play for DDL on the master).
The sticky point is that once in a blue moon you do have a conflicting
query sitting on a buffer lock for a long time, or even more likely a
series of queries keeping the WAL replay process from obtaining buffer
cleanup lock.

So it seems that we have to have max_standby_delay-like logic for those
locks, and also that zero grace period before kill isn't a very practical
setting. However, there isn't a lot of point in obsessing over exactly
how long the grace period ought to be, as long as it's more than a few
milliseconds. It *isn't* going to have any real effect on whether the
slave can stay caught up. You could make a fairly decent case for just
measuring the grace period from when the replay process starts to wait,
as I think I proposed a while back. The value of measuring delay from a
receipt time is that if you do happen to have a bunch of delays within
a short interval you'll get more willing to kill queries --- but I
really believe that that is a corner case and will have nothing to do
with ordinary performance.

> I propose an alternate way out of the problem of syncing two clocks.
> Instead of comparing timestamps compare time intervals. So as it reads
> xlog records it only ever compares the master timestamps with previous
> master timestamps to determine how much time has elapsed on the
> master. It compares that time elapsed with the time elapsed on the
> slave to determine if it's falling behind.

I think this would just add complexity and uncertainty, to deal with
something that won't be much of a problem in practice.

regards, tom lane
