From: Tom Lane on
Simon Riggs <simon(a)2ndQuadrant.com> writes:
> Patch adds a keepalive message to ensure max_standby_delay is useful.

The proposed placement of the keepalive-send is about the worst it could
possibly be. It needs to be done right before pq_flush to ensure
minimum transfer delay. Otherwise any attempt to measure clock skew
using the timestamp will be seriously off. Where you've placed it
guarantees maximum delay not minimum.

I'm also extremely dubious that it's a good idea to set
recoveryLastXTime from this. Using both that and the timestamps from
the wal log flies in the face of everything I remember about control
theory. We should be doing only one or only the other, I think.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on
On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote:
> Simon Riggs <simon(a)2ndQuadrant.com> writes:
> > Patch adds a keepalive message to ensure max_standby_delay is useful.
>
> The proposed placement of the keepalive-send is about the worst it could
> possibly be. It needs to be done right before pq_flush to ensure
> minimum transfer delay. Otherwise any attempt to measure clock skew
> using the timestamp will be seriously off. Where you've placed it
> guarantees maximum delay not minimum.

I don't understand. WalSndKeepAlive() contains a pq_flush() immediately
after the timestamp is set. I did that way for exactly the same reasons
you've said.

Do you mean you only want to see one pq_flush()? I used two so that the
actual data is never delayed by a keepalive. WAL Sender was going to
sleep anyway, so shouldn't be a problem.

> I'm also extremely dubious that it's a good idea to set
> recoveryLastXTime from this. Using both that and the timestamps from
> the wal log flies in the face of everything I remember about control
> theory. We should be doing only one or only the other, I think.

I can change it so that the recoveryLastXTime will not be updated if we
are using the value from the keepalives. So we have one, or the other.
Remember that replication can switch backwards and forwards between
modes, so it seems sensible to have a common timing value whichever mode
we're in.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on
Simon Riggs wrote:
> On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote:
>> I'm also extremely dubious that it's a good idea to set
>> recoveryLastXTime from this. Using both that and the timestamps from
>> the wal log flies in the face of everything I remember about control
>> theory. We should be doing only one or only the other, I think.
>
> I can change it so that the recoveryLastXTime will not be updated if we
> are using the value from the keepalives. So we have one, or the other.
> Remember that replication can switch backwards and forwards between
> modes, so it seems sensible to have a common timing value whichever mode
> we're in.

That means that recoveryLastXTime can jump forwards and backwards.
Doesn't feel right to me either. If you want to expose the
keepalive-time to queries, it should be a separate field, something like
lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
read it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on
On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote:
> >> I'm also extremely dubious that it's a good idea to set
> >> recoveryLastXTime from this. Using both that and the timestamps from
> >> the wal log flies in the face of everything I remember about control
> >> theory. We should be doing only one or only the other, I think.
> >
> > I can change it so that the recoveryLastXTime will not be updated if we
> > are using the value from the keepalives. So we have one, or the other.
> > Remember that replication can switch backwards and forwards between
> > modes, so it seems sensible to have a common timing value whichever mode
> > we're in.
>
> That means that recoveryLastXTime can jump forwards and backwards.

That behaviour would be bad, so why not just prevent that from
happening?

> Doesn't feel right to me either. If you want to expose the
> keepalive-time to queries, it should be a separate field, something like
> lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
> read it.

That wouldn't be good because then you couldn't easily monitor the
delay? You'd have to run two different functions depending on the state
of replication (for which we would need yet another function). Users
would just wrap that back up into a single function.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on
Simon Riggs wrote:
> On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:
>> Doesn't feel right to me either. If you want to expose the
>> keepalive-time to queries, it should be a separate field, something like
>> lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
>> read it.
>
> That wouldn't be good because then you couldn't easily monitor the
> delay? You'd have to run two different functions depending on the state
> of replication (for which we would need yet another function). Users
> would just wrap that back up into a single function.

What exactly is the user trying to monitor? If it's "how far behind is
the standby", the difference between pg_current_xlog_insert_location()
in the master and pg_last_xlog_replay_location() in the standby seems
more robust and well-defined to me. It's a measure of XLOG location (ie.
bytes) instead of time, but time is a complicated concept.

Also note that as the patch stands, if you receive a keep-alive from the
master at point X, it doesn't mean that the standby is fully up-to-date.
It's possible that the walsender just finished sending a huge batch of
accumulated WAL, say 1 GB, and it took 1 hour for the batch to be sent.
During that time, a lot more WAL has accumulated, yet walsender sends a
keep-alive with the current timestamp.

In general, the purpose of a keep-alive is to keep the connection alive,
but you're trying to accomplish something else too, and I don't fully
understand what it is.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers