From: Heikki Linnakangas on
On 02/06/10 20:14, Tom Lane wrote:
> For realistic values of max_standby_delay ...

Hang on right there. What do you consider a realistic value for
max_standby_delay? Because I'm not sure I have a grip on that myself. 5
seconds? 5 minutes? 5 hours? I can see use cases for all of those...

> What I think might be a realistic compromise is this:
>
> 1. Separate max_standby_delay into two GUCs, say "max_streaming_delay"
> and "max_archive_delay".
>
> 2. When applying WAL that came across SR, use max_streaming_delay and
> let the time measurement be current time minus time of receipt of the
> current WAL send chunk.
>
> 3. When applying WAL that came from archive, use max_archive_delay and
> let the time measurement be current time minus time of acquisition of
> the current WAL segment from the archive.
>
> The current code's behavior in the latter case could effectively be
> modeled by setting max_archive_delay to zero, but that isn't the only
> plausible setting. More likely DBAs would set max_archive_delay to
> something smaller than max_streaming_delay, but still positive so as to
> not kill conflicting queries instantly.

The problem with defining max_archive_delay that way is again that you
can fall behind indefinitely. If you set it to 5 minutes, it means that
you'll wait a maximum of 5 minutes *per WAL segment*, even if WAL is
being generated faster.

I don't understand why you want to use a different delay when you're
restoring from archive vs. when you're streaming (what about existing
WAL files found in pg_xlog, BTW?). The source of WAL shouldn't make a
difference. If it's because you assume that restoring from archive is a
sign that you've fallen behind a lot, surely you've exceeded
max_standby_delay then and I still don't see a need for a separate GUC.

I stand by my suggestion from yesterday: Let's define max_standby_delay
as the difference between a piece of WAL becoming available in the
standby, and applying it. To approximate "piece of WAL becoming
available" for SR, we can use the mechanism with send/applyChunks from
Simon's latest patch, or go with the simpler scheme of just resetting a
"last caughtup timestamp" to current time whenever we have to wait for
new WAL to arrive. When restoring from archive, likewise reset "last
caughtup timestamp" whenever restore_command returns non-zero, i.e. we
have to wait for the next WAL file to arrive.

That works the same for both SR and file-based log shipping, there's
only one knob to set, is simple to implement and doesn't require
synchronized clocks.
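The single-knob scheme above might look roughly like this (hypothetical names, not the actual implementation): reset one timestamp whenever the startup process has to wait for WAL, and measure the grace period from there.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Illustrative sketch of the "last caughtup timestamp" scheme;
 * names are hypothetical, not actual PostgreSQL identifiers. */
static time_t last_caughtup;
static int max_standby_delay = 300;   /* seconds; assumed setting */

/*
 * Call whenever we have to wait for new WAL to arrive: when streaming
 * replication has no more data, or when restore_command returns
 * non-zero during archive recovery.
 */
static void
note_caught_up(time_t now)
{
    last_caughtup = now;
}

/* May a conflicting standby query keep running? */
static bool
within_standby_delay(time_t now)
{
    return (now - last_caughtup) <= max_standby_delay;
}
```

Because only the standby's own clock is consulted, this needs no clock synchronization with the master, and the same code path serves both streaming and file-based shipping.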

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> The problem with defining max_archive_delay that way is again that you
> can fall behind indefinitely.

See my response to Greg Stark --- I don't think this is really an
issue. It's certainly far less of an issue than the current situation
that query grace periods go to zero under not-at-all-extreme conditions.

> I don't understand why you want to use a different delay when you're
> restoring from archive vs. when you're streaming (what about existing
> WAL files found in pg_xlog, BTW?).

You're missing the point. I want the DBA to be able to control what
happens in those two cases. In the current implementation he has no
control over what happens while restoring from archive: it's going to
effectively act like max_archive_delay = 0 all the time. If you're of
the opinion that's good, you can set the parameter that way and be
happy. I'm of the opinion you'll soon find out it isn't so good,
because it'll kill standby queries too easily.

> I stand by my suggestion from yesterday: Let's define max_standby_delay
> as the difference between a piece of WAL becoming available in the
> standby, and applying it.

My proposal is essentially the same as yours plus allowing the DBA to
choose different max delays for the caught-up and not-caught-up cases.
Maybe everybody will end up setting the two delays the same, but I think
we don't have enough experience to decide that for them now.

regards, tom lane


From: Greg Stark on
On Wed, Jun 2, 2010 at 8:14 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Indeed, but nothing we do can prevent that, if the slave is just plain
> slower than the master.  You have to assume that the slave is capable of
> keeping up in the absence of query-caused delays, or you're hosed.

I was assuming the walreceiver only requests more wal in relatively
small chunks and only when replay has caught up and needs more data. I
haven't actually read this code so if that assumption is wrong then
I'm off-base. But if my assumption is right then it's not merely the
master running faster than the slave that can cause you to fall
arbitrarily far behind. The "receive time" is delayed by how long the
slave waited on a previous lock, so if we wait for a few minutes, then
proceed and find we need more wal data we'll read data from the master
that it could have generated those few minutes ago.


> The sticky point is that once in a blue moon you do have a conflicting
> query sitting on a buffer lock for a long time, or even more likely a
> series of queries keeping the WAL replay process from obtaining buffer
> cleanup lock.

So I think this isn't necessarily such a blue moon event. As I
understand it, all it would take is a single long-running report and a
vacuum or HOT cleanup occurring on the master. If I want to set
max_standby_delay to 60min to allow reports to run for up to an hour
then any random HOT cleanup on the master will propagate to the slave
and cause a WAL stall until the transactions which have that xid in
their snapshot end.

Even buffer locks could potentially be held for a long time
if you happen to run the right kind of query (say, a nested loop with
the buffer in question on the outside and a large table on the
inside). That's a rarer event though; is that what you were thinking
of?


--
greg


From: Tom Lane on
Greg Stark <gsstark(a)mit.edu> writes:
> I was assuming the walreceiver only requests more wal in relatively
> small chunks and only when replay has caught up and needs more data. I
> haven't actually read this code so if that assumption is wrong then
> I'm off-base.

It is off-base. The receiver does not "request" data, the sender is
what determines how much WAL is sent when.

> So I think this isn't necessarily such a blue moon event. As I
> understand it, all it would take is a single long-running report and a
> vacuum or HOT cleanup occurring on the master.

I think this is mostly FUD too. How often do you see vacuum blocked for
an hour now? It probably can happen, which is why we need to be able to
kick queries off the locks with max_standby_delay, but it's far from
common. What we're concerned about here is how long a buffer lock on a
single page is held, not how long heavyweight locks are held. The
normal hold times are measured in microseconds.

regards, tom lane


From: Stephen Frost on
* Tom Lane (tgl(a)sss.pgh.pa.us) wrote:
> Greg Stark <gsstark(a)mit.edu> writes:
> > So I think this isn't necessarily such a blue moon event. As I
> > understand it, all it would take is a single long-running report and a
> > vacuum or HOT cleanup occurring on the master.
>
> I think this is mostly FUD too. How often do you see vacuum blocked for
> an hour now? It probably can happen, which is why we need to be able to
> kick queries off the locks with max_standby_delay, but it's far from
> common. What we're concerned about here is how long a buffer lock on a
> single page is held, not how long heavyweight locks are held. The
> normal hold times are measured in microseconds.

Maybe I'm missing something, but I think Greg's point was that if you
have a long-running query running against the standby/slave/whatever,
which is holding locks on various relations to implement that report,
and then a vacuum or HOT update happens on the master, the long-running
report query will get killed off unless you bump max_streaming_delay up
pretty high (eg: 60 mins).

That being said, I'm not sure that there's really another solution.
Yes, in this case, the slave can end up being an hour behind, but that's
the trade-off you have to make if you want to run an hour-long query on
the slave. The other answer is to make the master not update those
tuples, etc., which might be possible by holding open a transaction on
the master whose snapshot is old enough to prevent the vacuum/HOT
cleanup from happening. That may be possible manually, but it's not fun
and it certainly isn't something we'll have built-in support for in 9.0.

Thanks,

Stephen