From: Heikki Linnakangas on 2 Jun 2010 15:41

On 02/06/10 20:14, Tom Lane wrote:
> For realistic values of max_standby_delay ...

Hang on right there. What do you consider a realistic value for
max_standby_delay? Because I'm not sure I have a grip on that myself.
5 seconds? 5 minutes? 5 hours? I can see use cases for all of those...

> What I think might be a realistic compromise is this:
>
> 1. Separate max_standby_delay into two GUCs, say "max_streaming_delay"
> and "max_archive_delay".
>
> 2. When applying WAL that came across SR, use max_streaming_delay and
> let the time measurement be current time minus time of receipt of the
> current WAL send chunk.
>
> 3. When applying WAL that came from archive, use max_archive_delay and
> let the time measurement be current time minus time of acquisition of
> the current WAL segment from the archive.
>
> The current code's behavior in the latter case could effectively be
> modeled by setting max_archive_delay to zero, but that isn't the only
> plausible setting. More likely DBAs would set max_archive_delay to
> something smaller than max_streaming_delay, but still positive so as to
> not kill conflicting queries instantly.

The problem with defining max_archive_delay that way is again that you
can fall behind indefinitely. If you set it to 5 minutes, it means that
you'll wait a maximum of 5 minutes *per WAL segment*, even if WAL is
being generated faster.

I don't understand why you want to use a different delay when you're
restoring from archive vs. when you're streaming (what about existing
WAL files found in pg_xlog, BTW?). The source of WAL shouldn't make a
difference. If it's because you assume that restoring from archive is a
sign that you've fallen behind a lot, surely you've exceeded
max_standby_delay then, and I still don't see a need for a separate GUC.

I stand by my suggestion from yesterday: let's define max_standby_delay
as the difference between a piece of WAL becoming available in the
standby and applying it. To approximate "piece of WAL becoming
available" for SR, we can use the mechanism with send/applyChunks from
Simon's latest patch, or go with the simpler scheme of just resetting a
"last caught-up timestamp" to the current time whenever we have to wait
for new WAL to arrive. When restoring from archive, likewise reset the
"last caught-up timestamp" whenever restore_command returns non-zero,
i.e. we have to wait for the next WAL file to arrive.

That works the same for both SR and file-based log shipping, there's
only one knob to set, it's simple to implement, and it doesn't require
synchronized clocks.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
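As an illustration, the one-knob scheme described above might look
something like the following in backend C. This is only a minimal
sketch: the names (lastCaughtUpTime, ResetCaughtUpTimestamp,
StandbyDelayExceeded) are hypothetical and not taken from any actual
patch; only GetCurrentTimestamp() and TimestampDifferenceExceeds() are
real backend utilities.

#include "postgres.h"

#include "utils/timestamp.h"

/* Reset whenever replay had to wait for more WAL (hypothetical). */
static TimestampTz lastCaughtUpTime;

/*
 * Call this whenever recovery has to wait for WAL to become available:
 * the walreceiver has no more data, or restore_command returned
 * non-zero and we must wait for the next segment.
 */
static void
ResetCaughtUpTimestamp(void)
{
    lastCaughtUpTime = GetCurrentTimestamp();
}

/*
 * Decide whether a conflicting standby query has used up its grace
 * period: has more than max_standby_delay elapsed since we were last
 * caught up?  A negative setting means "wait forever".
 */
static bool
StandbyDelayExceeded(int max_standby_delay_ms)
{
    if (max_standby_delay_ms < 0)
        return false;
    return TimestampDifferenceExceeds(lastCaughtUpTime,
                                      GetCurrentTimestamp(),
                                      max_standby_delay_ms);
}

Note that both timestamps are taken on the standby, which is why this
scheme needs no clock synchronization between master and standby.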
From: Tom Lane on 2 Jun 2010 16:00

Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> The problem with defining max_archive_delay that way is again that you
> can fall behind indefinitely.

See my response to Greg Stark --- I don't think this is really an
issue. It's certainly far less of an issue than the current situation
that query grace periods go to zero under not-at-all-extreme conditions.

> I don't understand why you want to use a different delay when you're
> restoring from archive vs. when you're streaming (what about existing
> WAL files found in pg_xlog, BTW?).

You're missing the point. I want the DBA to be able to control what
happens in those two cases. In the current implementation he has no
control over what happens while restoring from archive: it's going to
effectively act like max_archive_delay = 0 all the time. If you're of
the opinion that's good, you can set the parameter that way and be
happy. I'm of the opinion you'll soon find out it isn't so good,
because it'll kill standby queries too easily.

> I stand by my suggestion from yesterday: Let's define max_standby_delay
> as the difference between a piece of WAL becoming available in the
> standby, and applying it.

My proposal is essentially the same as yours plus allowing the DBA to
choose different max delays for the caught-up and not-caught-up cases.
Maybe everybody will end up setting the two delays the same, but I
think we don't have enough experience to decide that for them now.

			regards, tom lane
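For comparison, a sketch of the two-GUC proposal might look like this.
Again, everything here is hypothetical (the GUC variables, WalSource,
ConflictDelayExceeded, and the two reference timestamps are illustrative
names, not code from any real patch); the point is only that both the
delay limit and the reference timestamp are chosen by WAL source.

#include "postgres.h"

#include "utils/timestamp.h"

extern int  max_streaming_delay_ms;     /* proposed GUC */
extern int  max_archive_delay_ms;       /* proposed GUC */

typedef enum
{
    WAL_FROM_STREAM,            /* received via streaming replication */
    WAL_FROM_ARCHIVE            /* restored by restore_command */
} WalSource;

/*
 * chunkReceiptTime:   when the current WAL send chunk arrived (SR case)
 * segmentRestoreTime: when restore_command returned the current segment
 */
static bool
ConflictDelayExceeded(WalSource source,
                      TimestampTz chunkReceiptTime,
                      TimestampTz segmentRestoreTime)
{
    TimestampTz base;
    int         limit_ms;

    if (source == WAL_FROM_STREAM)
    {
        base = chunkReceiptTime;
        limit_ms = max_streaming_delay_ms;
    }
    else
    {
        base = segmentRestoreTime;
        limit_ms = max_archive_delay_ms;
    }

    if (limit_ms < 0)
        return false;           /* negative setting: wait forever */
    return TimestampDifferenceExceeds(base, GetCurrentTimestamp(),
                                      limit_ms);
}

Setting max_archive_delay = 0 in this sketch reproduces the current
code's behavior while restoring from archive, which is Tom's point:
the DBA can choose that, or something less aggressive.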
From: Greg Stark on 2 Jun 2010 18:59

On Wed, Jun 2, 2010 at 8:14 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Indeed, but nothing we do can prevent that, if the slave is just plain
> slower than the master. You have to assume that the slave is capable of
> keeping up in the absence of query-caused delays, or you're hosed.

I was assuming the walreceiver only requests more WAL in relatively
small chunks, and only when replay has caught up and needs more data. I
haven't actually read this code, so if that assumption is wrong then
I'm off-base.

But if my assumption is right, then it's not merely the master running
faster than the slave that can cause you to fall arbitrarily far
behind. The "receive time" is delayed by how long the slave waited on a
previous lock, so if we wait for a few minutes, then proceed and find
we need more WAL data, we'll read data from the master that it could
have generated those few minutes ago.

> The sticky point is that once in a blue moon you do have a conflicting
> query sitting on a buffer lock for a long time, or even more likely a
> series of queries keeping the WAL replay process from obtaining buffer
> cleanup lock.

So I think this isn't necessarily such a blue moon event. As I
understand it, all it would take is a single long-running report and a
vacuum or HOT cleanup occurring on the master. If I want to set
max_standby_delay to 60min to allow reports to run for up to an hour,
then any random HOT cleanup on the master will propagate to the slave
and cause a WAL stall until the transactions which have that xid in
their snapshot end.

Even the buffer locks could potentially be blocked for a long time if
you happen to run the right kind of query (say, a nested loop with the
buffer in question on the outside and a large table on the inside).
That's a rarer event though; is that what you were thinking of?

--
greg
From: Tom Lane on 2 Jun 2010 19:11

Greg Stark <gsstark(a)mit.edu> writes:
> I was assuming the walreceiver only requests more WAL in relatively
> small chunks, and only when replay has caught up and needs more data. I
> haven't actually read this code, so if that assumption is wrong then
> I'm off-base.

It is off-base. The receiver does not "request" data; the sender is
what determines how much WAL is sent when.

> So I think this isn't necessarily such a blue moon event. As I
> understand it, all it would take is a single long-running report and a
> vacuum or HOT cleanup occurring on the master.

I think this is mostly FUD too. How often do you see vacuum blocked for
an hour now? It probably can happen, which is why we need to be able to
kick queries off the locks with max_standby_delay, but it's far from
common. What we're concerned about here is how long a buffer lock on a
single page is held, not how long heavyweight locks are held. The
normal hold times are measured in microseconds.

			regards, tom lane
From: Stephen Frost on 2 Jun 2010 20:47

* Tom Lane (tgl(a)sss.pgh.pa.us) wrote:
> Greg Stark <gsstark(a)mit.edu> writes:
> > So I think this isn't necessarily such a blue moon event. As I
> > understand it, all it would take is a single long-running report and a
> > vacuum or HOT cleanup occurring on the master.
>
> I think this is mostly FUD too. How often do you see vacuum blocked for
> an hour now? It probably can happen, which is why we need to be able to
> kick queries off the locks with max_standby_delay, but it's far from
> common. What we're concerned about here is how long a buffer lock on a
> single page is held, not how long heavyweight locks are held. The
> normal hold times are measured in microseconds.

Maybe I'm missing something, but I think Greg's point was that if you
have a long-running query running against the standby/slave/whatever,
which is holding locks on various relations to implement that report,
and then a vacuum or HOT update happens on the master, the long-running
report query will get killed off unless you bump max_streaming_delay up
pretty high (e.g. 60 minutes).

That being said, I'm not sure that there's really another solution.
Yes, in this case, the slave can end up being an hour behind, but
that's the trade-off you have to make if you want to run an hour-long
query on the slave. The other answer is to make the master not update
those tuples, etc., which might be possible by starting a transaction
on the master which grabs things enough to prevent the vacuum/HOT/etc.
update from happening. That may be possible manually, but it's not fun
and it certainly isn't something we'll have built-in support for in 9.0.

Thanks,

Stephen