From: Josh Berkus on 3 May 2010 15:37

Simon,

> My initial view was that the High Availability goal/role should be the
> default or most likely mode of operation. I would say that the current
> max_standby_delay favours the HA route since it specifically limits the
> amount by which the server can fall behind.

I don't understand how Tom's approach would cause the slave to be further behind than the current max_standby_delay code, and I can see ways in which it would result in less delay. So, explain?

The main issue with Tom's list which struck me was that max_standby_delay was linked to the system clock. HS is going to get used by a lot of PG users who aren't running time sync on their servers, or who let it get out of whack without fixing it. I'd thought that the delay was somehow based on transaction timestamps coming from the master. Keep in mind that there will be a *lot* of people using this feature, including ones without competent & available sysadmins.

The lock method appeals to me simply because it would eliminate the "mass cancel" issues which Greg Smith was reporting every time the timer runs down. That is, it seems to me that only the oldest queries would be cancelled, and not any new ones.

The biggest drawback I can see to Tom's approach is possible blocking on the slave due to the lock wait from the recovery process. However, this could be managed with the new lock-waits GUC, as well as statement_timeout.

Overall, I think Tom's proposal gives me what I would prefer, which is degraded performance on the slave, but in ways which users are used to, rather than a lot of query cancellation, which will interfere with porting user applications.

Would the recovery lock show up in pg_locks? That would also be a good diagnostic tool.

I am happy to test some of this on Amazon or GoGrid, which is what I was planning on doing anyway.

P.S. Can we avoid the "considered harmful" phrase? It carries a lot of baggage ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
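On the pg_locks question above, a minimal diagnostic sketch of the kind of query one might run on the standby. It assumes the 9.0-era pg_locks columns, and it assumes that locks held or requested by the recovery process (including AccessExclusiveLocks replayed from the master) are in fact visible there; treat that visibility as something to verify, not a documented guarantee:

    -- On the standby: show ungranted lock requests (e.g. the recovery process
    -- waiting behind a query) and AccessExclusiveLocks replayed from the master.
    SELECT locktype, relation::regclass AS relation, pid, mode, granted
      FROM pg_locks
     WHERE NOT granted
        OR mode = 'AccessExclusiveLock';

An ungranted AccessExclusiveLock request here would be exactly the "recovery process waiting" state Josh wants to be able to diagnose.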
From: Tom Lane on 3 May 2010 15:39

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I'm inclined to think that we should throw away all this logic and just
>> have the slave cancel competing queries if the replay process waits
>> more than max_standby_delay seconds to acquire a lock.

> What if we somehow get into a situation where the replay process is
> waiting for a lock over and over and over again, because it keeps
> killing conflicting processes but something restarts them and they
> take locks over again?

They won't be able to take locks "over again", because the lock manager won't allow requests to pass a pending previous request, except in very limited circumstances that shouldn't hold here. They'll queue up behind the replay process's lock request, not in front of it. (If that isn't the case, it needs to be fixed, quite independently of this concern.)

			regards, tom lane
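A minimal sketch of the queuing behaviour Tom describes, using three ordinary psql sessions and a hypothetical table t (session B standing in for the replay process's exclusive request):

    -- Session A: an in-flight query holding ACCESS SHARE on t
    BEGIN;
    SELECT count(*) FROM t;                  -- lock held until COMMIT

    -- Session B: stands in for the replay process
    BEGIN;
    LOCK TABLE t IN ACCESS EXCLUSIVE MODE;   -- blocks behind session A

    -- Session C: a freshly (re)started query
    SELECT count(*) FROM t;                  -- queues *behind* B's pending
                                             -- request; it does not pass it

Once session A commits, B acquires the lock; C proceeds only after B releases it, which is why restarted queries cannot starve the replay process.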
From: "Greg Sabino Mullane" on 3 May 2010 15:41 -----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > Based on that, I don't know that there's really much user-seen behaviour > between the two, except in 'oddball' situations, where there's a time > skew between the servers, or a large lag, etc, in which case I think Certainly that one particular case can be solved by making the servers be in time sync a prereq for HS working (in the traditional way). And by "prereq" I mean a "user beware" documentation warning. - -- Greg Sabino Mullane greg(a)turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201005031539 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAkvfJr0ACgkQvJuQZxSWSsgSRwCgwAZpKJDqHX28y90rCx/CNXDt JGgAoO9JeoBacvTJ09UJ+o1Nek3KtcYR =gvch -----END PGP SIGNATURE----- -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 3 May 2010 15:54

On Mon, 2010-05-03 at 15:32 -0400, Stephen Frost wrote:
> Simon,
>
> * Simon Riggs (simon(a)2ndQuadrant.com) wrote:
> > Tom's proposed behaviour (it has also been proposed before) favours the
> > avoid-query-cancellation route, though it could lead to huge amounts of lag.
>
> My impression of Tom's suggestion was that it would also be a maximum
> amount of delay which would be allowed before killing off queries, not
> that it would be able to wait indefinitely until no one is blocking.
> Based on that, I don't know that there's really much difference in user-seen
> behaviour between the two, except in 'oddball' situations, where there's a
> time skew between the servers, or a large lag, etc, in which case I think
> Tom's proposal would be more likely what's 'expected', whereas what you
> would get with the existing implementation (zero time delay, or far too
> much) would be a 'gotcha'..

If recovery waits up to max_standby_delay every time something gets in its way, it should be clear that if many things get in its way it will progressively fall behind. There is no limit to this, and it can always fall further behind. It does result in fewer cancelled queries, and I do understand many may like that. That is *significantly* different from how it works now. (Plus: if there really were no difference, why not leave it as is?)

The bottom line is that this is about conflict resolution. There is simply no way to resolve conflicts without favouring one or other of the protagonists. Whatever mechanism you come up with that favours one will disfavour the other. I'm happy to give choices, but I'm not happy to force just one kind of conflict resolution.

--
Simon Riggs
www.2ndQuadrant.com
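A back-of-envelope illustration of Simon's unbounded-lag point, under per-wait semantics. The numbers are purely illustrative:

    max_standby_delay            = 30s
    conflicting standby queries  = one per replayed conflicting record,
                                   arriving back to back
    lag after N such waits       = up to N * 30s   -- grows without bound

The current code instead caps total lag, as Simon says: once replay has fallen more than max_standby_delay behind, conflicting queries are cancelled immediately until replay catches up.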
From: Simon Riggs on 3 May 2010 16:08
On Mon, 2010-05-03 at 15:39 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
> > On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> >> I'm inclined to think that we should throw away all this logic and just
> >> have the slave cancel competing queries if the replay process waits
> >> more than max_standby_delay seconds to acquire a lock.
>
> > What if we somehow get into a situation where the replay process is
> > waiting for a lock over and over and over again, because it keeps
> > killing conflicting processes but something restarts them and they
> > take locks over again?
>
> They won't be able to take locks "over again", because the lock manager
> won't allow requests to pass a pending previous request, except in
> very limited circumstances that shouldn't hold here. They'll queue
> up behind the replay process's lock request, not in front of it.
> (If that isn't the case, it needs to be fixed, quite independently
> of this concern.)

Most conflicts aren't lock-manager locks; they are snapshot conflicts, though clearly different workloads will have different characteristics. Some conflicts are buffer conflicts, and the semantics of buffer cleanup locks and many other internal locks are that shared lock requests queue-jump past exclusive lock requests. Not something we should touch now, at least.

I understand that you aren't impressed by everything about the current patch, but rushed changes may not help either.

--
Simon Riggs
www.2ndQuadrant.com
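To make the distinction concrete, a hedged sketch of the snapshot-conflict case Simon says dominates, using a hypothetical table t (columns amount and stale are made up for illustration). The error text matches what hot standby raises on cancellation:

    -- On the standby: a long-running query whose snapshot needs old row versions
    BEGIN;
    SELECT sum(amount) FROM t;   -- any sufficiently slow query will do

    -- Meanwhile, on the master:
    DELETE FROM t WHERE stale;   -- removes rows the standby query can still see
    VACUUM t;                    -- emits cleanup records for those row versions

    -- When the cleanup records reach the standby, replay must either wait
    -- (bounded by max_standby_delay) or cancel the standby query:
    -- ERROR:  canceling statement due to conflict with recovery

No lock-manager lock is involved here: the conflict is between the standby query's snapshot and row versions VACUUM has already removed on the master, so the queue-ordering argument above never comes into play.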