From: Greg Stark on 26 Feb 2010 07:10

On Fri, Feb 26, 2010 at 8:33 AM, Greg Smith <greg(a)2ndquadrant.com> wrote:
>
> I'm not sure what you might be expecting from the above combination, but
> what actually happens is that many of the SELECT statements on the table
> *that isn't even being updated* are canceled. You see this in the logs:

Well, I proposed that the default should be to wait forever when applying WAL logs that conflict with a query, precisely because I think the expectation is that things will "just work" and queries not fail unpredictably.

Perhaps in your test a larger max_standby_delay would have prevented the cancellations, but then as soon as you try a query which lasts longer you would have to raise it again. There's no safe value which will be right for everyone.

> If you're running a system that also is using Streaming Replication, there
> is a much better approach possible.

So I think one of the main advantages of a log-shipping system over the trigger-based systems is precisely that it doesn't require the master to do anything it wasn't doing already. There's nothing the slave can do which can interfere with the master's normal operation.

This independence is really a huge feature. It means you can allow users on the slave that you would never let near the master. The master can continue running production query traffic while users run all kinds of crazy queries on the slave and drive it into the ground, and the master will continue on blithely unaware that anything's changed.

In the model you describe, any long-lived queries on the slave cause tables in the master to bloat with dead records.

I think this model is on the roadmap, but it's not appropriate for everyone, and I think one of the benefits of having delayed it is that it forces us to get the independent model right before throwing in extra complications. It would be too easy to rely on the slave feedback as an answer for hard questions about usability if we had it, and just ignore the question of what to do when it's not the right solution for the user.

--
greg
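The knob at issue here is max_standby_delay (as it existed in the development tree at the time; it was later split into max_standby_archive_delay and max_standby_streaming_delay before 9.0 shipped). A minimal standby postgresql.conf sketch of the two positions being argued, assuming the "-1 means wait forever" semantics the parameter had then:

    # How long WAL replay will wait before cancelling conflicting standby queries.
    max_standby_delay = 30s    # some finite grace period; longer queries still lose
    #max_standby_delay = -1    # never cancel (the default argued for above);
                               # replay can fall arbitrarily far behind instead

Either setting is a trade-off rather than a fix, which is the point being made above: no single value suits both quick reporting queries and long-running ones.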
From: Robert Haas on 26 Feb 2010 12:45

On Fri, Feb 26, 2010 at 10:21 AM, Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> wrote:
> Richard Huxton wrote:
>> Can we not wait to cancel the transaction until *any* new lock is
>> attempted though? That should protect all the single-statement
>> long-running transactions that are already underway. Aggregates etc.
>
> Hmm, that's an interesting thought. You'll still need to somehow tell
> the victim backend "you have to fail if you try to acquire any more
> locks", but a single per-backend flag in the procarray would suffice.
>
> You could also clear the flag whenever you free the last snapshot in the
> transaction (ie. between each query in read committed mode).

Wow, that seems like it would help a lot. Although I'm not 100% sure I follow all the details of how this works.

...Robert
From: Greg Stark on 26 Feb 2010 12:46

On Fri, Feb 26, 2010 at 4:43 PM, Richard Huxton <dev(a)archonet.com> wrote:
> Let's see if I've got the concepts clear here, and hopefully my thinking it
> through will help others reading the archives.
>
> There are two queues:

I don't see two queues. I only see the one queue of operations which have been executed on the master but not replayed yet on the slave. Every write operation on the master enqueues an operation to it, and every operation replayed on the slave dequeues from it. Only a subset of operations create conflicts with concurrent transactions on the slave, namely vacuums and a few similar operations (HOT pruning and btree index pruning).

There's no question we need to make sure users have good tools to monitor this queue and are aware of these tools. You can query each slave for its currently replayed log position, and hopefully you can find out how long it's been delayed (i.e., if it's looking at a log record and waiting for a conflict to clear, how long ago that log record was generated). You can also find out what the log position is on the master.

--
greg
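To make the monitoring described here concrete, the built-in functions in 9.0 should be enough for a first cut (function names per the 9.0 documentation; the gap between the master's position and the standby's replay position is the queue being discussed):

    -- On the master: current WAL write position
    SELECT pg_current_xlog_location();

    -- On the standby: last WAL position received and last position replayed.
    -- If the replay position stops advancing while the receive position keeps
    -- growing, replay is stalled (for example, waiting out a query conflict).
    SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();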
From: Tom Lane on 26 Feb 2010 13:53

Greg Stark <gsstark(a)mit.edu> writes:
> In the model you describe any long-lived queries on the slave cause
> tables in the master to bloat with dead records.

Yup, same as they would do on the master.

> I think this model is on the roadmap but it's not appropriate for
> everyone and I think one of the benefits of having delayed it is that
> it forces us to get the independent model right before throwing in
> extra complications. It would be too easy to rely on the slave
> feedback as an answer for hard questions about usability if we had it
> and just ignore the question of what to do when it's not the right
> solution for the user.

I'm going to make an unvarnished assertion here. I believe that the notion of synchronizing the WAL stream against slave queries is fundamentally wrong and we will never be able to make it work. The information needed isn't available in the log stream and can't be made available without very large additions (and consequent performance penalties). As we start getting actual beta testing, we are going to uncover all sorts of missed cases that are not going to be fixable without piling additional ugly kluges on top of the ones Simon has already crammed into the system. Performance and reliability will both suffer.

I think that what we are going to have to do before we can ship 9.0 is rip all of that stuff out and replace it with the sort of closed-loop synchronization Greg Smith is pushing. It will probably be several months before everyone is forced to accept that, which is why 9.0 is not going to ship this year.

regards, tom lane
From: Tom Lane on 26 Feb 2010 14:16
Josh Berkus <josh(a)agliodbs.com> writes:
> On 2/26/10 10:53 AM, Tom Lane wrote:
>> I think that what we are going to have to do before we can ship 9.0
>> is rip all of that stuff out and replace it with the sort of closed-loop
>> synchronization Greg Smith is pushing. It will probably be several
>> months before everyone is forced to accept that, which is why 9.0 is
>> not going to ship this year.

> I don't think that publishing visibility info back to the master ... and
> subsequently burdening the master substantially for each additional
> slave ... are what most users want.

I don't see a "substantial additional burden" there. What I would imagine is needed is that the slave transmits a single number back --- its current oldest xmin --- and the walsender process publishes that number as its transaction xmin in its PGPROC entry on the master.

I don't doubt that this approach will have its own gotchas that we find as we get into it. But it looks soluble. I have no faith in either the correctness or the usability of the approach currently being pursued.

regards, tom lane
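The closed loop described here is, for what it's worth, essentially what later appeared as the hot_standby_feedback setting (added in 9.1, so this is only a forward-looking sketch relative to this thread, not something available in the patch under discussion):

    # On the standby: send the oldest xmin still needed by standby queries back
    # to the master over the replication connection, so VACUUM there will not
    # remove rows those queries can still see.  The price is the master-side
    # bloat discussed earlier in the thread.
    hot_standby_feedback = on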