From: Heikki Linnakangas on 27 Feb 2010 01:59

Josh Berkus wrote:
>> That is exactly the core idea I was trying to suggest in my rambling
>> message. Just that small additional bit of information transmitted and
>> published to the master via that route, and it's possible to optimize
>> this problem in a way not available now. And it's a way that I believe
>> will feel more natural to some users who may not be well served by any
>> of the existing tuning possibilities.
>
> Well, if both you and Tom think it would be relatively easy (or at least
> easier than continuing to pursue query cancel troubleshooting), then
> please start coding it. It was always a possible approach, we just
> collectively thought that query cancel would be easier.

You still need query cancels. A feedback loop just makes them happen less
frequently.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Heikki Linnakangas on 27 Feb 2010 04:07

Dimitri Fontaine wrote:
> Bruce Momjian <bruce(a)momjian.us> writes:
>> Doesn't the system already adjust the delay based on the length of slave
>> transactions, e.g. max_standby_delay. It seems there is no need for a
>> user switch --- just max_standby_delay really high.
>
> Well that GUC looks like it allows to set a compromise between HA and
> reporting, not to say "do not ever give the priority to the replay while
> I'm running my reports". At least that's how I understand it.

max_standby_delay=-1 does that. The documentation needs to be updated to
reflect that; it currently says:

> There is no wait-forever setting because of the potential for deadlock
> which that setting would introduce. This parameter can only be set in the
> postgresql.conf file or on the server command line.

but that is false: -1 means wait forever. Simon removed that option at one
point, but it was later put back and apparently the documentation was never
updated.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
From: Heikki Linnakangas on 27 Feb 2010 04:33

Heikki Linnakangas wrote:
> Dimitri Fontaine wrote:
>> Bruce Momjian <bruce(a)momjian.us> writes:
>>> Doesn't the system already adjust the delay based on the length of slave
>>> transactions, e.g. max_standby_delay. It seems there is no need for a
>>> user switch --- just max_standby_delay really high.
>> Well that GUC looks like it allows to set a compromise between HA and
>> reporting, not to say "do not ever give the priority to the replay while
>> I'm running my reports". At least that's how I understand it.
>
> max_standby_delay=-1 does that. The documentation needs to be updated to
> reflect that, it currently says:
>
>> There is no wait-forever setting because of the potential for deadlock
>> which that setting would introduce. This parameter can only be set in
>> the postgresql.conf file or on the server command line.
>
> but that is false, -1 means wait forever. Simon removed that option at
> one point, but it was later put back and apparently the documentation
> was never updated.

I've put back the mention of the max_standby_delay=-1 option in the docs.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
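[Editor's note: for reference, the wait-forever behaviour discussed above
is a one-line setting on the standby. A sketch of the relevant
postgresql.conf fragment; the comments summarize the trade-off described
in this thread:]

```
# postgresql.conf on the standby
# -1 = wait forever: WAL replay pauses until conflicting standby queries
# finish, so queries are never cancelled, but recovery can fall
# arbitrarily far behind the primary.
max_standby_delay = -1
```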
From: Greg Smith on 28 Feb 2010 00:28

Bruce Momjian wrote:
>> "The first option is to connect to the primary server and keep a query
>> active for as long as needed to run queries on the standby. This
>> guarantees that a WAL cleanup record is never generated and query
>> conflicts do not occur, as described above. This could be done using
>> contrib/dblink and pg_sleep(), or via other mechanisms."
>
> I am unclear how you would easily advance the snapshot as each query
> completes on the slave.

The idea of the workaround is that if you have a single long-running query
to execute, and you want to make sure it doesn't get canceled because of a
vacuum cleanup, you just have it connect back to the master to keep an
open snapshot the whole time. That's basically the same idea that
vacuum_defer_cleanup_age implements, except you don't have to calculate a
value--you just hold open the snapshot to do it.

When that query ended, its snapshot would be removed, and then the master
would advance to whatever the next latest one is. Nothing fancier than
that. The only similarity is that if you made every query that happened on
the standby do that, it would effectively be the same behavior I'm
suggesting could be available via the standby->master xmin publication.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com   www.2ndQuadrant.us
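[Editor's note: the dblink workaround described above could look roughly
like this when run from the standby. This is a sketch, assuming
contrib/dblink is installed and the primary is reachable at the
hypothetical connection string shown; the one-hour pg_sleep() is an
arbitrary upper bound on the report's runtime:]

```sql
-- Connect from the standby back to the primary and start a long-running
-- query there asynchronously; while it runs, the primary holds a snapshot
-- open and will not generate conflicting WAL cleanup records.
SELECT dblink_connect('master_conn', 'host=primary.example.com dbname=mydb');
SELECT dblink_send_query('master_conn', 'SELECT pg_sleep(3600)');

-- ... run the long report query on the standby here ...

-- Cancel the sleep and drop the connection so the primary's xmin
-- horizon can advance again.
SELECT dblink_cancel_query('master_conn');
SELECT dblink_disconnect('master_conn');
```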
From: Josh Berkus on 28 Feb 2010 14:47
All,

First, from the nature of the arguments, we need to eventually have both
versions of SR: delay-based and xmin-pub. And it would be fantastic if
Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
ready as well.

I also think, based on the discussion and Greg's test case, that we could
do two things which would mitigate the shortcomings of delay-based SR and
make it a vastly better experience for users:

1) Automated retry of cancelled queries on the slave. I have no idea how
hard this would be to implement, but it makes the difference between
writing lots of exception-handling code for slave connections
(unacceptable) and just slow response times on the slave (acceptable).

2) A more usable vacuum_defer_cleanup_age. If it were feasible for a user
to configure the master to not vacuum records less than, say, 5 minutes
dead, then that would again offer the user a choice between slightly
degraded performance on the master (acceptable) and lots of query cancel
(unacceptable). I'm going to test Greg's case with
vacuum_defer_cleanup_age used fairly liberally to see if this approach
has merit.

Why do I say that "lots of query cancel" is "unacceptable"? For the simple
reason that one cannot run the same application code against an HS+SR
cluster with lots of query cancel as one runs against a standalone
database. And if that's true, then the main advantage of HS+SR over Slony
and Londiste is gone. MySQL took great pains to make sure that you could
run the same code against replicated MySQL as standalone, and that was
based on having a fairly intimate relationship with their users (at the
time, anyway).

Another thing to keep in mind in these discussions is the inexpensiveness
of servers today. This means that if slaves have poor performance, that's
OK; one can always spin up more slaves. But if each slave imposes a large
burden on the master, then that limits your scalability.
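[Editor's note: for point 2), one practical wrinkle is that
vacuum_defer_cleanup_age is measured in transactions, not time, so "5
minutes dead" must be estimated from the master's transaction rate. A
sketch, where the rate and the resulting value are purely illustrative:]

```
# postgresql.conf on the primary
# Assuming roughly 1000 write transactions/second, deferring cleanup of
# the most recent 300000 XIDs approximates "don't vacuum rows less than
# ~5 minutes dead". The cost is extra dead-row bloat on the master.
vacuum_defer_cleanup_age = 300000
```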
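[Editor's note: point 1) can be approximated in application code today,
without any server change. Below is a minimal client-side sketch
(hypothetical, not PostgreSQL code): run_query stands in for whatever
driver call executes the report on the standby, and cancel_exc for the
exception the driver raises on a recovery-conflict cancellation:]

```python
import time

def retry_on_cancel(run_query, retries=3, delay=0.1, cancel_exc=Exception):
    """Re-run a standby query if it is cancelled by a recovery conflict.

    run_query  -- zero-argument callable that executes the query
                  (assumption: the caller wraps its driver call here)
    cancel_exc -- exception type the driver raises on cancellation
    """
    for attempt in range(retries):
        try:
            return run_query()
        except cancel_exc:
            if attempt == retries - 1:
                raise          # give up after the final attempt
            time.sleep(delay)  # back off briefly, then re-run

# Example: a query that is "cancelled" twice, then succeeds.
attempts = {"n": 0}

def flaky_query():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("canceling statement due to conflict with recovery")
    return 42

result = retry_on_cancel(flaky_query, retries=5, cancel_exc=RuntimeError)
```

The application sees only a slower response instead of an error, which is
exactly the trade-off described in point 1).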
--Josh Berkus