From: Josh Berkus on 26 Feb 2010 19:21

> That is exactly the core idea I was trying to suggest in my rambling
> message. Just that small additional bit of information transmitted and
> published to the master via that route, and it's possible to optimize
> this problem in a way not available now. And it's a way that I believe
> will feel more natural to some users who may not be well served by any
> of the existing tuning possibilities.

Well, if both you and Tom think it would be relatively easy (or at least
easier than continuing to pursue query cancel troubleshooting), then please
start coding it. It was always a possible approach; we just collectively
thought that query cancel would be easier.

--Josh Berkus
From: Greg Smith on 26 Feb 2010 21:43

Bruce Momjian wrote:
> Well, I think the choice is either you delay vacuum on the master for 8
> hours or pile up 8 hours of WAL files on the slave, and delay
> application, and make recovery much slower. It is not clear to me which
> option a user would prefer because the bloat on the master might be
> permanent.

But if you're running the 8 hour report on the master right now, aren't you
already exposed to a similar pile of bloat issues while it's going? If I
have the choice between "sometimes queries will get canceled" vs. "sometimes
the master will experience the same long-running transaction bloat issues as
in earlier versions even if the query runs on the standby", I feel like
leaning toward the latter at least leads to a problem people are used to.

This falls into the principle of least astonishment category to me. Testing
the final design for how transactions get canceled here led me to some
really unexpected situations, and the downside for a mistake is "your query
is lost". Had I instead discovered that sometimes long-running transactions
on the standby can ripple back to cause a maintenance slowdown on the
master, that's not great. But it would not have been so surprising, and it
won't result in lost query results.

I think people will expect that their queries cancel because of things like
DDL changes. And the existing knobs allow inserting some slack for things
like locks taking a little bit of time to acquire sometimes. What I don't
think people will see coming is that a routine update on an unrelated table
is going to kill a query they might have been waiting hours for the result
of, just because that update crossed an autovacuum threshold for the other
table and introduced a dead row cleanup.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com   www.2ndQuadrant.us
From: Greg Smith on 26 Feb 2010 21:52

Heikki Linnakangas wrote:
> One such landmine is that the keepalives need to flow from client to
> server while the WAL records are flowing from server to client. We'll
> have to crack that problem for synchronous replication too, but I think
> that alone is a big enough problem to make this 9.1 material.

This seems to be the real sticking point then, given that the xmin/PGPROC
side on the master seems logically straightforward. For some reason I
thought the sync rep feature already had the reverse message flow going, and
that some other sort of limitation just made it impractical to merge into
the main codebase this early. My hope was that just this particular part
could get cherry-picked out of there, and that it might even have been
thought about already in that context given the known HS keepalive "serious
issue". If there was a solution or partial solution to that already floating
around, my thought was that piggybacking this extra xid info on top of it
would be easy enough.

If there's not already a standby-to-primary communications backchannel
implementation available that can be harvested from that work, then your
suggestion that this may not be feasible at all for 9.0 is a more serious
concern than I had thought it was going to be.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com   www.2ndQuadrant.us
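[Editorial aside: for readers following the design question, here is a
minimal conceptual sketch of the idea under discussion -- the standby
piggybacks the oldest xmin among its running queries onto a keepalive
message, and the master treats that value much like a local backend's
PGPROC xmin when computing its vacuum horizon. This is illustration only,
not PostgreSQL source; every function and field name below is made up.]

# Conceptual illustration only -- not PostgreSQL code. All names here
# (build_keepalive, handle_keepalive, compute_vacuum_horizon) are hypothetical.

standby_xmins = {}  # master side: last xmin reported by each standby

def build_keepalive(oldest_running_xmin):
    # Standby side: report the oldest transaction id still needed by any
    # query currently running against the standby.
    return {"type": "keepalive", "xmin": oldest_running_xmin}

def handle_keepalive(standby_id, message):
    # Master side: remember the standby's reported xmin, much as a local
    # backend publishes its xmin in its PGPROC entry.
    standby_xmins[standby_id] = message["xmin"]

def compute_vacuum_horizon(local_backend_xmins):
    # Vacuum may only remove rows invisible to every local backend *and*
    # every reporting standby, so the horizon is the minimum of both sets.
    candidates = list(local_backend_xmins) + list(standby_xmins.values())
    return min(candidates) if candidates else None

[The cost, as discussed elsewhere in this thread, is that a long query on
the standby then holds back cleanup on the master instead of being
cancelled.]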
From: "Joshua D. Drake" on 26 Feb 2010 22:59 On Sat, 27 Feb 2010 00:43:48 +0000, Greg Stark <gsstark(a)mit.edu> wrote: > On Fri, Feb 26, 2010 at 11:56 PM, Greg Smith <greg(a)2ndquadrant.com> wrote: >> This is also the reason why the whole "pause recovery" idea is a >> fruitless >> path to wander down. The whole point of this feature is that people >> have a >> secondary server available for high-availability, *first and foremost*, >> but >> they'd like it to do something more interesting that leave it idle all >> the >> time. The idea that you can hold off on applying standby updates for >> long >> enough to run seriously long reports is completely at odds with the idea >> of >> high-availability. > I want my ability to run large batch queries without any performance > or reliability impact on the primary server. +1 I can use any number of other technologies for high availability. Joshua D. Drake -- PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Smith on 26 Feb 2010 23:31
Joshua D. Drake wrote:
> On Sat, 27 Feb 2010 00:43:48 +0000, Greg Stark <gsstark(a)mit.edu> wrote:
>> I want my ability to run large batch queries without any performance
>> or reliability impact on the primary server.
>
> +1
>
> I can use any number of other technologies for high availability.

Remove "must be an instant-on failover at the same time" from the
requirements and you don't even need 9.0 to handle that; this has been a
straightforward problem to solve since 8.2. It's the combination of HA and
queries that makes things hard to do.

If you just want batch queries on another system without being concerned
about HA at the same time, the first option is to just fork the base backup
and WAL segment delivery to another server and run queries there. Some
simple filesystem snapshot techniques will also suffice to handle it all on
the same standby: stop warm standby recovery, snapshot, trigger the server,
run your batch job; once finished, roll back to the snapshot, grab the
latest segment files, and resume standby catch-up. Even the lame Linux LVM
snapshot features can handle that job -- one of my coworkers has the whole
thing scripted, it's that common.

And if you have to go live because there's a failover, you're back to the
same "cold standby" situation a large max_standby_delay puts you at, so it's
not even very different from what you're going to get in 9.0 if this is your
priority mix. The new version just lowers the operational complexity
involved.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com   www.2ndQuadrant.us
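[Editorial aside: for concreteness, a rough outline of the snapshot
workflow described above, as it might be scripted for a pre-9.0 warm
standby whose data directory sits on an LVM volume and whose
restore_command uses a pg_standby-style trigger file. This is an untested
sketch, not the coworker's script mentioned in the message; every path,
volume name, size, database, and file name is a placeholder.]

#!/usr/bin/env python
# Untested sketch of the "stop recovery, snapshot, report, roll back" cycle
# described above. All paths, names, and sizes are placeholders; adjust for
# the local LVM layout and archive-recovery setup.
import subprocess

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

PGDATA  = "/srv/pgsql/standby"          # placeholder standby data directory
VOLUME  = "/dev/vg_data/pg_standby"     # placeholder logical volume
TRIGGER = "/tmp/pgsql.trigger"          # placeholder pg_standby trigger file

# 1. Pause catch-up: shut the recovering standby down cleanly.
run(["pg_ctl", "-D", PGDATA, "stop", "-m", "fast"])

# 2. Snapshot the volume so this recovery point can be returned to later.
run(["lvcreate", "--snapshot", "--name", "pg_snap", "--size", "10G", VOLUME])

# 3. Restart, trigger the server out of recovery, and run the batch job
#    (in practice, wait for recovery to finish before connecting).
run(["pg_ctl", "-D", PGDATA, "start"])
run(["touch", TRIGGER])   # pg_standby stops restoring; server opens read-write
run(["psql", "-d", "reporting", "-f", "nightly_report.sql"])
run(["pg_ctl", "-D", PGDATA, "stop", "-m", "fast"])

# 4. Roll the volume back to the snapshot (e.g. unmount and merge or restore
#    the snapshot -- exact commands vary by LVM version), remove the trigger
#    file, and restart; recovery resumes and pulls in the WAL segments that
#    were archived while the batch job was running.
run(["rm", "-f", TRIGGER])
run(["pg_ctl", "-D", PGDATA, "start"])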