From: Heikki Linnakangas on 26 Feb 2010 15:04

Tom Lane wrote:
> Josh Berkus <josh(a)agliodbs.com> writes:
>> On 2/26/10 10:53 AM, Tom Lane wrote:
>>> I think that what we are going to have to do before we can ship 9.0
>>> is rip all of that stuff out and replace it with the sort of closed-loop
>>> synchronization Greg Smith is pushing. It will probably be several
>>> months before everyone is forced to accept that, which is why 9.0 is
>>> not going to ship this year.
>
>> I don't think that publishing visibility info back to the master ... and
>> subsequently burdening the master substantially for each additional
>> slave ... are what most users want.
>
> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.

The additional burden comes from the old snapshot effect. It makes it unusable for offloading reporting queries, for example. In general, it is a very good architectural property that the master is not affected by what happens in a standby, and closed-loop synchronization would break that.

I don't actually understand how tight synchronization on its own would solve the problem. What if the connection to the master is lost? Do you kill all queries in the standby before reconnecting?

One way to think about this is to first consider a simple stop-and-go system. Clearly the database must be consistent at any point in the WAL sequence, if recovery was stopped and the database started up. So it is always safe to pause recovery and run a read-only query against the database as it is at that point in time (this assumes that the index "cleanup" operations are not required for consistent query results, BTW). After the read-only transaction is finished, you can continue recovery.

The next step up is to relax that so that you allow replay of those WAL records that are known not to cause trouble for the read-only queries. For example, heap_insert records are very innocent: they only add rows with a yet-uncommitted xmin. Things get more complex when you allow the replay of commit records; all the known-assigned-xids tracking is related to that, so that transactions that were not committed when a snapshot was taken in the standby are considered uncommitted by that snapshot even after the commit record is later replayed.

If that feels too fragile, there might be other methods to achieve it. One I once pondered is to not track all in-progress transactions in shared memory like we do now, but only OldestXmin. When a backend wants to take a snapshot in the slave, it memcpy()s clog from OldestXmin to the latest committed XID, and includes it in the snapshot. The visibility checks use the copy instead of the actual clog, so they see the situation as it was when the snapshot was taken. To keep track of the OldestXmin in the slave, the master can emit it as a WAL record every now and then; it's OK if it lags behind.

Then there are the WAL record types that remove data that might still be required by the read-only transactions. This includes vacuum and index deletion records.

If you really think the current approach is unworkable, I'd suggest that we fall back to a stop-and-go system, where you either let recovery progress or allow queries to run, but not both at the same time. But FWIW I don't think the situation is that grave.
--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
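A standalone sketch may make the clog-copy snapshot idea above concrete. Everything here (the StandbySnapshot struct, the byte-per-xid "clog" array, the function names) is invented for illustration and is not PostgreSQL code; the point is only that visibility checks consult a private copy taken at snapshot time, so commit records replayed afterwards don't change the answer.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

#define XID_IN_PROGRESS 0
#define XID_COMMITTED   1
#define XID_ABORTED     2

typedef struct
{
    const uint8_t *live_clog;    /* xids below oldest_xmin can no longer change */
    TransactionId  oldest_xmin;  /* nothing older can still be in progress */
    TransactionId  latest_xid;   /* highest xid covered by the private copy */
    uint8_t       *status_copy;  /* private copy of clog[oldest_xmin..latest_xid] */
} StandbySnapshot;

/* Take a snapshot: memcpy the interesting clog range into backend-local memory. */
static StandbySnapshot *
take_standby_snapshot(const uint8_t *clog, TransactionId oldest_xmin,
                      TransactionId latest_xid)
{
    StandbySnapshot *snap = malloc(sizeof(StandbySnapshot));
    size_t nxids = latest_xid - oldest_xmin + 1;

    snap->live_clog = clog;
    snap->oldest_xmin = oldest_xmin;
    snap->latest_xid = latest_xid;
    snap->status_copy = malloc(nxids);
    memcpy(snap->status_copy, clog + oldest_xmin, nxids);
    return snap;
}

/* Visibility check against the private copy, never the live clog (for the covered range). */
static int
xid_visible_in_snapshot(const StandbySnapshot *snap, TransactionId xid)
{
    if (xid > snap->latest_xid)
        return 0;   /* not committed as of the snapshot, so never visible to it */
    if (xid >= snap->oldest_xmin)
        return snap->status_copy[xid - snap->oldest_xmin] == XID_COMMITTED;
    /* Older than oldest_xmin: its fate was already settled, the live clog is stable. */
    return snap->live_clog[xid] == XID_COMMITTED;
}

int
main(void)
{
    uint8_t clog[16] = {0};      /* everything starts "in progress" */

    clog[3] = XID_COMMITTED;
    clog[5] = XID_ABORTED;

    StandbySnapshot *snap = take_standby_snapshot(clog, 2, 6);

    clog[6] = XID_COMMITTED;     /* commit record replayed after the snapshot was taken */

    printf("xid 3 visible: %d\n", xid_visible_in_snapshot(snap, 3)); /* 1 */
    printf("xid 6 visible: %d\n", xid_visible_in_snapshot(snap, 6)); /* 0: still uncommitted to this snapshot */

    free(snap->status_copy);
    free(snap);
    return 0;
}

The real clog packs two status bits per transaction and transaction ids wrap around, so the copy and the comparisons would be considerably more involved than this toy.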
From: Josh Berkus on 26 Feb 2010 16:06

> Well, as Heikki said, a stop-and-go WAL management approach could deal
> with that use-case. What I'm concerned about here is the complexity,
> reliability, maintainability of trying to interlock WAL application with
> slave queries in any sort of fine-grained fashion.

This sounds a bit brute-force, but what about simply having some form of automatic query retry on the slave?

--Josh Berkus
From: Greg Smith on 26 Feb 2010 18:33

Tom Lane wrote:
> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.

That is exactly the core idea I was trying to suggest in my rambling message. Just that small additional bit of information transmitted and published to the master via that route, and it's possible to optimize this problem in a way not available now. And it's a way that I believe will feel more natural to some users who may not be well served by any of the existing tuning possibilities.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com   www.2ndQuadrant.us
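A toy model of the single-number feedback loop described above: the walsender publishes the standby's reported xmin alongside the local backends' xmins, and the cleanup horizon falls out of the same minimum the master already computes. FakeProc, compute_cleanup_horizon, and the other names are invented, not PostgreSQL internals, and plain '<' is used where real transaction ids would need wraparound-aware comparisons.

#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId 0

typedef struct
{
    TransactionId xmin;          /* oldest xid this process still needs; 0 = none */
} FakeProc;

#define NUM_PROCS 4
static FakeProc proc_array[NUM_PROCS];

/* Master side: the walsender publishes the one number it received from the standby. */
static void
walsender_publish_standby_xmin(FakeProc *walsender_proc, TransactionId standby_xmin)
{
    walsender_proc->xmin = standby_xmin;
}

/* Cleanup side: the horizon is simply the minimum published xmin. */
static TransactionId
compute_cleanup_horizon(TransactionId latest_completed_xid)
{
    TransactionId horizon = latest_completed_xid + 1;

    for (int i = 0; i < NUM_PROCS; i++)
        if (proc_array[i].xmin != InvalidTransactionId &&
            proc_array[i].xmin < horizon)
            horizon = proc_array[i].xmin;
    return horizon;
}

int
main(void)
{
    proc_array[0].xmin = 1000;                  /* an ordinary local backend */
    proc_array[1].xmin = InvalidTransactionId;  /* walsender, no feedback yet */

    printf("horizon without feedback: %u\n", compute_cleanup_horizon(1100)); /* 1000 */

    /* The standby reports a single number: its oldest query still needs xid 950. */
    walsender_publish_standby_xmin(&proc_array[1], 950);

    printf("horizon with feedback:    %u\n", compute_cleanup_horizon(1100)); /* 950 */
    return 0;
}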
From: Greg Smith on 26 Feb 2010 18:56

Bruce Momjian wrote:
> Doesn't the system already adjust the delay based on the length of slave
> transactions, e.g. max_standby_delay. It seems there is no need for a
> user switch --- just max_standby_delay really high.

The first issue is that you're basically saying "I don't care about high availability anymore" when you increase max_standby_delay to a high value. Want to offload an 8 hour long batch report every day to the standby? You can do it with max_standby_delay=8 hours. But the day your master crashes 7 hours into that, you're in for a long wait before your standby is available while it replays all the queued-up segments. Your 'hot standby' has actually turned into the old form of 'cold standby' just when you need it to be responsive.

This is also the reason why the whole "pause recovery" idea is a fruitless path to wander down. The whole point of this feature is that people have a secondary server available for high availability, *first and foremost*, but they'd like it to do something more interesting than leave it idle all the time. The idea that you can hold off on applying standby updates for long enough to run seriously long reports is completely at odds with the idea of high availability.

The second major problem is that the day the report actually takes 8.1 hours instead, because somebody else ran another report that slowed you down a little, you're screwed if that's something you depend on being available--it just got canceled only *after* wasting 8 hours of reporting resource time.

max_standby_delay is IMHO only useful for allowing non-real-time web-app style uses of HS (think "Facebook status updates"), where you say "I'm OK giving people slightly out of date info sometimes if it lets me split the query load over two systems". Set max_standby_delay to a few seconds or maybe a minute, enough time to service a typical user query, make your app tolerate the occasional failed query and only send big ones to the master, and you've just scaled up all the small ones. Distributed queries with "eventual consistency" on all nodes is where many of the web app designs are going, and this feature is a reasonable match for that use case.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com   www.2ndQuadrant.us
From: Greg Smith on 26 Feb 2010 19:11
Bruce Momjian wrote:
>	5 Early cleanup of data still visible to the current query's
>	  snapshot
>
> #5 could be handled by using vacuum_defer_cleanup_age on the master.
> Why is vacuum_defer_cleanup_age not listed in postgresql.conf?

I noticed that myself and fired off a corrective patch to Simon yesterday; he said it was intentional, but I'm not sure why that is yet. We'll sort that out. You are correct that my suggestion is targeting primarily #5 on this list. There are two problems with the possible solutions using that parameter though:

-vacuum_defer_cleanup_age is set in a unit that people cannot be expected to work in--transaction ids. The UI is essentially useless, and there's no obvious way to make a better one. The best you can do will still be really fragile.

-If you increase vacuum_defer_cleanup_age, it's active all the time. You're basically making every single transaction that could be cleaned up pay for the fact that a query *might* be running on the standby that it needs to avoid.

You can think of the idea of passing an xmin back from the standby as an auto-tuning vacuum_defer_cleanup_age. It's 0 when no standby queries are running, but grows to match longer ones. And you don't have to know anything to set it correctly; just toggle on the proposed "feedback xid from the standby" feature and you're safe. Expecting that anyone will ever set vacuum_defer_cleanup_age correctly in the field in its current form is pretty unreasonable, I think. Since there's no timestamp-based memory of past xid activity, it's difficult to convert it to a time-based setting instead, and something in terms of time is what people would like to set this in.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com   www.2ndQuadrant.us
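A toy comparison of the two knobs contrasted above, with made-up function names and plain integer xids (no wraparound handling): vacuum_defer_cleanup_age pushes the cleanup horizon back by a fixed number of xids all the time, while a feedback xmin holds it back only while, and only as far as, a standby query actually needs.

#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId 0

/* Fixed setting: paid on every cleanup, whether or not the standby is running anything. */
static TransactionId
horizon_with_defer_age(TransactionId local_oldest_xmin, uint32_t defer_age)
{
    return (local_oldest_xmin > defer_age) ? local_oldest_xmin - defer_age : 1;
}

/* Feedback xmin: only holds cleanup back while a standby query actually needs old rows. */
static TransactionId
horizon_with_feedback(TransactionId local_oldest_xmin, TransactionId standby_xmin)
{
    if (standby_xmin != InvalidTransactionId && standby_xmin < local_oldest_xmin)
        return standby_xmin;
    return local_oldest_xmin;
}

int
main(void)
{
    /* defer_age keeps 100000 xids of old rows around even with an idle standby... */
    printf("defer_age, idle standby:       %u\n", horizon_with_defer_age(5000000, 100000));
    /* ...while feedback costs nothing when idle and exactly enough when a query runs. */
    printf("feedback,  idle standby:       %u\n", horizon_with_feedback(5000000, InvalidTransactionId));
    printf("feedback,  long standby query: %u\n", horizon_with_feedback(5000000, 4700000));
    return 0;
}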