From: Robert Haas on 24 May 2010 09:37

On Sun, May 23, 2010 at 9:44 PM, Jan Wieck <JanWieck(a)yahoo.com> wrote:
> I'm not sure the retention policies of the shared buffer cache, the
> WAL buffers, CLOG buffers and every other thing we try to cache are
> that easy to fold into one single set of logic. But I'm all ears.

I'm not sure either, although it seems like LRU ought to be good enough
for most things. I'm more worried about things like whether the
BufferDesc abstraction is going to get in the way.

>>> CommitTransaction() inside of xact.c will call a function that
>>> inserts a new record into this array. The operation will for most of
>>> the time be nothing more than taking a spinlock and adding the
>>> record to shared memory. All the data for the record is readily
>>> available, does not require further locking and can be collected
>>> locally before taking the spinlock.
>>
>> What happens when you need to switch pages?
>
> Then the code will have to grab another free buffer or evict one.

Hopefully not while holding a spinlock. :-)

>>> The function will return the "sequence" number, which
>>> CommitTransaction() in turn will record in the WAL commit record
>>> together with the begin_timestamp. While both the begin and the
>>> commit timestamp are crucial to determine what data a particular
>>> transaction should have seen, the row count is not and will not be
>>> recorded in WAL.
>>
>> It would certainly be better if we didn't have to bloat the commit
>> xlog records to do this. Is there any way to avoid that?
>
> If you can tell me how a crash-recovering system can figure out what
> the exact "sequence" number of the WAL commit record at hand should
> be, let's rip it.

Hmm... could we get away with WAL-logging the next sequence number just
once per checkpoint? When you replay the checkpoint record, you update
the control file with the sequence number. Then all the commits up
through the next checkpoint just use consecutive numbers starting at
that value.

> It is an option. "Keep it until I tell you" is a perfectly valid
> configuration option. One you probably don't want to forget about,
> but valid none the less.

As Tom is fond of saying, if it breaks, you get to keep both pieces.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
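(For illustration: a minimal sketch of the insert path Jan describes above,
assuming a hypothetical CommitOrderShared structure in shared memory. Only
slock_t and the SpinLock macros are actual PostgreSQL primitives from
storage/spin.h; every other name here is invented for the sketch. Note that
the page switch happens outside the spinlock, per the exchange above.)

```c
#include "postgres.h"
#include "storage/spin.h"

typedef struct CommitOrderEntry
{
    TransactionId xid;          /* committing transaction */
    TimestampTz begin_ts;       /* taken at transaction start */
    TimestampTz commit_ts;      /* taken just before this call */
} CommitOrderEntry;

typedef struct CommitOrderShared
{
    slock_t     mutex;          /* spinlock protecting the fields below */
    uint64      next_seqno;     /* next sequence number to hand out */
    int         nused;          /* entries used on the current page */
    CommitOrderEntry *page;     /* current shared-memory page */
} CommitOrderShared;

#define COMMIT_ORDER_ENTRIES_PER_PAGE 341   /* illustrative value */

/* Hypothetical: grabs a free buffer or evicts one; may block or do I/O. */
extern void CommitOrderSwitchPage(CommitOrderShared *cos);

/*
 * Called from CommitTransaction().  The caller collects all record data
 * beforehand, so the spinlock is held only for the array store itself.
 * Returns the sequence number to put into the WAL commit record.
 */
uint64
CommitOrderInsert(CommitOrderShared *cos, const CommitOrderEntry *rec)
{
    for (;;)
    {
        SpinLockAcquire(&cos->mutex);
        if (cos->nused < COMMIT_ORDER_ENTRIES_PER_PAGE)
        {
            uint64      seqno = cos->next_seqno++;

            cos->page[cos->nused++] = *rec;
            SpinLockRelease(&cos->mutex);
            return seqno;
        }
        SpinLockRelease(&cos->mutex);

        /*
         * Page full: grab or evict a buffer *outside* the spinlock,
         * per Robert's point above, then retry the insert.
         */
        CommitOrderSwitchPage(cos);
    }
}
```

Robert's checkpoint idea would then amount to WAL-logging next_seqno once
per checkpoint record, with recovery numbering subsequent commit records
consecutively from that value rather than storing a seqno in every commit
record.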
From: "Kevin Grittner" on 24 May 2010 11:24 Jan Wieck wrote: > In some systems (data warehousing, replication), the order of > commits is important, since that is the order in which changes > have become visible. This issue intersects with the serializable work I've been doing. While in database transactions using S2PL the above is true, in snapshot isolation and the SSI implementation of serializable transactions, it's not. In particular, the snapshot anomalies which can cause non-serializable behavior happen precisely because the apparent order of execution doesn't match anything so linear as order of commit. I'll raise that receipting example again. You have transactions which grab the current deposit data and insert it into receipts, as payments are received. At some point in the afternoon, the deposit date in a control table is changed to the next day, so that the receipts up to that point can be deposited during banking hours with the current date as their deposit date. A report is printed (and likely a transfer transaction recorded to move "cash in drawer" to "cash in checking", but I'll ignore that aspect for this example). Some receipts may not be committed when the update to the date in the control table is committed. This is "eventually consistent" -- once all the receipts with the old date commit or roll back the database is OK, but until then you might be able to select the new date in the control table and the set of receipts matching the old date without the database telling you that you're missing data. The new serializable implementation fixes this, but there are open R&D items (due to the need to discuss the issues) on the related Wiki page related to hot standby and other replication. Will we be able to support transactional integrity on slave machines? What if the update to the control table and the insert of receipts all happen on the master, but someone decides to move the (now happily working correctly with serializable transactions) reporting to a slave machine? (And by the way, don't get too hung up on this particular example, I could generate dozens more on demand -- the point is that order of commit doesn't always correspond to apparent order of execution; in this case the receipts *appear* to have executed first, because they are using a value "later" updated to something else by a different transaction, even though that other transaction *committed* first.) Replicating or recreating the whole predicate locking and conflict detection on slaves is not feasible for performance reasons. (I won't elaborate unless someone feels that's not intuitively obvious.) The only sane way I can see to have a slave database allow serializable behavior is to WAL-log the acquisition of a snapshot by a serializable transaction, and the rollback or commit, on the master, and to have the serializable snapshot build on a slave exclude any serializable transactions for which there are still concurrent serializable transactions. Yes, that does mean WAL- logging the snapshot acquisition even if the transaction doesn't yet have an xid, and WAL-logging the commit or rollback even if it never acquires an xid. I think this solve the issue Jan raises as long as serializable transactions are used; if they aren't there are no guarantees of transactional integrity no matter how you track commit sequence, unless it can be based on S2PL-type blocking locks. I'll have to leave that to someone else to sort out. 
-Kevin
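(For illustration: a rough sketch of the slave-side test Kevin's proposal
implies, assuming the snapshot-acquisition and commit/rollback WAL records
both carry positions in a single global event sequence so they are directly
comparable. All names and structures here are invented; this is one possible
reading of the proposal, not actual code.)

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * One entry per serializable transaction replayed from the master's WAL.
 * snapshot_seq and finish_seq are positions in one global event sequence
 * that advances on every snapshot acquisition and every commit/rollback.
 */
typedef struct SerializableTxnInfo
{
    uint64_t    virtual_id;     /* identifies even xid-less transactions */
    uint64_t    snapshot_seq;   /* when its snapshot was acquired */
    uint64_t    finish_seq;     /* when it committed/rolled back; 0 = open */
} SerializableTxnInfo;

/*
 * May 'txn' be treated as visible by a new serializable snapshot built on
 * the slave?  Per the proposal: only once every serializable transaction
 * that was concurrent with it has also finished.
 */
static bool
SafeToIncludeOnSlave(const SerializableTxnInfo *txn,
                     const SerializableTxnInfo *all, int ntxns)
{
    int         i;

    if (txn->finish_seq == 0)
        return false;           /* 'txn' itself hasn't committed yet */

    for (i = 0; i < ntxns; i++)
    {
        const SerializableTxnInfo *other = &all[i];

        /*
         * 'other' was concurrent with 'txn' if it acquired its snapshot
         * before 'txn' finished.  If such a transaction is still open,
         * 'txn' cannot safely be included yet.
         */
        if (other->finish_seq == 0 && other->snapshot_seq < txn->finish_seq)
            return false;
    }
    return true;
}
```

This also makes the xid-less logging requirement concrete: a read-only
serializable transaction never acquires an xid, yet it must still appear in
this table for the concurrency test to come out right.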
From: Robert Haas on 24 May 2010 11:43

On Mon, May 24, 2010 at 11:24 AM, Kevin Grittner
<Kevin.Grittner(a)wicourts.gov> wrote:
> Jan Wieck wrote:
>
>> In some systems (data warehousing, replication), the order of
>> commits is important, since that is the order in which changes
>> have become visible.
>
> This issue intersects with the serializable work I've been doing.
> While in database transactions using S2PL the above is true, in
> snapshot isolation and the SSI implementation of serializable
> transactions, it's not.

I think you're confusing two subtly different things. The way to prove
that a set of transactions running under some implementation of
serializability is actually serializable is to construct a serial
order of execution consistent with the view of the database that each
transaction saw. This may or may not match the commit order, as you
say. But the commit order is still the order in which the effects of
those transactions became visible: if we inserted a new read-only
transaction into the stream at some arbitrary point in time, it would
see all the transactions which committed before it and none of those
that committed afterward. So I think Jan's statement is correct.

Having said that, I think your concerns about how things will look
from a slave's point of view are possibly valid. A transaction running
on a slave is essentially a read-only transaction that the master
doesn't know about. It's not clear to me whether adding such a
transaction to the timeline could result in either (a) that
transaction being rolled back or (b) some impact on which other
transactions got rolled back. If it did, that would obviously be a
problem for serializability on slaves, though your proposed fix sounds
like it would be prohibitively expensive for many users. But can this
actually happen?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
From: "Kevin Grittner" on 24 May 2010 12:51 Robert Haas wrote: > I think you're confusing two subtly different things. The only thing I'm confused about is what benefit anyone expects to get from looking at data between commits in some way other than our current snapshot mechanism. Can someone explain a use case where what Jan is proposing is better than snapshot isolation? It doesn't provide any additional integrity guarantees that I can see. > But the commit order is still the order the effects of those > transactions have become visible - if we inserted a new read-only > transaction into the stream at some arbitrary point in time, it > would see all the transactions which committed before it and none > of those that committed afterward. Isn't that what a snapshot does already? > your proposed fix sounds like it would be prohibitively expensive > for many users. But can this actually happen? How so? The transaction start/end logging, or looking at that data when building a snapshot? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Heikki Linnakangas on 24 May 2010 13:11
On 24/05/10 19:51, Kevin Grittner wrote:
> The only thing I'm confused about is what benefit anyone expects to
> get from looking at data between commits in some way other than our
> current snapshot mechanism. Can someone explain a use case where
> what Jan is proposing is better than snapshot isolation? It doesn't
> provide any additional integrity guarantees that I can see.

Right, it doesn't. What it provides is a way to reconstruct a snapshot
at any point in time, after the fact. For example, after transactions
A, C, D and B have committed in that order, it allows you to
reconstruct a snapshot just like the one you would've gotten
immediately after the commit of A, C, D and B respectively. That's
useful for replication tools like Slony that need to commit the
changes of those transactions on the slave in the same order as they
were committed on the master.

I don't know enough of Slony et al. to understand why that'd be better
than the current heartbeat mechanism they use, taking a snapshot every
few seconds and batching commits.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
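(For illustration: a sketch of what "reconstructing a snapshot after the
fact" could mean in terms of the proposed commit-order data. Types and
names are invented; this is not actual PostgreSQL or Slony code.)

```c
#include <stdbool.h>
#include <stdint.h>

/* One record per commit, in the order handed out by the master. */
typedef struct CommitLogEntry
{
    uint32_t    xid;            /* transaction that committed */
    uint64_t    seqno;          /* position in the global commit order */
} CommitLogEntry;

/*
 * Would 'xid' have been visible to a snapshot taken immediately after
 * commit number 'as_of'?  With a total commit order this reduces to a
 * comparison of sequence numbers.
 */
static bool
VisibleAsOf(const CommitLogEntry *log, int nentries,
            uint32_t xid, uint64_t as_of)
{
    int         i;

    for (i = 0; i < nentries; i++)
    {
        if (log[i].xid == xid)
            return log[i].seqno <= as_of;
    }
    return false;               /* never committed: not visible */
}
```

A replication tool could then replay change sets ordered by seqno and pass
through every intermediate committed state of the master exactly, rather
than batching together whatever happened between two heartbeat snapshots.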