From: Robert Haas
On Sun, May 23, 2010 at 9:44 PM, Jan Wieck <JanWieck(a)yahoo.com> wrote:
> I'm not sure the retention policies of the shared buffer cache, the WAL
> buffers, CLOG buffers and every other thing we try to cache are that easy to
> fold into one single set of logic. But I'm all ears.

I'm not sure either, although it seems like LRU ought to be good
enough for most things. I'm more worried about things like whether
the BufferDesc abstraction is going to get in the way.

>>> CommitTransaction() inside of xact.c will call a function that
>>> inserts a new record into this array. The operation will most of the
>>> time be nothing more than taking a spinlock and adding the record to
>>> shared memory. All the data for the record is readily available, does
>>> not require further locking and can be collected locally before
>>> taking the spinlock.
>>
>> What happens when you need to switch pages?
>
> Then the code will have to grab another free buffer or evict one.

Hopefully not while holding a spin lock. :-)
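To put it concretely, I'd expect the fast path to look roughly like
the sketch below. Every name in it is invented for illustration; the
point is just that any buffer switch happens before the spinlock is
taken (and a real version would have to recheck for a full page after
acquiring it).

/* Purely hypothetical sketch -- none of these names exist today. */
typedef struct CommitInfoEntry
{
    TransactionId xid;
    TimestampTz   begin_ts;
    TimestampTz   commit_ts;
    uint64        seqno;
} CommitInfoEntry;

typedef struct CommitInfoCtlData
{
    slock_t     lock;
    uint64      next_seqno;
    /* ... page bookkeeping ... */
} CommitInfoCtlData;

static CommitInfoCtlData *CommitInfoCtl;    /* lives in shared memory */

uint64
CommitInfoAdd(TransactionId xid, TimestampTz begin_ts, TimestampTz commit_ts)
{
    CommitInfoEntry entry;

    /* Collect everything locally, before touching shared memory. */
    entry.xid = xid;
    entry.begin_ts = begin_ts;
    entry.commit_ts = commit_ts;

    /*
     * If the current page looks full, grab a free buffer or evict one
     * now, so that the spinlock only ever covers the trivial append.
     */
    if (CommitInfoPageFull())
        CommitInfoAdvancePage();

    SpinLockAcquire(&CommitInfoCtl->lock);
    entry.seqno = CommitInfoCtl->next_seqno++;
    CommitInfoAppendToPage(&entry);
    SpinLockRelease(&CommitInfoCtl->lock);

    return entry.seqno;
}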

>>> The function will return the "sequence" number, which
>>> CommitTransaction() in turn will record in the WAL commit record
>>> together with the begin_timestamp. While both the begin and the
>>> commit timestamps are crucial to determine what data a particular
>>> transaction should have seen, the row count is not and will not be
>>> recorded in WAL.
>>
>> It would certainly be better if we didn't have to bloat the commit
>> xlog records to do this.  Is there any way to avoid that?
>
> If you can tell me how a crash-recovering system can figure out what
> the exact "sequence" number of the WAL commit record at hand should
> be, let's rip it out.

Hmm... could we get away with WAL-logging the next sequence number
just once per checkpoint? When you replay the checkpoint record, you
update the control file with the sequence number. Then all the
commits up through the next checkpoint just use consecutive numbers
starting at that value.
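In replay terms, I'm picturing something like the sketch below; only
CheckPoint and xl_xact_commit are existing structs, and the seqno
field and all of the helper functions are made up.

/*
 * Hypothetical sketch of the once-per-checkpoint scheme.  The
 * nextCommitSeqno field and every function here are invented.
 */
static uint64 next_commit_seqno;    /* counter recovered during replay */

void
ReplayCheckpointRecord(CheckPoint *ckpt)
{
    /* Resynchronize from the value logged with the checkpoint. */
    next_commit_seqno = ckpt->nextCommitSeqno;          /* invented field */
    UpdateControlFileCommitSeqno(next_commit_seqno);    /* invented */
}

void
ReplayCommitRecord(xl_xact_commit *rec, TransactionId xid)
{
    /*
     * Individual commit records don't carry the number at all; they
     * just consume consecutive values in replay order, which by
     * construction matches the original commit order.
     */
    AssignCommitSeqno(xid, next_commit_seqno++);        /* invented */
}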

> It is an option. "Keep it until I tell you" is a perfectly valid
> configuration option. One you probably don't want to forget about, but valid
> none the less.

As Tom is fond of saying, if it breaks, you get to keep both pieces.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: "Kevin Grittner" on
Jan Wieck wrote:

> In some systems (data warehousing, replication), the order of
> commits is important, since that is the order in which changes
> have become visible.

This issue intersects with the serializable work I've been doing.
While the above is true for databases using S2PL, it's not true under
snapshot isolation or the SSI implementation of serializable
transactions. In particular, the snapshot anomalies which can cause
non-serializable behavior happen precisely because the apparent order
of execution doesn't match anything so linear as commit order.

I'll raise that receipting example again. You have transactions which
grab the current deposit date and insert it into receipts as payments
are received. At some point in the afternoon, the deposit date in a
control table is changed to the next day, so that the receipts up to
that point can be deposited during banking hours with the current
date as their deposit date. A report is printed (and likely a
transfer transaction is recorded to move "cash in drawer" to "cash in
checking", but I'll ignore that aspect for this example). Some
receipts may not yet be committed when the update to the date in the
control table is committed.

This is "eventually consistent" -- once all the receipts with the
old date commit or roll back the database is OK, but until then you
might be able to select the new date in the control table and the
set of receipts matching the old date without the database telling
you that you're missing data. The new serializable implementation
fixes this, but there are open R&D items (due to the need to discuss
the issues) on the related Wiki page related to hot standby and
other replication. Will we be able to support transactional
integrity on slave machines?

What if the update to the control table and the insert of receipts
all happen on the master, but someone decides to move the (now
happily working correctly with serializable transactions) reporting
to a slave machine? (And by the way, don't get too hung up on this
particular example; I could generate dozens more on demand. The point
is that order of commit doesn't always correspond to apparent order
of execution. In this case the receipts *appear* to have executed
first, because they use a value "later" updated to something else by
a different transaction, even though that other transaction
*committed* first.)

Replicating or recreating the whole predicate locking and conflict
detection on slaves is not feasible for performance reasons. (I
won't elaborate unless someone feels that's not intuitively obvious.)
The only sane way I can see to give a slave database serializable
behavior is to WAL-log, on the master, the acquisition of a snapshot
by a serializable transaction and its commit or rollback, and to have
the serializable snapshot built on a slave exclude any serializable
transaction for which there are still concurrent serializable
transactions. Yes, that does mean WAL-logging the snapshot
acquisition even if the transaction doesn't yet have an xid, and
WAL-logging the commit or rollback even if it never acquires one.
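
In rough code, the slave-side visibility rule I have in mind would be
something like this (the type and field names are all invented; this
is a sketch of the rule, not working code):

/*
 * Hand-wavy sketch only.  A committed serializable transaction
 * becomes visible in slave snapshots only once every serializable
 * transaction that ran concurrently with it has completed.
 */
bool
XactVisibleInSlaveSnapshot(CommittedSxact *sxact)
{
    ListCell   *lc;

    if (!sxact->committed)
        return false;

    foreach(lc, sxact->concurrent_serializable_xacts)
    {
        CommittedSxact *other = (CommittedSxact *) lfirst(lc);

        /*
         * Still in progress: its apparent order of execution relative
         * to sxact is not yet fixed, so sxact must stay invisible.
         */
        if (!other->finished)
            return false;
    }

    return true;
}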

I think this solves the issue Jan raises, as long as serializable
transactions are used; if they aren't, there are no guarantees of
transactional integrity no matter how you track commit sequence,
unless it can be based on S2PL-type blocking locks. I'll have to
leave that to someone else to sort out.

-Kevin



From: Robert Haas
On Mon, May 24, 2010 at 11:24 AM, Kevin Grittner
<Kevin.Grittner(a)wicourts.gov> wrote:
> Jan Wieck wrote:
>
>> In some systems (data warehousing, replication), the order of
>> commits is important, since that is the order in which changes
>> have become visible.
>
> This issue intersects with the serializable work I've been doing.
> While the above is true for databases using S2PL, it's not true
> under snapshot isolation or the SSI implementation of serializable
> transactions.

I think you're confusing two subtly different things. The way to
prove that a set of transactions running under some implementation of
serializability is actually serializable is to construct a serial
order of execution consistent with the view of the database that each
transaction saw. That may or may not match the commit order, as you
say. But the commit order is still the order in which the effects of
those transactions became visible: if we inserted a new read-only
transaction into the stream at some arbitrary point in time, it would
see all the transactions which committed before it and none of those
that committed afterward. So I think Jan's statement is correct.

Having said that, I think your concerns about how things will look
from a slave's point of view are possibly valid. A transaction
running on a slave is essentially a read-only transaction that the
master doesn't know about. It's not clear to me whether adding such a
transaction to the timeline could result in either (a) that
transaction being rolled back or (b) some impact on which other
transactions got rolled back. If it did, that would obviously be a
problem for serializability on slaves, though your proposed fix sounds
like it would be prohibitively expensive for many users. But can this
actually happen?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: "Kevin Grittner" on
Robert Haas wrote:

> I think you're confusing two subtly different things.

The only thing I'm confused about is what benefit anyone expects to
get from looking at data between commits in some way other than our
current snapshot mechanism. Can someone explain a use case where
what Jan is proposing is better than snapshot isolation? It doesn't
provide any additional integrity guarantees that I can see.

> But the commit order is still the order the effects of those
> transactions have become visible - if we inserted a new read-only
> transaction into the stream at some arbitrary point in time, it
> would see all the transactions which committed before it and none
> of those that committed afterward.

Isn't that what a snapshot does already?

> your proposed fix sounds like it would be prohibitively expensive
> for many users. But can this actually happen?

How so? The transaction start/end logging, or looking at that data
when building a snapshot?

-Kevin


From: Heikki Linnakangas
On 24/05/10 19:51, Kevin Grittner wrote:
> The only thing I'm confused about is what benefit anyone expects to
> get from looking at data between commits in some way other than our
> current snapshot mechanism. Can someone explain a use case where
> what Jan is proposing is better than snapshot isolation? It doesn't
> provide any additional integrity guarantees that I can see.

Right, it doesn't. What it provides is a way to reconstruct a
snapshot at any point in time, after the fact. For example, after
transactions A, C, D and B have committed in that order, it allows
you to reconstruct a snapshot just like the one you would've gotten
immediately after the commit of A, C, D and B respectively. That's
useful for replication tools like Slony that need to commit the
changes of those transactions on the slave in the same order as they
were committed on the master.
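
A consumer could then do something like this (a hypothetical API,
just to illustrate the idea; none of these names exist):

/*
 * Applying each transaction's changes in logged commit order
 * reproduces, after each step, exactly the state a reader on the
 * master would have seen immediately after that commit.
 */
typedef struct CommitOrderEntry
{
    uint64        seqno;
    TransactionId xid;
    TimestampTz   begin_ts;
    TimestampTz   commit_ts;
} CommitOrderEntry;

void
ApplyInCommitOrder(void)
{
    for (;;)
    {
        /* next entry in commit order */
        CommitOrderEntry entry = ReadNextCommitOrderEntry();

        ApplyChangesOfTransaction(entry.xid);
    }
}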

I don't know enough about Slony et al. to understand why that'd be
better than the heartbeat mechanism they currently use, taking a
snapshot every few seconds and batching commits.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
