Exposing the Xact commit order to the user [PgSql]

From: "Greg Sabino Mullane" on 23 May 2010 16:48

> Exposing the data will be done via a set returning function. The SRF
> takes two arguments. The maximum number of rows to return and the last
> serial number processed by the reader. The advantage of such SRF is that
> the result can be used in a query that right away delivers audit or
> replication log information in transaction commit order. The SRF can
> return an empty set if no further transactions have committed since, or
> an error if data segments needed to answer the request have already been
> purged.

In light of the proposed purging scheme, how would it be able to distinguish
between those two cases (nothing there yet vs. was there but purged)?

--
Greg Sabino Mullane greg(a)turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005231646
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 23 May 2010 20:38

On Sun, May 23, 2010 at 4:21 PM, Jan Wieck <JanWieck(a)yahoo.com> wrote:
> The system will have postgresql.conf options for enabling/disabling the
> whole shebang, how many shared buffers to allocate for managing access
> to the data and to define the retention period of the data based on data
> volume and/or age of the commit records.

It would be nice if this could just be managed out of shared_buffers
rather than needing to configure a separate pool just for this
feature. But, I'm not sure how much work that is, and if it turns out
to be too ugly then I'd say it's not a hard requirement. In general,
I think we talked during the meeting about the desirability of folding
specific pools into shared_buffers rather than managing them
separately, but I'm not aware that we have any cases where we do that
today so it might be hard (or not).

> Each record of the Transaction Commit Info consists of
>
> txid xci_transaction_id
> timestamptz xci_begin_timestamp
> timestamptz xci_commit_timestamp
> int64 xci_total_rowcount
>
> 32 bytes total.

Are we sure it's worth including the row count? I wonder if we ought
to leave that out and let individual clients of the mechanism track
that if they're so inclined, especially since it won't be reliable
anyway.
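
For reference, the 32-byte figure in the quoted layout follows from four 8-byte fields. The sketch below is only an illustration of that arithmetic: it assumes each field is stored as a 64-bit integer with no padding (timestamps as microsecond counts), and the names XCI_RECORD, pack_record and unpack_record are made up for this example, not taken from the proposal.

```python
import struct

# Assumed layout: four signed 64-bit fields, little-endian, no padding.
#   xci_transaction_id, xci_begin_timestamp, xci_commit_timestamp,
#   xci_total_rowcount
XCI_RECORD = struct.Struct("<qqqq")

# The same layout without the row count, as suggested in the reply.
XCI_RECORD_NO_ROWCOUNT = struct.Struct("<qqq")


def pack_record(xid, begin_us, commit_us, rowcount):
    """Serialize one commit-info record into its fixed-size form."""
    return XCI_RECORD.pack(xid, begin_us, commit_us, rowcount)


def unpack_record(buf):
    """Return (xid, begin_us, commit_us, rowcount) from a packed record."""
    return XCI_RECORD.unpack(buf)


record = pack_record(1234, 1_000_000, 1_000_250, 17)
print(XCI_RECORD.size)              # 32
print(XCI_RECORD_NO_ROWCOUNT.size)  # 24
```

Under these assumptions, dropping the row count would shrink each record from 32 to 24 bytes.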

> CommitTransaction() inside of xact.c will call a function that inserts
> a new record into this array. Most of the time, the operation will be
> nothing more than taking a spinlock and adding the record to shared memory.
> All the data for the record is readily available, does not require
> further locking and can be collected locally before taking the spinlock.

What happens when you need to switch pages?

> The function will return the "sequence" number which CommitTransaction()
> in turn will record in the WAL commit record together with the
> begin_timestamp. While both the begin and the commit timestamps
> are crucial to determine what data a particular transaction should have
> seen, the row count is not and will not be recorded in WAL.

It would certainly be better if we didn't have to bloat the commit xlog
records to do this. Is there any way to avoid that?

> Checkpoint handling will call a function to flush the shared buffers.
> Together with this, the information from WAL records will be sufficient
> to recover this data (except for row counts) during crash recovery.

Right.

> Exposing the data will be done via a set returning function. The SRF
> takes two arguments. The maximum number of rows to return and the last
> serial number processed by the reader. The advantage of such SRF is that
> the result can be used in a query that right away delivers audit or
> replication log information in transaction commit order. The SRF can
> return an empty set if no further transactions have committed since, or
> an error if data segments needed to answer the request have already been
> purged.
>
> Purging of the data will be possible in several different ways.
> Autovacuum will call a function that drops segments of the data that are
> outside the postgresql.conf configuration with respect to maximum age
> or data volume. There will also be a function reserved for superusers to
> explicitly purge the data up to a certain serial number.

Dunno if autovacuuming this is the right way to go. Seems like that
could lead to replication breaks, and it's also more work than not
doing that. I'd just say that if you turn this on you're responsible
for pruning it, full stop.

> Anyone who trades liberty for security deserves neither
> liberty nor security. -- Benjamin Franklin

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Jan Wieck on 23 May 2010 21:18

On 5/23/2010 4:48 PM, Greg Sabino Mullane wrote:
>> Exposing the data will be done via a set returning function. The SRF
>> takes two arguments. The maximum number of rows to return and the last
>> serial number processed by the reader. The advantage of such SRF is that
>> the result can be used in a query that right away delivers audit or
>> replication log information in transaction commit order. The SRF can
>> return an empty set if no further transactions have committed since, or
>> an error if data segments needed to answer the request have already been
>> purged.
>
> In light of the proposed purging scheme, how would it be able to distinguish
> between those two cases (nothing there yet vs. was there but purged)?

There is a difference between an empty result set and an exception.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From: Jan Wieck on 23 May 2010 21:44

On 5/23/2010 8:38 PM, Robert Haas wrote:
> On Sun, May 23, 2010 at 4:21 PM, Jan Wieck <JanWieck(a)yahoo.com> wrote:
>> The system will have postgresql.conf options for enabling/disabling the
>> whole shebang, how many shared buffers to allocate for managing access
>> to the data and to define the retention period of the data based on data
>> volume and/or age of the commit records.
>
> It would be nice if this could just be managed out of shared_buffers
> rather than needing to configure a separate pool just for this
> feature. But, I'm not sure how much work that is, and if it turns out
> to be too ugly then I'd say it's not a hard requirement. In general,
> I think we talked during the meeting about the desirability of folding
> specific pools into shared_buffers rather than managing them
> separately, but I'm not aware that we have any cases where we do that
> today so it might be hard (or not).

I'm not sure the retention policies of the shared buffer cache, the WAL
buffers, CLOG buffers and every other thing we try to cache are that
easy to fold into one single set of logic. But I'm all ears.

>
>> Each record of the Transaction Commit Info consists of
>>
>> txid xci_transaction_id
>> timestamptz xci_begin_timestamp
>> timestamptz xci_commit_timestamp
>> int64 xci_total_rowcount
>>
>> 32 bytes total.
>
> Are we sure it's worth including the row count? I wonder if we ought
> to leave that out and let individual clients of the mechanism track
> that if they're so inclined, especially since it won't be reliable
> anyway.

Nope, we (my belly and I) are not sure about the absolute worth of the
row count. It would be a convenient number to have there, but I can live
without it.

>
>> CommitTransaction() inside of xact.c will call a function that inserts
>> a new record into this array. Most of the time, the operation will be
>> nothing more than taking a spinlock and adding the record to shared memory.
>> All the data for the record is readily available, does not require
>> further locking and can be collected locally before taking the spinlock.
>
> What happens when you need to switch pages?

Then the code will have to grab another free buffer or evict one.
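
A rough sketch of that append path, under loudly stated assumptions: a Python lock stands in for the spinlock, pages hold a tiny fixed number of records, and a full buffer pool evicts its oldest page into a dict that stands in for disk. CommitLog, RECORDS_PER_PAGE and max_pages are hypothetical names chosen for illustration, not anything from the proposal.

```python
import threading

RECORDS_PER_PAGE = 4  # deliberately tiny, hypothetical value


class CommitLog:
    def __init__(self, max_pages):
        self._lock = threading.Lock()  # stands in for the spinlock
        self.pages = {}                # page number -> records held in memory
        self.evicted = {}              # stands in for pages written out to disk
        self.max_pages = max_pages
        self.next_serial = 0

    def append(self, record):
        """Add one commit record and return its sequence number."""
        with self._lock:
            serial = self.next_serial
            page_no = serial // RECORDS_PER_PAGE
            if page_no not in self.pages:
                # Page switch: take a free buffer, or evict the oldest page.
                if len(self.pages) >= self.max_pages:
                    oldest = min(self.pages)
                    self.evicted[oldest] = self.pages.pop(oldest)
                self.pages[page_no] = []
            self.pages[page_no].append(record)
            self.next_serial += 1
            return serial


log = CommitLog(max_pages=2)
serials = [log.append(("xid", i)) for i in range(10)]
print(serials)             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(log.pages))   # [1, 2]
print(sorted(log.evicted)) # [0]
```

In the common case only the last three lines inside the lock run; the eviction branch is the page-switch cost being asked about.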

>
>> The function will return the "sequence" number which CommitTransaction()
>> in turn will record in the WAL commit record together with the
>> begin_timestamp. While both the begin and the commit timestamps
>> are crucial to determine what data a particular transaction should have
>> seen, the row count is not and will not be recorded in WAL.
>
> It would certainly be better if we didn't have to bloat the commit xlog
> records to do this. Is there any way to avoid that?

If you can tell me how a crash recovering system can figure out what the
exact "sequence" number of the WAL commit record at hand should be,
let's rip it.
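
To make the point concrete, here is a minimal sketch of recovery under the proposal as described: the state flushed at the last checkpoint is taken as given, and each later WAL commit record supplies its own sequence number and begin timestamp. The record shape (dict keys serial, xid, begin_ts, commit_ts) and the function name are assumptions for illustration only. Row counts are not in WAL, so replayed entries carry None.

```python
def replay_commit_records(checkpointed, wal_records):
    """Rebuild serial -> (xid, begin_ts, commit_ts, rowcount) after a crash.

    checkpointed: records that were flushed at the last checkpoint.
    wal_records:  commit records found in WAL after that checkpoint,
                  each assumed to carry serial, xid, begin_ts, commit_ts.
    """
    log = dict(checkpointed)
    for rec in wal_records:
        # The serial comes from the WAL record itself; nothing is guessed.
        log[rec["serial"]] = (rec["xid"], rec["begin_ts"], rec["commit_ts"], None)
    return log


flushed = {1: (100, 10, 20, 5), 2: (101, 12, 25, 3)}
wal = [
    {"serial": 3, "xid": 103, "begin_ts": 15, "commit_ts": 30},
    {"serial": 4, "xid": 102, "begin_ts": 14, "commit_ts": 31},
]
recovered = replay_commit_records(flushed, wal)
print(recovered[4])  # (102, 14, 31, None)
```

Without the serial in the WAL record, the loop would have nothing to use as the key.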

>
>> Checkpoint handling will call a function to flush the shared buffers.
>> Together with this, the information from WAL records will be sufficient
>> to recover this data (except for row counts) during crash recovery.
>
> Right.
>
>> Exposing the data will be done via a set returning function. The SRF
>> takes two arguments. The maximum number of rows to return and the last
>> serial number processed by the reader. The advantage of such SRF is that
>> the result can be used in a query that right away delivers audit or
>> replication log information in transaction commit order. The SRF can
>> return an empty set if no further transactions have committed since, or
>> an error if data segments needed to answer the request have already been
>> purged.
>>
>> Purging of the data will be possible in several different ways.
>> Autovacuum will call a function that drops segments of the data that are
>> outside the postgresql.conf configuration with respect to maximum age
>> or data volume. There will also be a function reserved for superusers to
>> explicitly purge the data up to a certain serial number.
>
> Dunno if autovacuuming this is the right way to go. Seems like that
> could lead to replication breaks, and it's also more work than not
> doing that. I'd just say that if you turn this on you're responsible
> for pruning it, full stop.

It is an option. "Keep it until I tell you" is a perfectly valid
configuration option. One you probably don't want to forget about, but
valid nonetheless.
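
A small sketch of how such a retention setting could behave, with "keep it until I tell you" expressed as both limits left unset. The segment tuples, the parameter names max_age and max_bytes, and the function name are all hypothetical; the actual postgresql.conf options are not named in this thread.

```python
def segments_to_drop(segments, max_age=None, max_bytes=None):
    """Return ids of segments that fall outside the configured limits.

    segments: list of (segment_id, age_seconds, size_bytes), oldest first.
    max_age / max_bytes: None means "no limit of this kind".
    """
    drop = []
    kept_bytes = sum(size for _, _, size in segments)
    for seg_id, age, size in segments:
        too_old = max_age is not None and age > max_age
        too_big = max_bytes is not None and kept_bytes > max_bytes
        if not (too_old or too_big):
            break  # segments are oldest first, so the rest are within limits
        drop.append(seg_id)
        kept_bytes -= size
    return drop


segs = [(1, 500, 100), (2, 300, 100), (3, 100, 100)]
print(segments_to_drop(segs))                 # []
print(segments_to_drop(segs, max_age=200))    # [1, 2]
print(segments_to_drop(segs, max_bytes=250))  # [1]
```

With neither limit set, nothing is ever dropped automatically, which leaves pruning entirely to an explicit purge call.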

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From: "Greg Sabino Mullane" on 24 May 2010 09:30

>> In light of the proposed purging scheme, how would it be able to distinguish
>> between those two cases (nothing there yet vs. was there but purged)?
>
> There is a difference between an empty result set and an exception.

No, I meant how will the *function* know, if a superuser and/or some
background process can purge records at any time?
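
The thread does not answer this at this point. Purely as an illustration of one way a reader function could tell the two cases apart, the sketch below assumes the log remembers the highest serial number it has ever purged; a request that starts below that mark raises, anything else returns whatever is available, possibly nothing. CommitOrderLog, PurgedError and all method names are invented for this example.

```python
class PurgedError(Exception):
    """Raised when the requested serial numbers are no longer retained."""


class CommitOrderLog:
    def __init__(self):
        self._records = {}        # serial number -> record payload
        self._next_serial = 1     # serial the next commit will receive
        self._purged_through = 0  # highest serial number ever purged

    def append(self, payload):
        serial = self._next_serial
        self._records[serial] = payload
        self._next_serial += 1
        return serial

    def purge_through(self, serial):
        """Drop every record up to and including `serial`."""
        bound = min(serial, self._next_serial - 1)
        for s in range(self._purged_through + 1, bound + 1):
            del self._records[s]
        self._purged_through = max(self._purged_through, bound)

    def read_after(self, last_serial, max_rows):
        """Return up to max_rows (serial, payload) pairs after last_serial."""
        if last_serial < self._purged_through:
            raise PurgedError(
                f"records after {last_serial} were purged "
                f"through {self._purged_through}"
            )
        first = last_serial + 1
        last = min(first + max_rows, self._next_serial)
        return [(s, self._records[s]) for s in range(first, last)]
```

A reader that has already processed serial 5 of 5 gets an empty list; a reader still at serial 1 after a purge through 3 gets PurgedError instead.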

--
Greg Sabino Mullane greg(a)turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005240928
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
