Subject: [HACKERS] Exposing the Xact commit order to the user
From: Chris Browne on 2 Jun 2010 14:44

heikki.linnakangas(a)enterprisedb.com (Heikki Linnakangas) writes:
> On 24/05/10 19:51, Kevin Grittner wrote:
>> The only thing I'm confused about is what benefit anyone expects to
>> get from looking at data between commits in some way other than our
>> current snapshot mechanism. Can someone explain a use case where
>> what Jan is proposing is better than snapshot isolation? It doesn't
>> provide any additional integrity guarantees that I can see.
>
> Right, it doesn't. What it provides is a way to reconstruct a
> snapshot at any point in time, after the fact. For example, after
> transactions A, C, D and B have committed in that order, it allows
> you to reconstruct a snapshot just like you would've gotten
> immediately after the commit of A, C, D and B respectively. That's
> useful for replication tools like Slony that need to commit the
> changes of those transactions on the slave in the same order as they
> were committed on the master.
>
> I don't know enough of Slony et al. to understand why that'd be
> better than the current heartbeat mechanism they use, taking a
> snapshot every few seconds and batching commits.

I see two advantages:

a) Identifying things on a transaction-by-transaction basis means that
   the snapshots ("SYNCs") don't need to be captured, which is
   presently an area of fragility. If the slon daemon falls over on
   Friday evening, and nobody notices until Monday, the "snapshot"
   reverts to being all updates between Friday and whenever SYNCs
   start to be collected again.

   Exposing commit orders eliminates that fragility: SYNCs don't need
   to be captured anymore, so they can't be missed (which is today's
   problem).

b) The sequence currently used to control log application ordering is
   a bottleneck, as it is a single sequence shared across all
   connections. It could be eliminated in favor of (perhaps) an
   in-memory variable defined on a per-connection basis.

   It's not a bottleneck that we hear many complaints about, but it is
   a bottleneck nonetheless.

--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS apparently now has a team dedicated to tracking problems with
Linux and publicizing them. I guess eventually they'll figure out this
backfires... ;)" -- William Burrow <aa126(a)DELETE.fan.nb.ca>
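[Editor's note: Heikki's point above — that a commit-order log lets a consumer reproduce every state the master exposed, without capturing any snapshots along the way — can be sketched as a toy model. This is not Slony's actual implementation; the data structures and the transaction names A, C, D, B are purely illustrative.]

```python
# Toy model: if a slave applies each transaction's changes in the
# master's commit order, it passes through exactly the same sequence
# of visible states -- no SYNC snapshot ever needs to be captured.
# All names here are hypothetical illustrations.

def apply_txn(state, changes):
    """Apply one committed transaction's row changes to a table state."""
    new_state = dict(state)
    for key, value in changes:
        if value is None:
            new_state.pop(key, None)   # a DELETE
        else:
            new_state[key] = value     # an INSERT or UPDATE
    return new_state

# Transactions committed in the order A, C, D, B (as in Heikki's example).
commit_order = [
    ("A", [(1, "a1")]),
    ("C", [(2, "c1")]),
    ("D", [(1, "d1"), (3, "d2")]),
    ("B", [(2, None)]),
]

# The states the master exposed after each commit.
master_states = []
state = {}
for _txid, changes in commit_order:
    state = apply_txn(state, changes)
    master_states.append(state)

# The slave replays the same log in the same commit order and
# reproduces every intermediate state -- even if it starts late.
slave = {}
for i, (_txid, changes) in enumerate(commit_order):
    slave = apply_txn(slave, changes)
    assert slave == master_states[i]

print(slave)  # -> {1: 'd1', 3: 'd2'}
```

This also illustrates point (a): because the replay is driven entirely by the commit-order log, there is no separately captured SYNC that can be lost while the daemon is down.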
From: Greg Stark on 2 Jun 2010 19:49

On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne <cbbrowne(a)acm.org> wrote:
> It would make it easy to conclude:
>
>   "This next transaction did 8328194 updates. Maybe we should do
>    some kind of checkpoint (e.g. commit the transaction or such)
>    before working on it."
>
> versus
>
>   "This transaction we're thinking of working on had 7 updates. No
>    big deal..."

I'm puzzled how you would define this value. How do you add 7 inserts,
7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7
inserts and 7 deletes worth twice as much as the 7 updates when
they're basically the same thing? What if the inserts fired triggers
which inserted 7 more rows; is that 14? What if the 7 updates modified
2 TB of TOAST data, while the 8328194 updates all hit the same record
and were all HOT updates, so all they really changed was 8 kB?

In any case you'll have all the actual data from your triggers or
hooks or whatever, so what value does having the system keep track of
this add?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Jan Wieck on 3 Jun 2010 15:50

On 6/2/2010 7:49 PM, Greg Stark wrote:
> On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne <cbbrowne(a)acm.org> wrote:
>> It would make it easy to conclude:
>>
>>   "This next transaction did 8328194 updates. Maybe we should do
>>    some kind of checkpoint (e.g. commit the transaction or such)
>>    before working on it."
>>
>> versus
>>
>>   "This transaction we're thinking of working on had 7 updates. No
>>    big deal..."
>
> I'm puzzled how you would define this value. How do you add 7
> inserts, 7 deletes, and 7 updates? Is that 21 rows modified? Why are
> the 7 inserts and 7 deletes worth twice as much as the 7 updates
> when they're basically the same thing? What if the inserts fired
> triggers which inserted 7 more rows; is that 14? What if the 7
> updates modified 2 TB of TOAST data, while the 8328194 updates all
> hit the same record and were all HOT updates, so all they really
> changed was 8 kB?
>
> In any case you'll have all the actual data from your triggers or
> hooks or whatever, so what value does having the system keep track
> of this add?

The point is not that we don't have that information now. The point is
having a hint BEFORE wading through possibly gigabytes of WAL or log
data.

If getting that information requires reading all the log data twice,
or reading gigabytes of otherwise useless WAL data (as per Bruce's
suggestion), we had better not get it at all and just keep doing what
we are doing now.

I actually have a hard time understanding why people are so opposed to
a feature that has zero impact unless a DBA actually turns it ON. What
is the problem with exposing the commit order of transactions?

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
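[Editor's note: Jan's "hint BEFORE wading through the log" argument can be illustrated with a toy batching policy. The per-transaction `(txid, approx_rows)` records below are hypothetical — Greg's objections about how such a count would even be defined still stand — but they show what a consumer could do with a cheap size hint before reading any actual change data.]

```python
# Toy batching policy: given rough per-transaction size hints from a
# commit-order log, group small transactions into one replay batch and
# give an oversized transaction a batch of its own -- all decided
# before reading a single byte of the actual change data.
# The (txid, approx_rows) log format is a hypothetical illustration.

def plan_batches(commit_log, batch_limit=10000):
    """Group consecutive transactions (in commit order) into batches
    of at most batch_limit approximate rows; a transaction larger
    than the limit ends up alone in its batch."""
    batches, current, current_rows = [], [], 0
    for txid, approx_rows in commit_log:
        if current and current_rows + approx_rows > batch_limit:
            batches.append(current)
            current, current_rows = [], 0
        current.append(txid)
        current_rows += approx_rows
    if current:
        batches.append(current)
    return batches

# Four small transactions and one huge one, in commit order.
log = [("t1", 7), ("t2", 12), ("t3", 8328194), ("t4", 3), ("t5", 40)]
print(plan_batches(log))
# -> [['t1', 't2'], ['t3'], ['t4', 't5']]
```

The replay order within and across batches is still the commit order; the hint only changes where the consumer chooses to place its own commits.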
From: Bruce Momjian on 3 Jun 2010 16:04

Jan Wieck wrote:
> The point is not that we don't have that information now. The point
> is having a hint BEFORE wading through possibly gigabytes of WAL or
> log data.
>
> If getting that information requires reading all the log data twice,
> or reading gigabytes of otherwise useless WAL data (as per Bruce's
> suggestion), we had better not get it at all and just keep doing
> what we are doing now.
>
> I actually have a hard time understanding why people are so opposed
> to a feature that has zero impact unless a DBA actually turns it ON.
> What is the problem with exposing the commit order of transactions?

If you want to fork Postgres and add it, go ahead, but if the
community has to maintain the code and document it, we care.

--
Bruce Momjian <bruce(a)momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com
+ None of us is going to be here forever. +
From: "Kevin Grittner" on 3 Jun 2010 16:11
Jan Wieck <JanWieck(a)Yahoo.com> wrote:
> I actually have a hard time understanding why people are so opposed
> to a feature that has zero impact unless a DBA actually turns it
> ON. What is the problem with exposing the commit order of
> transactions?

FWIW, once I came to understand the use case, it seems to me a
perfectly reasonable and useful thing to have. It does strike me that
there may be value in adding one more xid to support certain types of
integrity for some use cases, but that's certainly something which
could be added later, if at all. Once I realized that, I just dropped
out of the discussion; perhaps I should have bowed out with an
endorsement.

Unless my memory is failing me worse than usual, Dan Ports, who is
working on the serializable implementation so he can use the predicate
locking with a transaction-aware caching feature, needs the ability to
track the commit order of transactions by xid; so the use cases go
beyond Slony and Londiste.

-Kevin