From: Chris Browne on
gsstark(a)mit.edu (Greg Stark) writes:
> On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne <cbbrowne(a)acm.org> wrote:
>> It would make it easy to conclude:
>>
>> � "This next transaction did 8328194 updates. �Maybe we should do
>> � some kind of checkpoint (e.g. - commit transaction or such) before
>> � working on it."
>>
>> � �versus
>>
>> � "This transaction we're thinking of working on had 7 updates. �No
>> � big deal..."
>
> I'm puzzled how you would define this value. How do you add 7 inserts,
> 7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7
> inserts and 7 deletes worth twice as much as the 7 updates when
> they're basically the same thing? What if the inserts fired triggers
> which inserted 7 more rows, is that 14? What if the 7 updates modified
> 2 TB of TOAST data but the 8328194 updates were all to the same record
> and they were all HOT updates so all it did was change 8kB?

The presence of those questions (and their ambiguity) is the reason
why there's a little squirming as to whether this is super-useful and
super-necessary.

What this offers is *SOME* idea of how much updating work a particular
transaction did. It's a bit worse than you suggest:

- If replication triggers have captured tuples, those would get
counted.

- TOAST updates might lead to extra updates being counted.

But back to where you started, I'd anticipate 7 inserts, 7 deletes,
and 7 updates being counted as something around 21 updates.

And if that included 5 TOAST changes, it might bump up to 26.

If there were replication triggers in place, that might bump the count
up to 45 (which I chose arbitrarily).
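
Purely to illustrate the accounting I have in mind (none of this is an
existing counter, and the table, column, and row counts are made up):

    -- assume a table t(id int, payload text) with some pre-existing rows
    BEGIN;
    INSERT INTO t(id) SELECT generate_series(1, 7);           -- 7 inserts  -> ~7
    DELETE FROM t WHERE id BETWEEN 101 AND 107;               -- 7 deletes  -> ~14
    UPDATE t SET payload = 'x' WHERE id BETWEEN 201 AND 207;  -- 7 updates  -> ~21
    -- say 5 of those updates touched TOASTed values                        -> ~26
    -- plus whatever a replication trigger on t wrote into its log table    -> ~45-ish
    COMMIT;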

> In any case you'll have all the actual data from your triggers or
> hooks or whatever so what value does having the system keep track of
> this add?

This means that when we'd pull the list of transactions to consider,
we'd get something like:

select * from next_transactions('4218:23', 50);

[list of 50 transactions returned, each with:
 -> txid
 -> START timestamp
 -> COMMIT timestamp
 -> Approximate # of updates]
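
To be clear, next_transactions() is nothing that exists today; I'm
imagining a set-returning function shaped roughly like this, with every
name picked purely for illustration:

    -- a throwaway stub, only to pin down the shape; the real thing would
    -- read whatever commit-order/commit-count data the backend keeps
    CREATE FUNCTION next_transactions(after_snapshot text, max_count integer)
    RETURNS TABLE (txid            bigint,
                   start_time      timestamptz,
                   commit_time     timestamptz,
                   approx_updates  bigint)
    AS $$ SELECT NULL::bigint, NULL::timestamptz, NULL::timestamptz, NULL::bigint
          WHERE false $$
    LANGUAGE sql;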

Then, for each of the 50, I'd pull replication log data for the
corresponding transaction.

If I have the approximate # of updates, that might lead me to stop
short, and say:

"That next update looks like a doozy! I'm going to stop and commit
what I've got before doing that one."

It's not strictly necessary, but would surely be useful for flow
control.
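
With a function shaped like the one above, the flow-control decision could
be as trivial as keeping a running total and cutting the batch when it gets
too big - e.g. (threshold invented):

    SELECT txid, approx_updates,
           sum(approx_updates) OVER (ORDER BY commit_time) AS running_total
      FROM next_transactions('4218:23', 50);
    -- replay rows until running_total crosses, say, 100000, then commit
    -- the work done so far on the subscriber and fetch the next batch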
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS apparently now has a team dedicated to tracking problems with
Linux and publicizing them. I guess eventually they'll figure out
this back fires... ;)" -- William Burrow <aa126(a)DELETE.fan.nb.ca>
From: Chris Browne on
bruce(a)momjian.us (Bruce Momjian) writes:
> Jan Wieck wrote:
>> The point is not that we don't have that information now. The point is
>> having a hint BEFORE wading through possibly gigabytes of WAL or log data.
>>
>> If getting that information requires to read all the log data twice or
>> the need to read gigabytes of otherwise useless WAL data (as per Bruce's
>> suggestion), we better not get it at all and just keep doing what we are
>> doing now.
>>
>> I actually have a hard time understanding why people are so opposed to a
>> feature that has zero impact at all unless a DBA actually turns it ON.
>> What is the problem with exposing the commit order of transactions?
>
> If you want to fork Postgres and add it, go ahead, but if the community
> has to maintain the code and document it, we care.

Are you "caring" or "opposing"? It seems rather uncharitable to imply
that Jan doesn't care.

I know *I'm* not interested in a forked Postgres for this - I would
prefer to find out what things could be done that don't involve gross
amounts of WAL file grovelling for data that may not even
be available.
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS apparently now has a team dedicated to tracking problems with
Linux and publicizing them. I guess eventually they'll figure out
this back fires... ;)" -- William Burrow <aa126(a)DELETE.fan.nb.ca>
From: Jan Wieck on
On 6/3/2010 4:04 PM, Bruce Momjian wrote:
> If you want to fork Postgres and add it, go ahead, but if the community
> has to maintain the code and document it, we care.

That comment was rather unprofessional. I think the rest of us are still
trying to find the best solution to the problem, not to kill the
discussion. You may want to rejoin that effort.

I care about an efficient, low-overhead way to get a certain piece of
information that is otherwise extremely difficult, expensive and
version-dependent to get.

I care about cleaning up more of the mistakes made in the original
development of Slony, namely using hacks and kluges to implement details
that the PostgreSQL versions of the time did not support. Londiste and
Slony made a good leap on that with the txid data type. Slony made another
step like that with 2.0, switching to the native trigger configuration
(developed and contributed for that very purpose) instead of hacking the
system catalogs. This would be another step in that direction, and it
would let us unify Londiste's and Slony's transport mechanisms and
eliminate the tick/sync kluge.
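
To make that concrete for those who haven't followed Slony's internals,
the pieces already in core that I'm referring to look like this (the table
and trigger names are invented):

    -- the txid facilities Londiste and Slony now build on (8.3+)
    SELECT txid_current(), txid_current_snapshot();

    -- the native trigger configuration Slony 2.0 uses instead of
    -- poking pg_trigger / pg_class directly (8.3+)
    ALTER TABLE my_table ENABLE REPLICA TRIGGER my_log_trigger;
    SET session_replication_role = replica;  -- only REPLICA/ALWAYS triggers fire now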

Care to explain what exactly you care about?


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Greg Stark on
On Thu, Jun 3, 2010 at 8:50 PM, Jan Wieck <JanWieck(a)yahoo.com> wrote:
>> I'm puzzled how you would define this value. How do you add 7 inserts,
>> 7 deletes, and 7 updates? Is that 21 rows modified?
>
> I actually have a hard time understanding why people are so opposed to a
> feature that has zero impact at all unless a DBA actually turns it ON. What
> is the problem with exposing the commit order of transactions?

The post you were responding to was regarding the meaninglessness of
the "number of records" attribute you wanted. Your response is a non
sequitur.

I think the commit order of transactions would be a good thing to
expose, though I've asked repeatedly what kind of interface you need
and never gotten answers to all the questions.




--
greg


From: "Kevin Grittner" on
Greg Stark <gsstark(a)mit.edu> wrote:

> what kind of interface you need

For the potential uses I can see, it would be great to have an SRF
which took two parameters: the xid of the last known commit and a limit
on how many commits past that to return. Perhaps a negative number
could move earlier in time, if that seems reasonable to others. I think
that's also consistent with Jan's posts. A GUC to enable it and some
way to specify retention (or force cleanup) are the only other
user-facing features which come to mind for me. (Not sure what form
that last should take, but didn't Jan say something about both of
these early in the thread?)
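
Concretely, something along these lines is what I'm picturing (every name
here is invented; it's only meant to pin down the shape):

    # postgresql.conf -- hypothetical GUCs
    track_commit_order = on
    commit_order_retention = '7d'   # or expose a cleanup function instead

    -- the SRF itself: commits after the given xid, at most max_commits of
    -- them; a negative max_commits could walk backward in time (stub body)
    CREATE FUNCTION pg_commit_order(after_xid bigint, max_commits integer)
    RETURNS TABLE (xid bigint, commit_time timestamptz)
    AS $$ SELECT NULL::bigint, NULL::timestamptz WHERE false $$
    LANGUAGE sql;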

Do you see a need for something else (besides, obviously, docs)?

-Kevin
