Subject: [HACKERS] Exposing the Xact commit order to the user
From: Chris Browne on 3 Jun 2010 16:21

gsstark(a)mit.edu (Greg Stark) writes:
> On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne <cbbrowne(a)acm.org> wrote:
>> It would make it easy to conclude:
>>
>>   "This next transaction did 8328194 updates.  Maybe we should do
>>   some kind of checkpoint (e.g. - commit transaction or such) before
>>   working on it."
>>
>>   versus
>>
>>   "This transaction we're thinking of working on had 7 updates.  No
>>   big deal..."
>
> I'm puzzled how you would define this value. How do you add 7 inserts,
> 7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7
> inserts and 7 deletes worth twice as much as the 7 updates when
> they're basically the same thing? What if the inserts fired triggers
> which inserted 7 more rows, is that 14? What if the 7 updates modified
> 2 TB of TOAST data but the 8238194 updates were all to the same record
> and they were all HOT updates so all it did was change 8kB?

The presence of those questions (and their ambiguity) is the reason
there's a little squirming as to whether this is super-useful and
super-necessary.

What this offers is *SOME* idea of how much updating work a particular
transaction did.

It's a bit worse than you suggest:

- If replication triggers have captured tuples, those would get counted.
- TOAST updates might lead to extra updates being counted.

But back to where you started, I'd anticipate 7 inserts, 7 deletes, and
7 updates being counted as something around 21 updates.  And if that
included 5 TOAST changes, it might bump up to 26.  If there were
replication triggers in place, that might bump the count up to 45
(which I chose arbitrarily).

> In any case you'll have all the actual data from your triggers or
> hooks or whatever so what value does having the system keep track of
> this add?

It means that when we pull the list of transactions to consider, we'd
get something like:

  select * from next_transactions('4218:23', 50);

  [list of 50 transactions returned, each with...
    -> txid
    -> START timestamp
    -> COMMIT timestamp
    -> approximate # of updates]

Then, for each of the 50, I'd pull replication log data for the
corresponding transaction.

If I have the approximate # of updates, that might lead me to stop
short and say: "That next transaction looks like a doozy!  I'm going to
stop and commit what I've got before doing that one."

It's not strictly necessary, but it would surely be useful for flow
control.
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS apparently now has a team dedicated to tracking problems with
Linux and publicizing them.  I guess eventually they'll figure out
this back fires... ;)" -- William Burrow <aa126(a)DELETE.fan.nb.ca>
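For concreteness, here is a minimal sketch of the interface shape being
described above.  Everything in it is an illustrative assumption, not an
actual PostgreSQL API: the table stands in for whatever commit-order log
the server would maintain internally, and the '4218:23' snapshot-style
argument from the mail is simplified to the txid of the last commit
already processed.

  -- Illustrative sketch only; none of this exists in PostgreSQL today.
  CREATE TABLE commit_order_log (
      commit_seq     bigserial   PRIMARY KEY,  -- position in commit order
      txid           bigint      NOT NULL UNIQUE,
      xact_start     timestamptz NOT NULL,     -- START timestamp
      xact_commit    timestamptz NOT NULL,     -- COMMIT timestamp
      approx_updates bigint      NOT NULL      -- rough tuple count; see caveats above
  );

  CREATE FUNCTION next_transactions(last_known_xid bigint, max_commits integer)
  RETURNS SETOF commit_order_log
  LANGUAGE sql STABLE AS $$
      SELECT c.*
        FROM commit_order_log AS c
       WHERE c.commit_seq > COALESCE((SELECT l.commit_seq
                                        FROM commit_order_log AS l
                                       WHERE l.txid = last_known_xid), 0)
       ORDER BY c.commit_seq       -- returned strictly in commit order
       LIMIT max_commits;
  $$;

  -- Usage, analogous to the call quoted above:
  SELECT txid, xact_start, xact_commit, approx_updates
    FROM next_transactions(4218, 50);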
From: Chris Browne on 3 Jun 2010 16:50

bruce(a)momjian.us (Bruce Momjian) writes:
> Jan Wieck wrote:
>> The point is not that we don't have that information now. The point is
>> having a hint BEFORE wading through possibly gigabytes of WAL or log
>> data.
>>
>> If getting that information requires reading all the log data twice, or
>> reading gigabytes of otherwise useless WAL data (as per Bruce's
>> suggestion), we had better not get it at all and just keep doing what
>> we are doing now.
>>
>> I actually have a hard time understanding why people are so opposed to
>> a feature that has zero impact at all unless a DBA actually turns it
>> ON. What is the problem with exposing the commit order of transactions?
>
> If you want to fork Postgres and add it, go ahead, but if the community
> has to maintain the code and document it, we care.

Are you "caring" or "opposing"?  It seems rather uncharitable to imply
that Jan doesn't care.

I know *I'm* not interested in a forked Postgres for this - I would
prefer to find out what could be done that doesn't involve gross
amounts of WAL file grovelling for data that may not even be available.
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS apparently now has a team dedicated to tracking problems with
Linux and publicizing them.  I guess eventually they'll figure out
this back fires... ;)" -- William Burrow <aa126(a)DELETE.fan.nb.ca>
From: Jan Wieck on 3 Jun 2010 17:07

On 6/3/2010 4:04 PM, Bruce Momjian wrote:
> If you want to fork Postgres and add it, go ahead, but if the community
> has to maintain the code and document it, we care.

That comment was rather unprofessional. I think the rest of us are
still trying to find the best solution for the problem, not kill the
discussion. You may want to rejoin that effort.

I care about an efficient, low-overhead way to get certain information
that is otherwise extremely difficult, expensive, and version-dependent
to obtain.

I care about cleaning up more of the mistakes made in the original
development of Slony, namely using hacks and kluges to implement
details not supported by the PostgreSQL version of the day. Londiste
and Slony made a good leap on that with the txid data type. Slony took
another step like that with 2.0, switching to the native trigger
configuration (developed and contributed for that very purpose) instead
of hacking the system catalogs. This would be another step in that
direction, letting us unify Londiste's and Slony's transport mechanisms
and eliminate the tick/sync kluge.

Care to explain what exactly you care about?


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
From: Greg Stark on 3 Jun 2010 17:58

On Thu, Jun 3, 2010 at 8:50 PM, Jan Wieck <JanWieck(a)yahoo.com> wrote:
>> I'm puzzled how you would define this value. How do you add 7 inserts,
>> 7 deletes, and 7 updates? Is that 21 rows modified?
>
> I actually have a hard time understanding why people are so opposed to a
> feature that has zero impact at all unless a DBA actually turns it ON.
> What is the problem with exposing the commit order of transactions?

The post you were responding to was about the meaninglessness of the
"number of records" attribute you wanted. Your response is a non
sequitur.

I think the commit order of transactions would be a good thing to
expose, though I've asked repeatedly what kind of interface you need
and never gotten answers to all the questions.

--
greg
From: "Kevin Grittner" on 3 Jun 2010 18:18
Greg Stark <gsstark(a)mit.edu> wrote:
> what kind of interface you need

For the potential uses I can see, it would be great to have an SRF
which took two parameters: the xid of the last known commit and a limit
on how many commits past that to return. Perhaps a negative number
could move earlier in time, if that seems reasonable to others. I think
that's also consistent with Jan's posts.

A GUC to enable it and some way to specify retention (or force cleanup)
are the only other user-facing features which come to mind for me. (Not
sure what form that last should take, but didn't Jan say something
about both of these early in the thread?)

Do you see a need for something else (besides, obviously, docs)?

-Kevin
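As a usage illustration of the interface Kevin describes (and of the
flow-control idea from Chris's earlier mail), the following loop pulls a
batch of commits in order and stops short before an unusually large
transaction.  It builds on the hypothetical commit_order_log /
next_transactions() sketch shown earlier in the thread;
replay_transaction() is an invented placeholder for "pull and apply the
replication log data for this transaction", and the size threshold is
arbitrary.

  -- Builds on the earlier hypothetical sketch; replay_transaction() is a
  -- stand-in for the actual apply step on the subscriber side.
  CREATE FUNCTION replay_transaction(p_txid bigint) RETURNS void
  LANGUAGE plpgsql AS $$
  BEGIN
      RAISE NOTICE 'would replay replication log for transaction %', p_txid;
  END;
  $$;

  DO $$
  DECLARE
      last_applied   bigint := 4218;             -- last commit already replayed
      batch_limit    constant integer := 50;
      big_txn_cutoff constant bigint := 1000000; -- arbitrary "doozy" threshold
      applied_so_far bigint := 0;
      t              record;
  BEGIN
      FOR t IN SELECT * FROM next_transactions(last_applied, batch_limit) LOOP
          -- Stop short of an unusually large transaction: finish the current
          -- batch now and take the big one on its own next time around.
          EXIT WHEN applied_so_far > 0 AND t.approx_updates > big_txn_cutoff;

          PERFORM replay_transaction(t.txid);
          applied_so_far := applied_so_far + t.approx_updates;
          last_applied   := t.txid;
      END LOOP;
  END;
  $$;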