Prev: [HACKERS] test_fsync output improvement
Next: [HACKERS] proof concept: do statement parametrization
From: Tom Lane on 4 Jul 2010 00:11

Robert Haas <robertmhaas(a)gmail.com> writes:
> CREATE OR REPLACE FUNCTION update_tab() RETURNS void AS $$
> BEGIN
>     INSERT INTO tab VALUES (0);
>     FOR i IN 1..100000 LOOP
>         UPDATE tab SET x = x + 1;
>     END LOOP;
> END
> $$ LANGUAGE plpgsql;

I believe that none of the dead row versions can be vacuumed during this
test.  So yes, it sucks, but is it representative of real-world cases?

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 4 Jul 2010 00:17

On Sun, Jul 4, 2010 at 12:11 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> CREATE OR REPLACE FUNCTION update_tab() RETURNS void AS $$
>> BEGIN
>>     INSERT INTO tab VALUES (0);
>>     FOR i IN 1..100000 LOOP
>>         UPDATE tab SET x = x + 1;
>>     END LOOP;
>> END
>> $$ LANGUAGE plpgsql;
>
> I believe that none of the dead row versions can be vacuumed during this
> test.

Yep, you seem to be right.  The table grows to 802 pages.  But why is
it that we can't vacuum them as we go along?

> So yes, it sucks, but is it representative of real-world cases?

Hard to say, but I think it probably is to some degree.  I stumbled on
it more-or-less by accident, but it wouldn't surprise me to find out
that there are people doing such things in real applications.  It's not
uncommon to want to store an updateable counter somewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
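The bloat Robert measures follows directly from MVCC versioning: every UPDATE writes a new row version, and a dead version cannot be reclaimed while the transaction that deleted it is still running. Here the inserting and deleting transaction are the same, and it stays open for the whole loop. A toy simulation (Python, not PostgreSQL code; `Tuple`, `run_update_loop`, and `vacuumable` are invented names for the illustration):

```python
# Toy model of why the dead versions pile up during Robert's test.
# This is an illustrative sketch, not PostgreSQL's actual tuple format.

class Tuple:
    def __init__(self, xmin, xmax=None):
        self.xmin = xmin   # inserting transaction's XID
        self.xmax = xmax   # deleting/updating transaction's XID (None = live)

def run_update_loop(n_updates, xid=100):
    """Simulate the function: one INSERT, then n UPDATEs, all in one xid."""
    versions = [Tuple(xmin=xid)]            # INSERT INTO tab VALUES (0)
    for _ in range(n_updates):              # UPDATE tab SET x = x + 1
        versions[-1].xmax = xid             # old version marked dead by xid
        versions.append(Tuple(xmin=xid))    # new version appended
    return versions

def vacuumable(t, oldest_running_xid):
    """A dead version is reclaimable only once its deleter has finished
    before every still-running transaction -- never true while the
    deleting transaction itself is still open."""
    return t.xmax is not None and t.xmax < oldest_running_xid

versions = run_update_loop(100000)
# While transaction 100 is still running, nothing can be vacuumed:
reclaimable = sum(vacuumable(t, oldest_running_xid=100) for t in versions)
```

Once the transaction commits and falls behind the oldest running XID, all 100000 dead versions become reclaimable at once; during the loop, none are.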
From: Rainer Pruy on 4 Jul 2010 03:40

Am 04.07.2010 06:11, wrote Tom Lane:
> ... but is it representative of real-world cases?

Hi Tom,

we run an application in production that suffered from a similar
effect.  We did not have 1000000 updates per row, but 10-100 updates
per row on about 1-10 million rows of a table.

In the end we managed to more than double performance by adding support
to the application for tracking updates internally and only "flushing"
changes to the database at the (final) application commit.  This cost a
lot, as queries on the table now had to be adjusted against the data
held internally (since it was not yet reflected in the database).  It
is still more efficient than performing the updates directly on the
database: an update using the primary key of the table (about 50
million rows total) came to take over 3 seconds(!), while initially the
very same update completed in well under 1 ms.

So I think this qualifies as a real-world example of that case.

Regards,
Rainer

--
Rainer Pruy
Managing Director
Acrys Consult GmbH & Co. KG
Theodor-Heuss-Str. 53-63, 61118 Bad Vilbel, Germany
Phone: +49-6101-98760-0  Fax: +49-6101-98760-50
Web: http://www.acrys.com - Email: office(a)acrys.com
Registered: Frankfurt am Main, HRA 31151
General partner: Acrys Verwaltungs GmbH
Theodor-Heuss-Str. 53-63, D-61118 Bad Vilbel
Registered: Frankfurt am Main, HRB 57625
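The client-side batching Rainer describes can be sketched as follows (a minimal Python model; `UpdateBuffer` and its methods are invented for the illustration, not taken from his application). The idea: accumulate per-row deltas in memory, answer reads from the buffer plus the stale database value, and emit one UPDATE per touched row at commit time:

```python
# Sketch of application-side update coalescing: many increments of the
# same row collapse into a single UPDATE issued at commit.

from collections import defaultdict

class UpdateBuffer:
    def __init__(self):
        self.pending = defaultdict(int)   # primary key -> accumulated delta

    def increment(self, pk, delta=1):
        self.pending[pk] += delta         # no SQL issued here

    def current_value(self, pk, db_value):
        # Reads must consult the buffer, since the database is stale until
        # flush() -- the extra query-adjustment cost Rainer mentions.
        return db_value + self.pending.get(pk, 0)

    def flush(self):
        # One UPDATE per touched row, issued at application commit.
        stmts = [f"UPDATE tab SET x = x + {d} WHERE id = {pk}"
                 for pk, d in self.pending.items()]
        self.pending.clear()
        return stmts

buf = UpdateBuffer()
for _ in range(1000):          # 1000 increments of one hot row...
    buf.increment(42)
stmts = buf.flush()            # ...become a single UPDATE statement
```

The trade-off is exactly as stated in the message: one dead row version per touched row instead of one per update, paid for by duplicating visibility logic in the application.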
From: Tom Lane on 4 Jul 2010 09:48

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Sun, Jul 4, 2010 at 12:11 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I believe that none of the dead row versions can be vacuumed during this
>> test.

> Yep, you seem to be right.  The table grows to 802 pages.  But why is
> it that we can't vacuum them as we go along?

Sure.  What you'd need is for HeapTupleSatisfiesVacuum to observe that
(a) the tuple's xmin and xmax are equal,
(b) they're equal to my own transaction's XID,
(c) none of the live snapshots in my backend can see cmin but not cmax,
(d) cmax < currentCommandId, ensuring that every future snapshot will
    see cmax too (not quite convinced this is certain to hold).
Now that we have a centralized list of all live snapshots, it's at least
possible in principle to do (c).  (I'm ignoring the possibility that the
xmin and xmax are from different subtransactions of my own XID --- that
seems to complicate matters greatly in order to handle even-more-cornerish
cases.)

Of course, you'd also need to get to HeapTupleSatisfiesVacuum in the
first place.  The complained-of case lacks any VACUUM call.  Maybe a HOT
cleanup would happen at the right time, but I'm not sure.  If it doesn't,
adding one would represent a significant expenditure that would usually
not be repaid.

Another issue here is that since xmin is certainly within the GlobalXmin
horizon, it would be essential to preserve the update chain ctid links,
ie, make the tuple's update predecessor point to its successor.  That
seems workable for the case of cleaning out an intermediate entry in a
HOT chain, but not otherwise.  Details left as an exercise for the
student.

			regards, tom lane
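Tom's four conditions can be transcribed as a small predicate. This is a simplified model, not PostgreSQL's actual structures: `TupleHeader`, `Snapshot`, and `self_dead` are invented for the sketch, and a snapshot is reduced to a single command-id cutoff (it sees commands with cid below its cutoff), which is enough to express "can see cmin but not cmax":

```python
# Model of the check a HeapTupleSatisfiesVacuum-style routine would need
# to perform to prove a tuple dead within its own transaction.
# All structures here are invented simplifications for illustration.

from dataclasses import dataclass

@dataclass
class TupleHeader:
    xmin: int   # inserting transaction's XID
    xmax: int   # deleting transaction's XID
    cmin: int   # command id of the insert
    cmax: int   # command id of the delete

@dataclass
class Snapshot:
    cur_cid: int  # this snapshot sees commands with cid < cur_cid

def self_dead(t, my_xid, live_snapshots, current_cid):
    a = t.xmin == t.xmax                       # (a) same inserter and deleter
    b = t.xmax == my_xid                       # (b) and it's my own XID
    # (c) no live snapshot in this backend sees cmin but not cmax,
    #     i.e. no snapshot with cmin < cur_cid <= cmax
    c = not any(s.cur_cid > t.cmin and s.cur_cid <= t.cmax
                for s in live_snapshots)
    d = t.cmax < current_cid                   # (d) future snapshots see cmax
    return a and b and c and d
```

In the update-loop test every intermediate version satisfies (a) and (b) trivially; (c) fails only if a cursor or other snapshot opened mid-loop is still live, and (d) holds once the command counter has advanced past the deleting UPDATE.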
From: Jesper Krogh on 5 Jul 2010 05:56

On 2010-07-04 06:11, Tom Lane wrote:
> Robert Haas<robertmhaas(a)gmail.com> writes:
>> CREATE OR REPLACE FUNCTION update_tab() RETURNS void AS $$
>> BEGIN
>>     INSERT INTO tab VALUES (0);
>>     FOR i IN 1..100000 LOOP
>>         UPDATE tab SET x = x + 1;
>>     END LOOP;
>> END
>> $$ LANGUAGE plpgsql;
>
> I believe that none of the dead row versions can be vacuumed during this
> test.  So yes, it sucks, but is it representative of real-world cases?

The problem can generally be stated as "tuples seeing multiple updates
in the same transaction"?

I think that every time PostgreSQL is used with an ORM, a certain
amount of multiple updates takes place.  I have actually been reworking
the client side to get around multiple updates, since they popped up in
one of my profiling runs.  Although the time I optimized away turned
out to be both "round-trip time" + "update time", having the database
handle half of it transparently might have been enough for me to have
had a bigger problem elsewhere.

To sum up: yes, I think it is indeed a real-world case.

-- 
Jesper