From: Tom Lane on 15 May 2010 20:16

Michael Renner <michael.renner(a)amd.co.at> writes:
> I've written a simple tool to generate traffic on a database [1], which
> did about 30 TX/inserts per second to a table. Upon inspecting the data
> in the table, I noticed the expected grouping of tuples which came from
> a single backend to matching pages [2]. The strange part was that the
> pages weren't completely filled but the backends seemed to jump
> arbitrarily from one page to the next [3]. For the table in question
> this resulted in about 10% wasted space.

Which table would that be?  The trigger-driven updates to "auction",
in particular, would certainly guarantee some amount of "wasted" space.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Alvaro Herrera on 17 May 2010 16:14

Excerpts from Michael Renner's message of sáb may 15 20:24:36 -0400 2010:
> On 16.05.2010 02:16, Tom Lane wrote:
> > Michael Renner <michael.renner(a)amd.co.at> writes:
> >> I've written a simple tool to generate traffic on a database [1], which
> >> did about 30 TX/inserts per second to a table. Upon inspecting the data
> >> in the table, I noticed the expected grouping of tuples which came from
> >> a single backend to matching pages [2]. The strange part was that the
> >> pages weren't completely filled but the backends seemed to jump
> >> arbitrarily from one page to the next [3]. For the table in question
> >> this resulted in about 10% wasted space.
> >
> > Which table would that be?  The trigger-driven updates to "auction",
> > in particular, would certainly guarantee some amount of "wasted" space.
>
> Yeah, the auction table receives heavy updates and gets vacuumed regularly.
>
> The behavior I showed was for the "bid" table, which only gets inserts
> (and triggers the updates for the auction table).

I think this may be related to the smgr_targblock stuff; if the relcache
entry gets invalidated at the wrong time for whatever reason, the
"current page" could be abandoned in favor of extending the rel.  This
has changed since 8.4; a quick perusal suggests that it should be less
likely on 9.0 than on 8.4, but maybe there's something weird going on.
From: Tom Lane on 30 May 2010 22:42

Alvaro Herrera <alvherre(a)alvh.no-ip.org> writes:
> Excerpts from Michael Renner's message of sáb may 15 20:24:36 -0400 2010:
>>> I've written a simple tool to generate traffic on a database [1], which
>>> did about 30 TX/inserts per second to a table. Upon inspecting the data
>>> in the table, I noticed the expected grouping of tuples which came from
>>> a single backend to matching pages [2]. The strange part was that the
>>> pages weren't completely filled but the backends seemed to jump
>>> arbitrarily from one page to the next [3]. For the table in question
>>> this resulted in about 10% wasted space.

> I think this may be related to the smgr_targblock stuff; if the relcache
> entry gets invalidated at the wrong time for whatever reason, the
> "current page" could be abandoned in favor of extending the rel.  This
> has changed since 8.4, but a quick perusal suggests that it should be
> less likely on 9.0 than 8.4 but maybe there's something weird going on.

I found time to try this example finally.  The behavior that I see in
HEAD is even worse than Michael describes: there is room for 136 rows
per block in the bid table, but most blocks have only a few rows.  The
distribution after letting the exerciser run for 500 bids or so is
typically like this:

  #rows  block#
    136       0
      6       1
      5       2
      4       3
      3       4
      5       5
      3       6
      1       7
      4       8
      4       9
    136      10
      6      11
      7      12
      9      13
      9      14
      7      15
      9      16
      7      17
      8      18
      5      19
    136      20
      2      21
      4      22
      4      23
      3      24
      5      25
      3      26
      4      27
      3      28
      2      29
      1      30

Examining the insertion timestamps and bidder numbers (client process
IDs), and correlating this with logged autovacuum activity, makes it
pretty clear what is going on.  See the logic in
RelationGetBufferForTuple, and note that at no time do we have any FSM
data for the bid table:

1. Initially, all backends decide to insert into block 0.  They do so
until the block is full.

2. At that point, each active backend individually decides it needs to
extend the relation.  They each create a new block and start inserting
into that one, each carefully not telling anyone else about the block
so as to avoid block-level insertion contention.  In the table above,
blocks 1-9 are each created by a different backend and the rows
inserted into each come (mostly?) from just one backend.  Block 10's
first few rows also come from the one backend that created it, but it
doesn't manage to fill the block entirely before ...

3. After a while, autovacuum notices all the insert activity and kicks
off an autoanalyze on the bid table.  When committed, this forces a
relcache flush for each other backend's relcache entry for "bid".  In
particular, the smgr targblock gets reset.

4. Now, all the backends again decide to try to insert into the last
available block.  So everybody jams into the partly-filled block 10,
until it gets filled.

5. Lather, rinse, repeat.

Since there are exactly 10 active clients (by default) in this test
program, the repeat distance is exactly 10 blocks.

The obvious thing to do about this would be to not reset targblock on
receipt of a relcache flush event, but we can *not* do that in the
general case.  The reason it gets reset is so that it's not left
pointing to a no-longer-existent block after a VACUUM truncation.
Maybe we could develop a way to distinguish truncation events from
others, but right now the sinval signaling mechanism can't do that.

This looks like there might be sufficient grounds to do something,
though.

Attached exhibits: contents of relevant columns of the bid table and
postmaster log entries for autovacuum actions during the run.

			regards, tom lane
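The five-step cycle above can be sketched as a toy simulation.  This is not PostgreSQL code; the parameters are assumptions taken from the thread (136 rows per block, 10 backends inserting round-robin, and a relcache flush arriving at a fixed insert interval standing in for the autoanalyze-driven invalidation):

```python
# Toy model of the allocation cycle: each backend caches a target block,
# extends the relation privately when its target fills, and has its cached
# target reset to the relation's last block on every "relcache flush".
ROWS_PER_BLOCK = 136   # assumed capacity of the bid table's pages
NUM_BACKENDS = 10      # default client count in Michael's exerciser

def simulate(total_inserts, invalidate_every):
    blocks = [0]                   # rows currently in each block
    targ = [0] * NUM_BACKENDS      # each backend's cached target block
    for i in range(total_inserts):
        b = i % NUM_BACKENDS       # round-robin scheduling (an assumption)
        if blocks[targ[b]] >= ROWS_PER_BLOCK:
            # Target is full: extend the relation with a private new block,
            # telling no one else, to avoid block-level insertion contention.
            targ[b] = len(blocks)
            blocks.append(0)
        blocks[targ[b]] += 1
        if (i + 1) % invalidate_every == 0:
            # Relcache flush: smgr targblock is reset in every backend,
            # so all of them converge on the current last block.
            for j in range(NUM_BACKENDS):
                targ[j] = len(blocks) - 1
    return blocks

if __name__ == "__main__":
    for blkno, nrows in enumerate(simulate(1000, 300)):
        print(f"{nrows:5d} {blkno:7d}")
```

With these assumed parameters the model spreads 1000 rows over 31 blocks, where dense packing would need 8, and the printout shows the same shape as Tom's table: a full block, then a run of nearly empty privately-created blocks, then the full "jam" block, repeating.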
From: Takahiro Itagaki on 30 May 2010 23:46

Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> 3. After awhile, autovacuum notices all the insert activity and kicks
> off an autoanalyze on the bid table.  When committed, this forces a
> relcache flush for each other backend's relcache entry for "bid".
> In particular, the smgr targblock gets reset.
>
> 4. Now, all the backends again decide to try to insert into the last
> available block.  So everybody jams into the partly-filled block 10,
> until it gets filled.

At step 3 the autovacuum process runs only analyze, not vacuum, because
the workload is insert-only, so the partially filled pages are never
tracked by the free space map.  We could re-run an autovacuum when the
autoanalyze reports that the table is low-density, but the additional
vacuum might be pure overhead in other cases.

> The obvious thing to do about this would be to not reset targblock
> on receipt of a relcache flush event

Even if we don't reset targblock, does that solve the issue when
clients connect and disconnect for each insert?  New backends only
check the end of the table, and extend it just as in this case.  If we
are worried about that worst case, we might need to track newly added
pages with the free space map.  Of course, we could ignore the case,
because frequent connections and disconnections should always be
avoided anyway.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
From: Tom Lane on 31 May 2010 00:03

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Sun, May 30, 2010 at 10:42 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> pretty clear what is going on.  See the logic in
>> RelationGetBufferForTuple, and note that at no time do we have any FSM
>> data for the bid table:

> Is this because, in the absence of updates or deletes, we never vacuum it?

Right.

>> 4. Now, all the backends again decide to try to insert into the last
>> available block.  So everybody jams into the partly-filled block 10,
>> until it gets filled.

> Would it be (a) feasible and (b) useful to inject some entropy into
> this step?

Maybe, but at least in this case, the insert rate is not fast enough
that contention for the block is worth worrying about.  IMO this isn't
the part of the cycle that needs fixing.

I guess another path to a fix might be to allow the backends to record
new pages in the FSM immediately at creation.  That might result in
more insert contention, but it'd avoid losing track of the free space
permanently, which is what is happening here (unless something happens
to cause a vacuum).  One reason the current code doesn't do that is
that the old in-memory FSM couldn't efficiently support retail
insertion of single-page data, but the new FSM code hasn't got a
problem with that.

			regards, tom lane
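The effect of recording new pages in the FSM at creation can be illustrated with a toy simulation.  This is an assumption-laden sketch, not PostgreSQL code: 136 rows per block, 10 round-robin backends, a relcache flush at a fixed interval, and a plain Python set standing in for the FSM; the lowest-numbered free page is chosen arbitrarily as the reuse policy:

```python
# Toy model where every newly created page is registered in a shared
# free-space map, so a relcache flush (which still resets each backend's
# cached target block) no longer loses track of partially filled pages.
ROWS_PER_BLOCK = 136
NUM_BACKENDS = 10

def simulate_with_fsm(total_inserts, invalidate_every):
    blocks = [0]
    fsm = {0}                      # pages believed to have free space
    targ = [0] * NUM_BACKENDS      # each backend's cached target block
    for i in range(total_inserts):
        b = i % NUM_BACKENDS
        if blocks[targ[b]] >= ROWS_PER_BLOCK:
            fsm.discard(targ[b])   # target is full; drop it from the FSM
            free = [p for p in fsm if blocks[p] < ROWS_PER_BLOCK]
            if free:
                targ[b] = min(free)        # reuse known free space
            else:
                targ[b] = len(blocks)      # extend the relation ...
                blocks.append(0)
                fsm.add(targ[b])           # ... recording the page at once
        blocks[targ[b]] += 1
        if (i + 1) % invalidate_every == 0:
            for j in range(NUM_BACKENDS):  # targblock reset; FSM survives
                targ[j] = len(blocks) - 1
    return blocks
```

With the same 1000 inserts and flush interval as before, this packs the rows into 8 blocks (7 full plus a 48-row tail) instead of 31, at the price Tom notes: all the backends now converge on the same page, so insert contention goes up.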