From: Steve Howard on 24 Jun 2010 22:20

Hi All,

10.2.0.4 three-node cluster EE on SLES 10

Can someone give me a good definition of *exactly* what this means and what causes it (mainly the latter)? The documentation is not very descriptive, with "Number of undo records applied to transaction tables that have been rolled back for consistent read purposes".

It sounds like undo on undo, but we don't have any larger number of rollbacks (or commits) when this happens than we do at any other time.

We have been plagued by this for over a year, and after multiple SRs where the support analyst just reads us the documentation, I am at my wits' end.

We have a fairly large table (almost 1TB with about 300 million rows) with a large XMLTYPE column. Once a day, a job scans this table for records added that day for propagation to an external system. The longer the query runs, the more we see the session doing single-block reads against the undo tablespace, with the statistic in the subject climbing into the millions. Eventually, after several hours, an ORA-01555 is thrown.

I even grabbed the P1/P2 parameters for the querying session and dumped the undo block in the P2 value. While that was a second or two after the event was posted, the block itself didn't even contain any references to the table being queried!

Can anyone shed some light?

Thanks,

Steve
From: Steve Howard on 24 Jun 2010 22:25

On Jun 24, 10:20 pm, Steve Howard <stevedhow...(a)gmail.com> wrote:
> Hi All,
>
> 10.2.0.4 three node cluster EE on SLES 10

...I should also mention that the explain plan for the query is OK. It is a range scan for the time period being retrieved, which may be two or three hours, representing 300,000 rows (out of 300 million).

Also, when I look at "table fetch by rowid" for the querying session, it will periodically just stall. It is as if the session is reading thousands upon thousands of undo blocks to get that next row. The arraysize is 15 (standard SQL*Plus).

Sometimes this happens, sometimes it doesn't.

We did apply patch 7527908.
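[For readers following along: the statistics Steve is watching can be sampled from V$SESSTAT while the job runs. A minimal sketch - `:sid` is a placeholder for the querying session's SID, and the statistic names are as they appear in V$STATNAME on 10.2:]

```sql
-- Sample the relevant statistics for one session; :sid is a
-- placeholder bind for the querying session's SID.
SELECT sn.name, ss.value
  FROM v$sesstat  ss
  JOIN v$statname sn ON sn.statistic# = ss.statistic#
 WHERE ss.sid = :sid
   AND sn.name IN (
         'transaction tables consistent reads - undo records applied',
         'transaction tables consistent read rollbacks',
         'table fetch by rowid');
```

[Run twice a few minutes apart: "table fetch by rowid" flat while the first statistic climbs matches the stall Steve describes.]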
From: joel garry on 25 Jun 2010 12:01

On Jun 24, 7:25 pm, Steve Howard <stevedhow...(a)gmail.com> wrote:
> ...I should also mention the explain plan for the query is OK. It is
> a range scan for the time period being retrieved, which may be two or
> three hours, representing 300,000 rows (out of 300 million).
>
> Also, when I look at "table fetch by rowid" for the querying session,
> it will periodically just stall. It is as if the session is reading
> thousands upon thousands of undo blocks for that next row. The
> arraysize is 15 (standard SQL*Plus)
>
> Sometimes this happens, sometimes it doesn't.
>
> We did apply patch 7527908

I have no idea, but I speculate you can use Tanel Poder's latchprofx and poke around on his site about in-memory undo to figure this out. Does your plan show lots of recursion?

jg
--
@home.com is bogus.
http://www.businessweek.com/news/2010-06-25/oracle-rises-after-sun-acquisition-fuels-profit-gain.html
From: Jonathan Lewis on 25 Jun 2010 12:37

This happens when your query hits a block that was updated "a long time" in the past by a transaction that has committed but not been cleaned out.

Your query can tell that it is a committed transaction because the ITL entry for the transaction points to a transaction table slot (in an undo segment header block) that has been re-used for a newer transaction. (Part of the transaction id is the "transaction sequence number", which counts the number of times a transaction slot has been used.)

Your query therefore needs to know WHEN the transaction committed, so that it can decide whether it's supposed to see the new version or the old version of the row. (If the transaction committed before the query started then the query doesn't need to know exactly when it committed; if it committed after the query started then its changes have to be rolled back - and it's possible that the "snapshot too old" is the result of the data rollback rather than the transaction table rollback.)

To find out when the transaction committed, your query copies the undo segment header block and starts rolling it back. The number of times this happens is recorded as "transaction tables consistent read rollbacks".

To perform the rollback, your query reads the transaction control block (another part of the undo segment header), which contains a number of important details - including the first undo block address of the most recent transaction to use that undo segment header. That undo block address will hold the first record of that transaction *** - which includes information about the PREVIOUS state of the transaction control block.
By using this undo record your query can take the undo segment header block backwards in time by one step - at which point it reads the older version of the transaction control block and repeats the process, until the transaction slot it's interested in has been taken back to the correct sequence number (or a change has taken the undo segment header block back to a point in time before the start of the query). Each record it reads in this process is counted in the "transaction tables consistent reads - undo records applied".

(*** This is why the block you dumped had nothing to do with your table.)

The trouble with your requirement is that we really need to do a backwards tablescan - because it's probably the data near the end of the table that is changing while you are "wasting" time reading all the data from the start of the table.

Unfortunately there is no such hint - but if it's really critical, you could write some code to scan the table one extent at a time in reverse order.

--
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com

"Steve Howard" <stevedhoward(a)gmail.com> wrote in message news:82fa462e-574c-461d-b1c6-65a5473a3afc(a)d37g2000yqm.googlegroups.com...
> Hi All,
>
> 10.2.0.4 three node cluster EE on SLES 10
>
> Can someone give me a good definition of *exactly* what this means and
> what causes it (mainly the latter). The documentation is not very
> descriptive, with "Number of undo records applied to transaction
> tables that have been rolled back for consistent read purposes".
>
[snip]
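[One mitigation consistent with Jonathan's explanation - not proposed in the thread itself, so treat it as a sketch to test: force block cleanout shortly after the daily load, while the commit SCNs are still cheap to find, so the nightly query never has to roll transaction tables back. Any full read of the blocks performs the delayed cleanout; `big_table` is a placeholder name:]

```sql
-- Touching every block right after the bulk load completes the
-- delayed block cleanout (the ITL entries get stamped with the
-- commit SCN), so later long-running queries don't have to
-- reconstruct old transaction table slots to find it.
SELECT /*+ FULL(t) */ COUNT(*) FROM big_table t;
```

[The count itself is thrown away; the side effect - dirtying and cleaning the freshly loaded blocks - is the point.]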
From: joel garry on 25 Jun 2010 16:31
On Jun 25, 9:37 am, "Jonathan Lewis" <jonat...(a)jlcomp.demon.co.uk> wrote:
> This happens when your query hits a block that was updated
> "a long time" in the past by a transaction that has committed
> but not been cleaned out.
>
[snip]
>
> The trouble with your requirement is that we really need to do a
> backwards tablescan - because it's probably the data near the end of
> the table that is changing while you are "wasting" time reading all
> the data from the start of the table.

Excellent explanation, but I lost you here. He says the plan shows a range scan, for 1% of the table? (Maybe you hadn't seen the subsequent post yet, where he mentions a fetch suddenly exhibiting the characteristics you describe.)

> Unfortunately there is no such hint - but if it's really critical, you
> could write some code to scan the table one extent at a time in
> reverse order.

This cleanout business makes perfect sense, but I'm wondering if there is some administrative tuning, like adjusting undo size or retention, or some fiddling with INITRANS? Sounds critical if it's interrupting data extraction.

I'm also wondering if the mysterious translation of XMLTYPE from a column might be a problem here. Steve, how exactly are you inserting and accessing this column?

jg
--
@home.com is bogus.
snake oil 2.0 http://www.gapingvoidgallery.com/product_info.php?products_id=1614