From: EdStevens on 9 Sep 2005 12:26 Platform: 8.1.7 and 9.2.0.1 on various Win2k or Win2003 Last week a developer brought me a query that was running satisfactorily in prod (8.1.7), but took 45 minutes in test (9.2.0). As it happens, we are in the middle of transitioning a group of databases from 8i to 9i, and so have several copies of this database running under both versions, on a variety of hardware, ranging from honking big 8-cpu blade servers to rusty old 2-cpu doorstops with mice nesting on the motherboard. Pulling an extended trace showed that the test box was spending 99+ percent of its time on 'hs message to agent' events. (The query includes a join to a DB2 table, via the Oracle Transparent Gateway.) At first I suspected 8i vs. 9i gateway issues, but as we have tested on various platforms, we have eliminated that. As we tried to narrow down the variables, we discovered that the poor execution (high HS message events) only occured on one 9.2 database, running on a server that was in the middle of the range of available hardware. We found that on every other system, the optimizer was generating an access plan that featured several hash joins, where the one poor performing system generates nested loops. Even when we export/import the entire schema from the 'bad' 9.2 box to a 'good' 9.2 box. Initialization parms idendtical, data structure and volume identical. And yet access plans are different, with disasterous results. So ... at this point I'm grasping for what other factors could influence the CBO. Where do I look for a reasonably detailed discussion of such? The qeury is rather lengthy, and explain plans never format well here. And I wouldn't expect anyone to pore over them. Just looking for some pointers on the kinds of things I should be looking at. At this point I'm out of ideas.
From: Jonathan Lewis on 9 Sep 2005 12:56 "EdStevens" <quetico_man(a)yahoo.com> wrote in message news:1126283179.991124.138480(a)z14g2000cwz.googlegroups.com... > Platform: 8.1.7 and 9.2.0.1 on various Win2k or Win2003 > > Last week a developer brought me a query that was running > satisfactorily in prod (8.1.7), but took 45 minutes in test (9.2.0). > > As it happens, we are in the middle of transitioning a group of > databases from 8i to 9i, and so have several copies of this database > running under both versions, on a variety of hardware, ranging from > honking big 8-cpu blade servers to rusty old 2-cpu doorstops with mice > nesting on the motherboard. > > Pulling an extended trace showed that the test box was spending 99+ > percent of its time on 'hs message to agent' events. (The query > includes a join to a DB2 table, via the Oracle Transparent Gateway.) > At first I suspected 8i vs. 9i gateway issues, but as we have tested on > various platforms, we have eliminated that. > > As we tried to narrow down the variables, we discovered that the poor > execution (high HS message events) only occured on one 9.2 database, > running on a server that was in the middle of the range of available > hardware. We found that on every other system, the optimizer was > generating an access plan that featured several hash joins, where the > one poor performing system generates nested loops. Even when we > export/import the entire schema from the 'bad' 9.2 box to a 'good' 9.2 > box. Initialization parms idendtical, data structure and volume > identical. And yet access plans are different, with disasterous > results. > > So ... at this point I'm grasping for what other factors could > influence the CBO. Where do I look for a reasonably detailed > discussion of such? The qeury is rather lengthy, and explain plans > never format well here. And I wouldn't expect anyone to pore over > them. Just looking for some pointers on the kinds of things I should > be looking at. At this point I'm out of ideas. > Just giving you a possibility to shoot at: You say the data structure etc.are identical and have eliminated the possibility of a data restructure causing changes in data packing, cluster factors etc. Have you checked to see if the bad 9i database has enabled system statistics (CPU costing). It is almost a guaranteed side effect of CPU costing that the optimizer gets biased (though not necessarily strongly) towards indexed access paths and away from table-scan access paths (which tends to mean more nested loops and fewer hash joins and sort/merge joins) -- Regards Jonathan Lewis http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html Cost Based Oracle - Volume 1: Fundamentals On-shelf date: Nov 2005 http://www.jlcomp.demon.co.uk/faq/ind_faq.html The Co-operative Oracle Users' FAQ http://www.jlcomp.demon.co.uk/appearances.html Public Appearances - schedule updated 4th Sept 2005
From: EdStevens on 9 Sep 2005 15:32 Bingo! My partner had assured me that he had checked the initialization parms and they were all identical. When I looked at the 10053 trace where all the optimizer related parms were grouped together I discovered that OPTIMIZER_INDEX_CACHING, OPTIMIZER_INDEX_COST_ADJ, and OPTIMIZER_MAX_PERMUTATIONS were different.
From: EdStevens on 9 Sep 2005 15:34 Turned out to be simpler (and much more a DSA on my part .. see my reply to Doug) but you've given me some study points. Thanks for the reply.
From: Jonathan Lewis on 9 Sep 2005 15:44 "EdStevens" <quetico_man(a)yahoo.com> wrote in message news:1126294364.813025.287860(a)o13g2000cwo.googlegroups.com... > Bingo! My partner had assured me that he had checked the > initialization parms and they were all identical. When I looked at the > 10053 trace where all the optimizer related parms were grouped together > I discovered that OPTIMIZER_INDEX_CACHING, OPTIMIZER_INDEX_COST_ADJ, > and OPTIMIZER_MAX_PERMUTATIONS were different. > There's a nice little feature in 10g that helps with that, the parameter list is split into the parameters with altered values and parameters with default values. Such a small change, but such a big help. -- Regards Jonathan Lewis http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html Cost Based Oracle - Volume 1: Fundamentals On-shelf date: Nov 2005 http://www.jlcomp.demon.co.uk/faq/ind_faq.html The Co-operative Oracle Users' FAQ http://www.jlcomp.demon.co.uk/appearances.html Public Appearances - schedule updated 4th Sept 2005
|
Next
|
Last
Pages: 1 2 3 4 Prev: TIMEOUT ON CONTROL FILE ENQUEUE Next: ORA-12518 Listener could not hand off connection |