From: raja on 21 Nov 2008 09:28

Hi,

I will try to get an AWR report with a 10-15 minute interval. Currently I have only this report, and I am trying to analyze it.

In the AWR report, under SQL Statistics, there are several sections:
SQL ordered by Elapsed Time
SQL ordered by CPU Time
SQL ordered by Gets
SQL ordered by Reads
SQL ordered by Executions
SQL ordered by Parse Calls
SQL ordered by Sharable Memory
SQL ordered by Version Count

Which should I look into first, and why?

With Regards,
Raja.
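[For reference, the snapshot interval and the report itself can both be driven through DBMS_WORKLOAD_REPOSITORY. A minimal SQL*Plus sketch - the 15-minute interval and the bind variables are illustrative, not values from this thread:

-- Shorten the AWR snapshot interval to 15 minutes (values are in minutes)
EXEC DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(interval => 15);

-- Optionally take a manual snapshot at the start and end of the test window
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

-- Generate the text report between two snapshot IDs
-- (or simply run @?/rdbms/admin/awrrpt.sql)
SELECT output
  FROM TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT(
         :dbid, :inst_num, :begin_snap, :end_snap));
]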
From: Charles Hooper on 21 Nov 2008 10:52

On Nov 21, 9:28 am, raja <dextersu...(a)gmail.com> wrote:
> Hi,
>
> I will try to get a AWR Report which has an interval of 10-15 minutes
> interval.
> Currently i have only this report and i am trying to analyze it.
>
> In AWR Report, SQL Statistics, there are many parts like :
> SQL ordered by Elapsed Time
> SQL ordered by CPU Time
> SQL ordered by Gets
> SQL ordered by Reads
> SQL ordered by Executions
> SQL ordered by Parse Calls
> SQL ordered by Sharable Memory
> SQL ordered by Version Count
> Which i should look into first ? why ?
>
> With Regards,
> Raja.

On first examination, I see:
  Buffer Hit %: 99.89

One might say that is a near-perfect buffer cache hit ratio, and move on to the next report. But I have an example which shows a 100% buffer cache hit ratio while the performance of the database instance was terrible, so maybe we should permanently ignore this statistic.

Watch for line wrapping (word wrapping) in the following.

Just a couple of statistics to frame the rest of the report:
  Elapsed:    359.46 (mins)  (21,567.6 seconds)
  DB Time:  1,723.72 (mins)  {DB Time includes CPU time and wait events for all sessions}

One approach to the AWR/Statspack reports is to start with the CPU usage and the wait events, and see whether any further investigation is needed.

  NUM_CPUS       14
  NUM_CPU_CORES   7

Total CPU time possible: 14 * 359.46 * 60 seconds = 301,946.4 seconds. Note that it is unlikely that processes will jump from one CPU to another, although if parallel processing is in use, more than one CPU might be used to retrieve the data for SQL statements. It is quite possible that one CPU might become nearly 100% busy, yet the average CPU busy calculation might indicate that the CPUs are only 7% busy.

  CPU time: 56,931 seconds

Percentage of total possible CPU time that was consumed by Oracle: 56,931 / 301,946 = 18.9%

The above is not conclusive, so digging deeper - what is happening at the system level?

OS Statistics
  Statistic          Total
  AVG_BUSY_TIME         773,431 / 100 =   7,734.31 s  (total divided by # CPUs)
  AVG_IDLE_TIME       1,382,735 / 100 =  13,827.35 s
  AVG_IOWAIT_TIME       198,722 / 100 =   1,987.22 s
  AVG_SYS_TIME          112,958 / 100 =   1,129.58 s
  AVG_USER_TIME         659,771 / 100 =   6,597.71 s
  OS_CPU_WAIT_TIME   11,375,200 / 100 = 113,752.00 s

OS_CPU_WAIT_TIME is defined as: "Total number of seconds that processes have been in a ready state, waiting to be selected by the operating system scheduler to run." This seems to indicate that there was significant waiting for CPU time slices (competition for the same CPU), even though the average CPU utilization at the operating system level is only 35.9% [7,734.31 / (7,734.31 + 13,827.35)]. Several processes are apparently fighting for time on the same CPU.

You could look at the Time Model Statistics, but they may not tell you much other than that there may be a lot of context switches, and that more than half of the total SQL execution time was spent on the CPUs (rather than waiting for disk or other wait events):

  Statistic Name                       Time (s)    % of DB Time
  sql execute elapsed time           102,885.27           99.48
  DB CPU                              56,931.50           55.05
  Java execution elapsed time         18,173.85           17.57
  PL/SQL execution elapsed time        1,146.88            1.11
  ....
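[The 35.9% host CPU utilization figure can be reproduced outside of an AWR report with a query along the following lines. A sketch only: V$OSSTAT values are cumulative centiseconds since instance startup, so for a specific interval you would difference two DBA_HIST_OSSTAT snapshots instead:

-- Approximate host CPU utilization since instance startup
SELECT ROUND(100 * busy / (busy + idle), 1) AS pct_cpu_busy
  FROM (SELECT MAX(DECODE(stat_name, 'BUSY_TIME', value)) / 100 AS busy,
               MAX(DECODE(stat_name, 'IDLE_TIME', value)) / 100 AS idle
          FROM v$osstat);
]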
  DB time                            103,423.35
  background elapsed time              9,817.43
  background cpu time                  1,169.73

The hierarchy of the above Time Model Statistics should be read like this:
  1) background elapsed time  9,817.43
    2) background cpu time  1,169.73 (of the 9,817.43)
  1) DB time  103,423.35
    2) DB CPU  56,931.50 (of the 103,423.35)
    2) connection management call elapsed time
    2) sequence load elapsed time
    2) sql execute elapsed time
    2) parse time elapsed
      3) hard parse elapsed time
        4) hard parse (sharing criteria) elapsed time
          5) hard parse (bind mismatch) elapsed time
      3) failed parse elapsed time
        4) failed parse (out of shared memory) elapsed time
    2) PL/SQL execution elapsed time  1,146.88
    2) inbound PL/SQL rpc elapsed time
    2) PL/SQL compilation elapsed time
    2) Java execution elapsed time  18,173.85 (of the 103,423.35)

Let's switch our attention to the wait events, but first a couple of statistics to provide a frame of reference:
  SQL*Net roundtrips to/from client   4,813,090
  SQL*Net roundtrips to/from dblink      43,646
  user calls                          4,836,293
  user commits                           24,259
  user rollbacks                             20

  Event                              Waits  %Time-outs  Total Wait Time (s)  Avg wait (ms)  Waits/txn
  db file sequential read        4,036,055        0.00               13,310              3     166.24
  db file parallel write           567,009        0.00                6,515             11      23.35
  db file scattered read           934,341        0.00                1,383              1      38.48
  log file parallel write           38,621        0.00                1,019             26       1.59
  Backup: sbtwrite2                 51,438        0.00                  896             17       2.12
  Log archive I/O                   15,175        0.00                  658             43       0.63
  log file sync                     17,706        0.10                  387             22       0.73
  direct path write temp            98,515        0.00                  313              3       4.06
  SQL*Net more data to client    4,642,420        0.00                  154              0     191.21
  enq: TX - row lock contention      1,475        3.19                  138             94       0.06
  control file parallel write       10,484        0.00                  107             10       0.43
  log file sequential read          12,839        0.00                   87              7       0.53
  read by other session             40,059        0.00                   79              2       1.65
  direct path read                 272,647        0.00                   54              0      11.23
  log buffer space                     181        0.00                   46            256       0.01
  enq: WL - contention                  16       87.50                   44           2742       0.00
  latch: cache buffers chains      363,571        0.00                   37              0      14.97
  control file single write            448        0.00                   28             63       0.02
  os thread startup                    424        0.00                   23             55       0.02
  RMAN backup & recovery I/O         6,130        0.00                   18              3       0.25
  control file sequential read     955,614        0.00                    9              0      39.36
  log file switch completion            75        0.00                    7             89       0.00
  SQL*Net more data from dblink    127,162        0.00                    6              0       5.24
  cursor: pin S wait on X              365      100.00                    4             10       0.02
  SQL*Net message to client      4,819,165        0.00                    4              0     198.49
  ....

According to the number of waits for the SQL*Net events, on average just about every request for data from the server requires two sends, as the data returned is too large to fit into the SDU (which defaults to 2,000 bytes) - this is based on comparing the value of "SQL*Net message to client" to the value of "SQL*Net more data to client". This indicates that network latency, if significant, could have a significant impact on apparent database performance.

There are time-outs reported for some of the wait events - this may be a critical indicator. If the time-out for a particular wait event happens to be 3 seconds, and 88% of the waits for that event result in time-outs, then on average, 88% of the time the event happens it will last 3 seconds. However, the wait count increases by 1 every time a time-out occurs, thus it might be a single session waiting non-stop: 16 waits * 87.50% * 3 seconds = 42 seconds.
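[If the SDU does turn out to be a limiting factor, it can be raised on both ends of the connection. A sketch only - the host, service name, and 8192-byte size are illustrative, and the server side must be configured to match:

# client tnsnames.ora entry (illustrative)
ORCL =
  (DESCRIPTION =
    (SDU = 8192)
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )

# server sqlnet.ora (sets the default for all connections)
DEFAULT_SDU_SIZE = 8192
]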
"db file sequential read" indicates that Oracle is reading one block from disk - this usually indicates that Oracle is reading from an index, but might also happen rarely during a tablescan (if the other blocks needed are already in the buffer cache), or when reading the undo blocks to provide a consistent read. "db file scattered read" indicates that Oracle is reading more than one block at a time from disk - this usually indicates that Oracle is reading from a table during a full table scan. "db file parallel write" indicates waits related to the DBWR process, and may be related to the number of DBWR processes or DBWR I/O slaves. May also indicate slow or high contention devices. I believe that I previously commented on the value of the parameter db_writer_processes, and suggested that you take a look at an article on Kevin Closson's website related to what happens when that parameter is set to a high value - that article talks about the impact on CPU queuing, which seems to be a problem in this AWR report. "log file parallel write" indicates Writing redo records to the redo log files from the log buffer. This usually indicates slow devices or contention where the online redo logs are located. Take a look at the definitions for the other wait events. Use the CPU utilization and wait events to determine what other sections of the AWR report need to be examined. Post the results of your findings. Charles Hooper IT Manager/Oracle DBA K&M Machine-Fabricating, Inc.
From: joel garry on 21 Nov 2008 17:13

On Nov 21, 6:28 am, raja <dextersu...(a)gmail.com> wrote:
> Hi,
>
> I will try to get a AWR Report which has an interval of 10-15 minutes
> interval.
> Currently i have only this report and i am trying to analyze it.
>
> In AWR Report, SQL Statistics, there are many parts like :
> SQL ordered by Elapsed Time
> SQL ordered by CPU Time
> SQL ordered by Gets
> SQL ordered by Reads
> SQL ordered by Executions
> SQL ordered by Parse Calls
> SQL ordered by Sharable Memory
> SQL ordered by Version Count
> Which i should look into first ? why ?
>
> With Regards,
> Raja.

The "why" relates to why you are looking at this report in the first place. You should be familiar with the system and how it is running, and have some idea of what is wrong. It's easy to fall into the trap of tuning for tuning's sake, also wittily referred to as OTD - Obsessive Tuning Disorder. It's also easy to fall into the trap of saying "no one is complaining, so therefore nothing is wrong." Personally, I think there is value in checking a system to be sure nothing is about to become a problem, or become a problem when two non-problems run concurrently. Also, when a bottleneck is created, things can go downhill in a hurry, and one wants to be aware of what to watch for. This falls under the part of a DBA's job called "knowing your system."

Tuning is an iterative process, too. You need to order the importance of what is wrong, and from there decide which to look for first. Once you've fixed it, something else may show up. It is even possible the fix may make things worse. For example, you may discover lots of context switching due to unnecessary PL/SQL calls by a major app, and decide to fix it by converting a particular bit of code to SQL. Then the processor is free to do everything faster, overwhelming the I/O system with calls.

So, it depends. That's why you follow a particular methodology, preferably based on rational inquiry like Charles is demonstrating, rather than just twiddling with ratios or expecting some cookbook scheme to help. The trick is to tease out the clues that send you in the right direction, and ignore the red herrings. So what is the database waiting for? Are the waits what you would expect, given what your application is doing? Charles' SDU comment is certainly worth investigating further.

You might want to briefly look at the top SQL by parse calls if you are using a packaged app; some vendors write poor code (i.e., without bind variables), and it becomes painfully obvious when your most-used code suffers from excessive parsing. Also compare the top executed SQL with all the others, at least to be sure your most-executed code is working as it should. Sometimes some blatant, easily fixable errors pop out there. But then again, the same applies to all those top SQL lists.

jg
--
@home.com is bogus.
http://home.nestor.minsk.by/jazz/news/2008/11/1804.html
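[The top SQL by parse calls can also be pulled outside of AWR with a query along these lines (a sketch; the 10-row cutoff is arbitrary). A parse count close to the execution count on frequently run SQL often points at missing bind variables:

SELECT *
  FROM (SELECT sql_id, parse_calls, executions,
               SUBSTR(sql_text, 1, 60) AS sql_text
          FROM v$sqlarea
         ORDER BY parse_calls DESC)
 WHERE ROWNUM <= 10;
]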
From: raja on 25 Nov 2008 03:46

Ok. I will check the definitions of the wait events and get back to you.

Thanks.

With Regards,
Raja.
From: raja on 25 Nov 2008 08:03
Hi,

I also have some additional information which may help to solve the problem, along with the AWR report. From the advisor tables, I got the following information (these were the findings made by Oracle):

Application Analysis :
-----------------------
Wait event "Backup: sbtwrite2" in wait class "Administrative"
Wait event "Data file init write" in wait class "User I/O"
Wait event "enq: CF - contention" in wait class "Other"
Wait event "enq: JI - contention" in wait class "Other"
Wait event "enq: TC - contention" in wait class "Other"
Wait event "inactive session" in wait class "Other"
Wait event "SQL*Net more data from dblink" in wait class "Network"
Wait event "wait for a undo record" in wait class "Other"
Waits on event "log file sync" while performing COMMIT and ROLLBACK operations

I will check over the above wait events as well and get back to you.

Regarding the waits on event "log file sync" while performing COMMIT and ROLLBACK operations - I think we should not issue so many commits in the application, to solve this problem.

I expect most of the above wait events were due to the backup activity that took place (this was also present in the AWR report). I have consolidated the above wait events (the data was collected over 2 months).

So the problem with the database should be mostly with the backup activity only. Correct?

With Regards,
Raja.
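[On reducing "log file sync" waits by committing less often: a minimal PL/SQL sketch of the idea (the work_queue table and processed flag are purely illustrative):

-- Commit once per batch instead of once per row; each COMMIT forces a
-- foreground wait on "log file sync" while LGWR flushes the redo buffer
BEGIN
  FOR rec IN (SELECT id FROM work_queue WHERE processed = 'N') LOOP
    UPDATE work_queue
       SET processed = 'Y'
     WHERE id = rec.id;
    -- no COMMIT inside the loop
  END LOOP;
  COMMIT;  -- a single commit for the whole batch
END;
/
]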