From: Charles Hooper on 25 Nov 2008 08:58 On Nov 25, 8:03 am, raja <dextersu...(a)gmail.com> wrote: > Hi, > > I have also another information which would help to solve the problem > along with the AWR Report. > From the advisor tables, i got the following information ( these were > the findings made by oracle ) : > > Application Analysis : > ----------------------- > Wait event "Backup: sbtwrite2" in wait class "Administrative" > Wait event "Data file init write" in wait class "User I/O" > Wait event "enq: CF - contention" in wait class "Other" > Wait event "enq: JI - contention" in wait class "Other" > Wait event "enq: TC - contention" in wait class "Other" > Wait event "inactive session" in wait class "Other" > Wait event "SQL*Net more data from dblink" in wait class "Network" > Wait event "wait for a undo record" in wait class "Other" > Waits on event "log file sync" while performing COMMIT and ROLLBACK > operations > > I will check over the above wait events also and get back to you. > > Waits on event "log file sync" while performing COMMIT and ROLLBACK > operations - i think we should not use more commits in the > application, to solve this problem. > > I hope most of the above mentioned wait events were due to the Backup > activity that has been taken place ( This was also present in the AWR > Report). > I have consolidated the above wait events ( data was taken for 2 > months ) > So the problem with the database should be mostly with the backup > activity only. > Correct ? > > With Regards, > Raja. While the Application Analysis information that you provided may be helpful, it is probably best to focus on a specific time interval in which you know that a performance problem is affecting *important business critical* activities. Is the backup requiring an extra 15 minutes to complete an *important business critical* activity (it might be), or is a repeated (user interactive) process which should take 10 seconds to execute that now takes takes 2 minutes to execute a greater *important business critical* activity. If you determine that it is a specify user activity which is more important, look at an AWR report for the time interval when the user activity is happening. If there is a specific user involved in the performance problem, switch to a 10046 trace at level 8 or 12 for just that user. The excessive wait time for "log file sync" could possibly be minimized/reduced by allocating a greater percentage of the *battery back* RAID cache to write back caching, if that is an option in your environment. When a commit is executed, the session must wait on this event until Oracle receives confirmation from the operating system that the write to disk completed. With write back caching, the write confirmation is returned immediately, and the data in the cache is later flushed to disk. When investigating the wait events, do not just find the definitions of those wait events. Dig deeper into the meaning of the wait events by seeing if those wait events are discussed by Tom Kyte, Cary Millsap, Jonathan Lewis, Richard Foote, Kevin Closson, etc. Charles Hooper IT Manager/Oracle DBA K&M Machine-Fabricating, Inc.
From: Steve Howard on 25 Nov 2008 09:18 On Nov 25, 8:03 am, raja <dextersu...(a)gmail.com> wrote: > Hi, > > I have also another information which would help to solve the problem > along with the AWR Report. > From the advisor tables, i got the following information ( these were > the findings made by oracle ) : > > Application Analysis : > ----------------------- > Wait event "Backup: sbtwrite2" in wait class "Administrative" > Wait event "Data file init write" in wait class "User I/O" > Wait event "enq: CF - contention" in wait class "Other" > Wait event "enq: JI - contention" in wait class "Other" > Wait event "enq: TC - contention" in wait class "Other" > Wait event "inactive session" in wait class "Other" > Wait event "SQL*Net more data from dblink" in wait class "Network" > Wait event "wait for a undo record" in wait class "Other" > Waits on event "log file sync" while performing COMMIT and ROLLBACK > operations > > I will check over the above wait events also and get back to you. > > Waits on event "log file sync" while performing COMMIT and ROLLBACK > operations - i think we should not use more commits in the > application, to solve this problem. > > I hope most of the above mentioned wait events were due to the Backup > activity that has been taken place ( This was also present in the AWR > Report). > I have consolidated the above wait events ( data was taken for 2 > months ) > So the problem with the database should be mostly with the backup > activity only. > Correct ? > > With Regards, > Raja. Raja, It sounds like you are interested in knowing how to read an AWR report, without any real problem currently at hand. That's OK, because that is the best time to learn (when you are not on fire). I would suggest you do some time series analysis of your AWR to get a feel for what types of "stuff" you see and when you see them. You can then begin to model your application and know when something is in fact, "wrong". Try something like what is below to get you started. You can graph the output in Excel to see what the trend is in your database, in this case of single block reads. You can then apply this type of query to other events in dba_hist_system_event, as well as other AWR views, such as dba_hist_systat... select to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'), dhse.instance_number, time_waited_micro - lag(time_waited_micro, 1, 0) over (order by dhse.instance_number, dhse.snap_id) as time_waited, total_waits - lag(total_waits, 1, 0) over (order by dhse.instance_number, dhse.snap_id) as total_waits from dba_hist_snapshot dhs, dba_hist_system_event dhse where dhs.snap_id = dhse.snap_id and dhs.instance_number = dhse.instance_number and event_name = 'db file sequential read' order by 1,2; If you see db file sequential read is always at the top of the list every reporting period, then an AWR that shows this was the top event is probably not a real problem. If you run an analysis against other items in AWR and one bubbles to the top between 2AM and 4AM every Wednesday morning that you never see otherwise, that may be something to investigate more in the AWR report during the periods of the bubble. I find that to be absolutely indispensable when trying to uncover "un-reported" performance problems. HTH, Steve
From: joel garry on 25 Nov 2008 12:46 On Nov 25, 6:18 am, Steve Howard <stevedhow...(a)gmail.com> wrote: > On Nov 25, 8:03 am, raja <dextersu...(a)gmail.com> wrote: > > > > > > > Hi, > > > I have also another information which would help to solve the problem > > along with the AWR Report. > > From the advisor tables, i got the following information ( these were > > the findings made by oracle ) : > > > Application Analysis : > > ----------------------- > > Wait event "Backup: sbtwrite2" in wait class "Administrative" > > Wait event "Data file init write" in wait class "User I/O" > > Wait event "enq: CF - contention" in wait class "Other" > > Wait event "enq: JI - contention" in wait class "Other" > > Wait event "enq: TC - contention" in wait class "Other" > > Wait event "inactive session" in wait class "Other" > > Wait event "SQL*Net more data from dblink" in wait class "Network" > > Wait event "wait for a undo record" in wait class "Other" > > Waits on event "log file sync" while performing COMMIT and ROLLBACK > > operations > > > I will check over the above wait events also and get back to you. > > > Waits on event "log file sync" while performing COMMIT and ROLLBACK > > operations - i think we should not use more commits in the > > application, to solve this problem. > > > I hope most of the above mentioned wait events were due to the Backup > > activity that has been taken place ( This was also present in the AWR > > Report). > > I have consolidated the above wait events ( data was taken for 2 > > months ) > > So the problem with the database should be mostly with the backup > > activity only. > > Correct ? > > > With Regards, > > Raja. > > Raja, > > It sounds like you are interested in knowing how to read an AWR > report, without any real problem currently at hand. That's OK, > because that is the best time to learn (when you are not on fire). > > I would suggest you do some time series analysis of your AWR to get a > feel for what types of "stuff" you see and when you see them. You can > then begin to model your application and know when something is in > fact, "wrong". > > Try something like what is below to get you started. You can graph > the output in Excel to see what the trend is in your database, in this > case of single block reads. You can then apply this type of query to > other events in dba_hist_system_event, as well as other AWR views, > such as dba_hist_systat... > > select to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'), > dhse.instance_number, > time_waited_micro - lag(time_waited_micro, 1, 0) over (order by > dhse.instance_number, dhse.snap_id) as time_waited, > total_waits - lag(total_waits, 1, 0) over (order by > dhse.instance_number, dhse.snap_id) as total_waits > from dba_hist_snapshot dhs, > dba_hist_system_event dhse > where dhs.snap_id = dhse.snap_id > and dhs.instance_number = dhse.instance_number > and event_name = 'db file sequential read' > order by 1,2; > > If you see db file sequential read is always at the top of the list > every reporting period, then an AWR that shows this was the top event > is probably not a real problem. > > If you run an analysis against other items in AWR and one bubbles to > the top between 2AM and 4AM every Wednesday morning that you never see > otherwise, that may be something to investigate more in the AWR report > during the periods of the bubble. I find that to be absolutely > indispensable when trying to uncover "un-reported" performance > problems. > > HTH, > > Steve While it has a great danger of chasing after wild geese, I've found EM in the form of dbconsole does have some informative charts, in a gross sense. I see a cute little blue i/o pyramid every half hour, a known stupid update routine. When there is a solid bar of green on the cpu chart, I know to look for a couple of certain programs - if it is any other, definitely a warning of phone calls to come. The screen about top memory consumption can be an eye opener. The [PS]GA advisors turn that stuff into a few minutes of surfing (until I hit a Jonathan Lewis post...). And after all, it's just another way to look at the same tuning information.. Sometimes you can even look at the sql ADDM is running. Then as soon as I think "hey this is a cool tool," it does some stupid thing that makes me glad I can figure out how to look under the covers. But this one is new to me, thanks Steve (if I ever have time to try it :-) . jg -- @home.com is bogus. I wish someone would develop excelbane. http://www.theregister.co.uk/2008/10/15/lehman_buyout_excel_confusion/
From: raja on 6 Dec 2008 04:47 Hi Steve, I took the output of the query which you have given in the previous post. I am not able to post the output file here. So i am sending you and charles. The file contains the output that was taken from 25/11/2008 - 03/12/2008 In average, it looks like ordering the output by total waits, shows "PX qref latch" as the priority. I searched regarding this and found that it is related to parallel processing. ( I also feel that they might have used parallel hint to make the queries faster. Let me check out on these. ) Please check the file that i have sent to ur mail id and provide me ur suggestions. With Regards, Raja.
From: Charles Hooper on 8 Dec 2008 09:56
On Dec 6, 4:47 am, raja <dextersu...(a)gmail.com> wrote: > Hi Steve, > > I took the output of the query which you have given in the previous > post. > I am not able to post the output file here. So i am sending you and > charles. > The file contains the output that was taken from 25/11/2008 - > 03/12/2008 > In average, it looks like ordering the output by total waits, shows > "PX qref latch" as the priority. > I searched regarding this and found that it is related to parallel > processing. > ( I also feel that they might have used parallel hint to make the > queries faster. Let me check out on these. ) > Please check the file that i have sent to ur mail id and provide me ur > suggestions. > > With Regards, > Raja. Raja, I examined the spreadsheet - negative numbers should not appear when subtracting the old values from the new values. You may need to make an adjustment to the SQL statement if you plan to look at more than one wait event at a time, or if there are more than one database instances. The SQL statement might need to be modified like this (I do not have an AWR license, so this might not work as written): select to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'), dhse.instance_number, time_waited_micro - nvl(lag(time_waited_micro, 1, 0) over (partition by dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as time_waited, total_waits - nvl(lag(total_waits, 1, 0) over (partition by dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as total_waits from dba_hist_snapshot dhs, dba_hist_system_event dhse where dhs.snap_id = dhse.snap_id and dhs.instance_number = dhse.instance_number order by 1, 2; Just out of curiosity, I would like to see an updated version of the spreadsheet using the new SQL statement. In short, the "PX qref latch" wait event may not be the problem. Charles Hooper IT Manager/Oracle DBA K&M Machine-Fabricating, Inc. |