AWR Sample Report [Oracle]

Prev: Role , System privilege and Object privilege
Next: RMAN Error

From: Charles Hooper on 25 Nov 2008 08:58

On Nov 25, 8:03 am, raja <dextersu...(a)gmail.com> wrote:
> Hi,
>
> I have also another information which would help to solve the problem
> along with the AWR Report.
> From the advisor tables, i got the following information ( these were
> the findings made by oracle ) :
>
> Application Analysis :
> -----------------------
> Wait event "Backup: sbtwrite2" in wait class "Administrative"
> Wait event "Data file init write" in wait class "User I/O"
> Wait event "enq: CF - contention" in wait class "Other"
> Wait event "enq: JI - contention" in wait class "Other"
> Wait event "enq: TC - contention" in wait class "Other"
> Wait event "inactive session" in wait class "Other"
> Wait event "SQL*Net more data from dblink" in wait class "Network"
> Wait event "wait for a undo record" in wait class "Other"
> Waits on event "log file sync" while performing COMMIT and ROLLBACK
> operations
>
> I will check over the above wait events also and get back to you.
>
> Waits on event "log file sync" while performing COMMIT and ROLLBACK
> operations - i think we should not use more commits in the
> application, to solve this problem.
>
> I hope most of the above mentioned wait events were due to the Backup
> activity that has been taken place ( This was also present in the AWR
> Report).
> I have consolidated the above wait events ( data was taken for 2
> months )
> So the problem with the database should be mostly with the backup
> activity only.
> Correct ?
>
> With Regards,
> Raja.

While the Application Analysis information that you provided may be
helpful, it is probably best to focus on a specific time interval in
which you know that a performance problem is affecting *important
business critical* activities. Is the backup requiring an extra 15
minutes to complete an *important business critical* activity (it
might be), or is a repeated (user interactive) process which should
take 10 seconds to execute that now takes takes 2 minutes to execute a
greater *important business critical* activity. If you determine that
it is a specify user activity which is more important, look at an AWR
report for the time interval when the user activity is happening. If
there is a specific user involved in the performance problem, switch
to a 10046 trace at level 8 or 12 for just that user.

The excessive wait time for "log file sync" could possibly be
minimized/reduced by allocating a greater percentage of the *battery
back* RAID cache to write back caching, if that is an option in your
environment. When a commit is executed, the session must wait on this
event until Oracle receives confirmation from the operating system
that the write to disk completed. With write back caching, the write
confirmation is returned immediately, and the data in the cache is
later flushed to disk.

When investigating the wait events, do not just find the definitions
of those wait events. Dig deeper into the meaning of the wait events
by seeing if those wait events are discussed by Tom Kyte, Cary
Millsap, Jonathan Lewis, Richard Foote, Kevin Closson, etc.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.

From: Steve Howard on 25 Nov 2008 09:18

On Nov 25, 8:03 am, raja <dextersu...(a)gmail.com> wrote:
> Hi,
>
> I have also another information which would help to solve the problem
> along with the AWR Report.
> From the advisor tables, i got the following information ( these were
> the findings made by oracle ) :
>
> Application Analysis :
> -----------------------
> Wait event "Backup: sbtwrite2" in wait class "Administrative"
> Wait event "Data file init write" in wait class "User I/O"
> Wait event "enq: CF - contention" in wait class "Other"
> Wait event "enq: JI - contention" in wait class "Other"
> Wait event "enq: TC - contention" in wait class "Other"
> Wait event "inactive session" in wait class "Other"
> Wait event "SQL*Net more data from dblink" in wait class "Network"
> Wait event "wait for a undo record" in wait class "Other"
> Waits on event "log file sync" while performing COMMIT and ROLLBACK
> operations
>
> I will check over the above wait events also and get back to you.
>
> Waits on event "log file sync" while performing COMMIT and ROLLBACK
> operations - i think we should not use more commits in the
> application, to solve this problem.
>
> I hope most of the above mentioned wait events were due to the Backup
> activity that has been taken place ( This was also present in the AWR
> Report).
> I have consolidated the above wait events ( data was taken for 2
> months )
> So the problem with the database should be mostly with the backup
> activity only.
> Correct ?
>
> With Regards,
> Raja.

Raja,

It sounds like you are interested in knowing how to read an AWR
report, without any real problem currently at hand. That's OK,
because that is the best time to learn (when you are not on fire).

I would suggest you do some time series analysis of your AWR to get a
feel for what types of "stuff" you see and when you see them. You can
then begin to model your application and know when something is in
fact, "wrong".

Try something like what is below to get you started. You can graph
the output in Excel to see what the trend is in your database, in this
case of single block reads. You can then apply this type of query to
other events in dba_hist_system_event, as well as other AWR views,
such as dba_hist_systat...

select to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
dhse.instance_number,
time_waited_micro - lag(time_waited_micro, 1, 0) over (order by
dhse.instance_number, dhse.snap_id) as time_waited,
total_waits - lag(total_waits, 1, 0) over (order by
dhse.instance_number, dhse.snap_id) as total_waits
from dba_hist_snapshot dhs,
dba_hist_system_event dhse
where dhs.snap_id = dhse.snap_id
and dhs.instance_number = dhse.instance_number
and event_name = 'db file sequential read'
order by 1,2;

If you see db file sequential read is always at the top of the list
every reporting period, then an AWR that shows this was the top event
is probably not a real problem.

If you run an analysis against other items in AWR and one bubbles to
the top between 2AM and 4AM every Wednesday morning that you never see
otherwise, that may be something to investigate more in the AWR report
during the periods of the bubble. I find that to be absolutely
indispensable when trying to uncover "un-reported" performance
problems.

HTH,

Steve

From: joel garry on 25 Nov 2008 12:46

On Nov 25, 6:18 am, Steve Howard <stevedhow...(a)gmail.com> wrote:
> On Nov 25, 8:03 am, raja <dextersu...(a)gmail.com> wrote:
>
>
>
>
>
> > Hi,
>
> > I have also another information which would help to solve the problem
> > along with the AWR Report.
> > From the advisor tables, i got the following information ( these were
> > the findings made by oracle ) :
>
> > Application Analysis :
> > -----------------------
> > Wait event "Backup: sbtwrite2" in wait class "Administrative"
> > Wait event "Data file init write" in wait class "User I/O"
> > Wait event "enq: CF - contention" in wait class "Other"
> > Wait event "enq: JI - contention" in wait class "Other"
> > Wait event "enq: TC - contention" in wait class "Other"
> > Wait event "inactive session" in wait class "Other"
> > Wait event "SQL*Net more data from dblink" in wait class "Network"
> > Wait event "wait for a undo record" in wait class "Other"
> > Waits on event "log file sync" while performing COMMIT and ROLLBACK
> > operations
>
> > I will check over the above wait events also and get back to you.
>
> > Waits on event "log file sync" while performing COMMIT and ROLLBACK
> > operations - i think we should not use more commits in the
> > application, to solve this problem.
>
> > I hope most of the above mentioned wait events were due to the Backup
> > activity that has been taken place ( This was also present in the AWR
> > Report).
> > I have consolidated the above wait events ( data was taken for 2
> > months )
> > So the problem with the database should be mostly with the backup
> > activity only.
> > Correct ?
>
> > With Regards,
> > Raja.
>
> Raja,
>
> It sounds like you are interested in knowing how to read an AWR
> report, without any real problem currently at hand. That's OK,
> because that is the best time to learn (when you are not on fire).
>
> I would suggest you do some time series analysis of your AWR to get a
> feel for what types of "stuff" you see and when you see them. You can
> then begin to model your application and know when something is in
> fact, "wrong".
>
> Try something like what is below to get you started. You can graph
> the output in Excel to see what the trend is in your database, in this
> case of single block reads. You can then apply this type of query to
> other events in dba_hist_system_event, as well as other AWR views,
> such as dba_hist_systat...
>
> select to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
> dhse.instance_number,
> time_waited_micro - lag(time_waited_micro, 1, 0) over (order by
> dhse.instance_number, dhse.snap_id) as time_waited,
> total_waits - lag(total_waits, 1, 0) over (order by
> dhse.instance_number, dhse.snap_id) as total_waits
> from dba_hist_snapshot dhs,
> dba_hist_system_event dhse
> where dhs.snap_id = dhse.snap_id
> and dhs.instance_number = dhse.instance_number
> and event_name = 'db file sequential read'
> order by 1,2;
>
> If you see db file sequential read is always at the top of the list
> every reporting period, then an AWR that shows this was the top event
> is probably not a real problem.
>
> If you run an analysis against other items in AWR and one bubbles to
> the top between 2AM and 4AM every Wednesday morning that you never see
> otherwise, that may be something to investigate more in the AWR report
> during the periods of the bubble. I find that to be absolutely
> indispensable when trying to uncover "un-reported" performance
> problems.
>
> HTH,
>
> Steve

While it has a great danger of chasing after wild geese, I've found EM
in the form of dbconsole does have some informative charts, in a gross
sense. I see a cute little blue i/o pyramid every half hour, a known
stupid update routine. When there is a solid bar of green on the cpu
chart, I know to look for a couple of certain programs - if it is any
other, definitely a warning of phone calls to come. The screen about
top memory consumption can be an eye opener. The [PS]GA advisors
turn that stuff into a few minutes of surfing (until I hit a Jonathan
Lewis post...). And after all, it's just another way to look at the
same tuning information.. Sometimes you can even look at the sql ADDM
is running.

Then as soon as I think "hey this is a cool tool," it does some stupid
thing that makes me glad I can figure out how to look under the
covers. But this one is new to me, thanks Steve (if I ever have time
to try it :-) .

jg
--
@home.com is bogus.
I wish someone would develop excelbane.
http://www.theregister.co.uk/2008/10/15/lehman_buyout_excel_confusion/

From: raja on 6 Dec 2008 04:47

Hi Steve,

I took the output of the query which you have given in the previous
post.
I am not able to post the output file here. So i am sending you and
charles.
The file contains the output that was taken from 25/11/2008 -
03/12/2008
In average, it looks like ordering the output by total waits, shows
"PX qref latch" as the priority.
I searched regarding this and found that it is related to parallel
processing.
( I also feel that they might have used parallel hint to make the
queries faster. Let me check out on these. )
Please check the file that i have sent to ur mail id and provide me ur
suggestions.

With Regards,
Raja.

From: Charles Hooper on 8 Dec 2008 09:56

On Dec 6, 4:47 am, raja <dextersu...(a)gmail.com> wrote:
> Hi Steve,
>
> I took the output of the query which you have given in the previous
> post.
> I am not able to post the output file here. So i am sending you and
> charles.
> The file contains the output that was taken from 25/11/2008 -
> 03/12/2008
> In average, it looks like ordering the output by total waits, shows
> "PX qref latch" as the priority.
> I searched regarding this and found that it is related to parallel
> processing.
> ( I also feel that they might have used parallel hint to make the
> queries faster. Let me check out on these. )
> Please check the file that i have sent to ur mail id and provide me ur
> suggestions.
>
> With Regards,
> Raja.

Raja,

I examined the spreadsheet - negative numbers should not appear when
subtracting the old values from the new values. You may need to make
an adjustment to the SQL statement if you plan to look at more than
one wait event at a time, or if there are more than one database
instances.
The SQL statement might need to be modified like this (I do not have
an AWR license, so this might not work as written):
select
to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
dhse.instance_number,
time_waited_micro - nvl(lag(time_waited_micro, 1, 0) over (partition
by dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
time_waited,
total_waits - nvl(lag(total_waits, 1, 0) over (partition by
dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
total_waits
from
dba_hist_snapshot dhs,
dba_hist_system_event dhse
where
dhs.snap_id = dhse.snap_id
and dhs.instance_number = dhse.instance_number
order by
1,
2;

Just out of curiosity, I would like to see an updated version of the
spreadsheet using the new SQL statement.

In short, the "PX qref latch" wait event may not be the problem.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8
Prev: Role , System privilege and Object privilege
Next: RMAN Error