From: norman.dunbar on 29 Jun 2005 11:12 Morning all (or afternoon - I'm behind another firewall and using Google !) Oracle enterprise : 8.1.7.3 (there's nothing I can do about this I'm afraid !) HP-UX 11.11 8 CPUs and load average is 0.5 over 45 days. Sessions hang on library cache load lock, all are executing the same sql statement which makes a call to UTL_TCP.flush followed by utl_tcp.close_connectioon (see below). This is intermittants but very irritating for my users. I've Googled these groups and Metalinked till I'm blue in the face - nothing. :o( Gruesome details. User logs into an application - it sometimes hangs. User calls me up to advise me of the hang, in case I don't already know. I log into application - it hangs for me too. (I'm on a different network etc). I use SQLPlus and get an instant connection - my session, and that of my user(s) stay hung. Hence problem must be in the application. My script to interrogate V$SESSION_WAIT & V$SESSION shows the following : SQL> @sessionwait SID USERNAME OSUSER SQL_HASH_VALUE EVENT ------------ --------------- --------------- -------------- ----------------------- 29 WV20 www2 1359669396 library cache load lock 68 WV20 landrews 1359669396 library cache load lock 70 WV20 www2 1359669396 library cache load lock 79 WV20 www2 1359669396 library cache load lock 90 WV20 www2 1359669396 library cache load lock 114 WV20 www2 1359669396 library cache load lock 6 rows selected. www2 is a web interface (written in Java) to the application via Apache. The application also seems to login three sessions to the database for every actual execution of the application. Hmm. Everyone is waiting on the same sql_hash_value. This is a simple statement : SQL> @sqlbyhash_8i Enter value for hash_value: 1359669396 SQL_TEXT ---------------------------------------------------------------- begin -- if there are errors, there's nothing we can do :( b egin fdi_norm.Tidy; exception when others then null; end; end; Now, I've tracked the above code to one place, in a logoff trigger for a specific schema that we are not (directly) logging into - it is possible that the application is logging into it in the background though. The comments helped in tracing the code ! The code being executed, FDI_NORM.TIDY, can and is executed from elsewhere though. I've never caught it hanging when called from elsewhere, it's always from the above statement which is in the trigger. The actual code being executed from the trigger looks like this : procedure Tidy is begin utl_tcp.flush(c); utl_tcp.close_connection(c); Connected := false; SessionID := -1; exception when OTHERS then null; end Tidy; I have both FDI_NORM and UTL_TCP pinned - it makes no difference. (By the way, FDI_NORM has nothing to do with me, only the names are the same - this code is from a third party application.) Now I've tested this procedure in isolation when the system is running fine - it takes 2.5 seconds on the first run and zero seconds on all subsequent runs during this session. When the system is hanging, it takes around 4 minutes to run and times out. The connection is to another server here and that server is running happily as is the network - we've had it checked by out networking guys & gals. Metalink note 122793.1 - How To Find The Session Holding A Library Cache Lock - has two scripts to list holders and waiters which select from SYS.X$KGLLK. When the system is hung, these scripts return no rows selected for holders and waiters. SessionWait still shows waiters on L.C.L.L. DBA_LOCK_INTERNAL shows that each waiting session has the following : MODE_HELD = Share and MODE_REQUESTED = None for one session, plus two entries with MODE_HELD = Null and MODE_REQUESTED = None. All have the same SESSION_ID but LOCK_ID1 is different and LOCK_ID2 is null for each. The lock_type is a 'Cursor definition lock' for the two Null mode_held entries and 'Cursor Definition pin' for the 'share' entry. One of the waiters in V$SessionWait is a background task (no OSUSER in V$SESSION) and has the name 'SNP0', description 'Job Queue Process 0' in V$BGPROCESS. Shared pool is 250 Mb. There is an extract of the statspack report at the end of this posting, but I have a few worries : Response times (as analysed by oraperf.com) were 99.93% wait time and 0.07% cpu. L.C.L.L was responsible for 98.69% of the total wait time. 63,432,885 gets on the Library Cache latches was requested in 75 seconds. Interesting. This was 99.19% of the latch free waits. Of these, 1,723 slept. The latch sleeps were mostly on kglic (690), kglpnal: child:before pro (308), kglpnal: child: alloc spac (211) and kglhdgn:child (156). The rest dwindled from 78 sleeps down to 1. Session_cached_cursors is not set. Anyone got any ideas on tracking this down ? Please & thanks in advance. Oh, by the way, the application *does* use bind variables :o) Cheers, Norm. Here's the SP report stuff : A statspack report for 75 seconds, which is a 'just before' the lockup to 'just after', shows the following (extracts) : Snap Id Snap Time Sessions ------- ------------------ -------- Begin Snap: 16114 29-Jun-05 14:40:25 5,303 End Snap: 16116 29-Jun-05 14:41:40 5,303 Elapsed: 1.25 (mins) Load Profile ~~~~~~~~~~~~ Per Second Per Transaction --------------- --------------- Redo size: 11,115.84 21,376.62 Logical reads: 1,961.79 3,772.67 Block changes: 18.31 35.21 Physical reads: 4.20 8.08 Physical writes: 3.76 7.23 User calls: 6.59 12.67 Parses: 17.05 32.79 Hard parses: 1.63 3.13 Sorts: 64.03 123.13 Logons: 0.16 0.31 Executes: 258.83 497.74 Transactions: 0.52 % Blocks changed per Read: 0.93 Recursive Call %: 99.19 Rollback per transaction %: 2.56 Rows per Sort: 6.89 Instance Efficiency Percentages (Target 100%) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: 100.00 Redo NoWait %: 100.00 Buffer Hit %: 99.79 In-memory Sort %: 99.98 Library Hit %: 96.80 Soft Parse %: 90.46 Execute to Parse %: 93.41 Latch Hit %: 99.99 Parse CPU to Parse Elapsd %: 80.41 % Non-Parse CPU: 100.00 Shared Pool Statistics Begin End ------ ------ Memory Usage %: 84.47 91.13 % SQL with executions>1: 91.79 89.32 % Memory for SQL w/exec>1: 89.60 94.41 (Hey, I must have a great performing database, my buffer hit ratio is 99.79% !!!) Top 5 Wait Events ~~~~~~~~~~~~~~~~~ Wait % Total Event Waits Time (cs) Wt Time -------------------------------------------- ------------ ------------ ------- library cache load lock 268 30,205 98.29 db file sequential read 92 175 .57 log file parallel write 75 129 .42 enqueue 1 81 .26 latch free 1,764 54 .18 Wait Events for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116 -> cs - centisecond - 100th of a second -> ms - millisecond - 1000th of a second -> ordered by wait time desc, waits desc (idle events last) Avg Total Wait wait Waits Event Waits Timeouts Time (cs) (ms) /txn ---------------------------- ------------ ---------- ----------- ------ ------ library cache load lock 268 87 30,205 1127 6.9 db file sequential read 92 0 175 19 2.4 log file parallel write 75 0 129 17 1.9 enqueue 1 0 81 810 0.0 latch free 1,764 943 54 0 45.2 control file parallel write 24 0 44 18 0.6 library cache pin 18 0 16 9 0.5 log file sync 17 0 11 6 0.4 log buffer space 6 0 11 18 0.2 Instance Activity Stats for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116 Statistic Total per Second per Trans --------------------------------- ---------------- ------------ ------------ CPU used by this session 754,871 10,065.0 19,355.7 CPU used when call started 2,017 26.9 51.7 parse count (hard) 122 1.6 3.1 parse count (total) 1,279 17.1 32.8 parse time cpu 119 1.6 3.1 parse time elapsed 148 2.0 3.8 Latch Activity for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116 ->"Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for willing-to-wait latch get requests ->"NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests ->"Pct Misses" for both should be very close to 0.0 Pct Avg Pct Get Get Slps NoWait NoWait Latch Name Requests Miss /Miss Requests Miss ----------------------------- -------------- ------ ------ ------------ ------ JOX SGA heap latch 869 0.1 1.0 0 Token Manager 2 0.0 5 0.0 active checkpoint queue latch 30 0.0 5 0.0 cache buffer handles 1,942 0.0 0 cache buffers chains 254,304 0.1 0.1 136 0.0 cache buffers lru chain 47 0.0 123 0.0 channel handle pool latch 17 0.0 0 channel operations parent lat 22 0.0 0 checkpoint queue latch 444 0.0 0 dml lock allocation 226 0.0 0 enqueue hash chains 551 0.0 0 enqueues 892 0.0 0 event group latch 5 0.0 0 job_queue_processes parameter 1 0.0 0 latch wait list 1,434 1.3 0.1 1,425 0.4 library cache 63,432,885 0.0 0.2 431 3.9 library cache load lock 1,589 2.3 0.0 0 list of block allocation 129 0.0 0 loader state object freelist 6 0.0 0 longop free list 2 0.0 0 messages 464 0.0 0 mostly latch-free SCN 139 0.0 0 ncodef allocation latch 1 0.0 0 process allocation 5 0.0 5 0.0 process group creation 17 0.0 0 redo allocation 882 0.2 0.0 0 redo writing 357 0.0 0 row cache objects 17,425 0.6 0.0 8 0.0 sequence cache 57 0.0 0 session allocation 5,417 0.0 0.0 0 session idle bit 1,140 0.0 0 session switching 1 0.0 0 shared java pool 47 0.0 0 shared pool 21,489 0.2 0.2 0 sort extent pool 12 0.0 0 transaction allocation 194 0.0 0 transaction branch allocation 1 0.0 0 undo global data 339 0.0 0 user lock 222 0.0 0 ------------------------------------------------------------- Latch Sleep breakdown for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116 -> ordered by misses desc Get Spin & Latch Name Requests Misses Sleeps Sleeps 1->4 -------------------------- -------------- ----------- ----------- ------------ library cache 63,432,885 9,048 1,723 7666/1065/29 5/22/0 cache buffers chains 254,304 180 14 168/10/2/0/0 row cache objects 17,425 102 2 100/2/0/0/0 shared pool 21,489 49 11 44/2/2/1/0 latch wait list 1,434 19 1 18/1/0/0/0 JOX SGA heap latch 869 1 1 0/1/0/0/0 ------------------------------------------------------------- Latch Miss Sources for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116 -> only latches with sleeps are shown -> ordered by name, sleeps desc NoWait Waiter Latch Name Where Misses Sleeps Sleeps ------------------------ -------------------------- ------- ---------- ------- cache buffers chains kcbgtcr: kslbegin 0 13 9 cache buffers chains kcbrls: kslbegin 0 1 5 latch wait list No latch 0 1 1 latch wait list kslfre 2 1 1 library cache kglic 0 690 1,551 library cache kglpnal: child: before pro 0 308 14 library cache kglpnal: child: alloc spac 0 211 18 library cache kglhdgn: child: 0 156 23 library cache kglobpn: child: 0 78 19 library cache kglhdgh: child: 0 32 24 library cache kgldti: 2child 0 31 0 library cache kglhdgc: child: 0 31 5 library cache kgllkdl: child: cleanup 0 26 12 library cache kglrtl 0 23 0 library cache kglupc: child 0 16 12 library cache kgldtld: 2child 0 15 0 library cache kglati 0 13 7 library cache kglpin 0 13 3 library cache kglget: child: KGLDSBRD 0 7 0 library cache kgllkdl: child: free pin 0 5 10 library cache kglget: child: KGLDSBYD 0 4 1 library cache kglpnc: child 0 4 1 library cache kglpnal: child: check gran 0 3 1 library cache kgldtld: parent 0 1 0 library cache kglini: parent 0 1 0 library cache kglpnp: child 0 1 2 row cache objects kqrpre: find obj 0 2 0 shared pool kghfrunp: clatch: nowait 0 32 0 shared pool kghfrunp: clatch: wait 0 19 3 shared pool kghfrunp: alloc: clatch no 0 5 0 shared pool kghfrunp: alloc: wait 0 4 0 shared pool kghalp 0 1 0 shared pool kghupr1 0 1 2 ------------------------------------------------------------- Dictionary Cache Stats for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116 ->"Pct Misses" should be very low (< 2% in most cases) ->"Cache Usage" is the number of cache entries being used ->"Pct SGA" is the ratio of usage to allocated size for that cache Get Pct Scan Pct Mod Final Pct Cache Requests Miss Requests Miss Req Usage SGA ---------------------- ------------ ------ -------- ----- -------- ------ ---- dc_constraints 0 0 0 0 0 dc_database_links 0 0 0 0 0 dc_files 0 0 0 10 91 dc_free_extents 0 0 0 11 92 dc_global_oids 0 0 0 5 83 dc_histogram_data 0 0 0 0 0 dc_histogram_data_valu 0 0 0 0 0 dc_histogram_defs 1,614 17.0 0 0 507 100 dc_object_ids 1,090 6.2 0 0 183 96 dc_objects 860 15.3 0 0 310 99 dc_outlines 0 0 0 0 0 dc_profiles 22 0.0 0 0 2 40 dc_rollback_segments 0 0 0 17 94 dc_segments 414 31.6 0 0 123 97 dc_sequence_grants 0 0 0 1 50 dc_sequences 19 0.0 0 14 8 67 dc_synonyms 88 25.0 0 0 52 93 dc_tablespace_quotas 0 0 0 1 50 dc_tablespaces 4 25.0 0 0 2 20 dc_used_extents 0 0 0 5 83 dc_user_grants 20 0.0 0 0 16 42 dc_usernames 572 0.0 0 0 12 52 dc_users 951 0.0 0 0 30 77 ifs_acl_cache_entries 0 0 0 0 0 ------------------------------------------------------------- Library Cache Activity for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116 ->"Pct Misses" should be very low Get Pct Pin Pct Invali- Namespace Requests Miss Requests Miss Reloads dations --------------- ------------ ------ -------------- ------ ---------- -------- BODY 81 1.2 83 8.4 5 0 CLUSTER 277 0.4 32 6.3 0 0 INDEX 0 0 0 0 OBJECT 0 0 0 0 PIPE 0 0 0 0 SQL AREA 1,500 10.7 22,161 0.8 72 8 TABLE/PROCEDURE 2,160 3.4 7,376 10.2 271 0 TRIGGER 12 0.0 12 0.0 0 0 ------------------------------------------------------------- SGA Memory Summary for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116 SGA regions Size in Bytes ------------------------------ ---------------- Database Buffers 104,857,600 Fixed Size 104,936 Redo Buffers 77,824 Variable Size 102,891,520 ---------------- sum 207,931,880 <END OF EXTRACT>
From: DA Morgan on 29 Jun 2005 15:04 norman.dunbar(a)lfs.co.uk wrote: > Morning all (or afternoon - I'm behind another firewall and using > Google !) > > Oracle enterprise : 8.1.7.3 (there's nothing I can do about this I'm > afraid !) What prevents patching to 8.1.7.4? -- Daniel A. Morgan http://www.psoug.org damorgan(a)x.washington.edu (replace x with u to respond)
From: Joel Garry on 29 Jun 2005 20:46 norman.dunbar(a)lfs.co.uk wrote: > Morning all (or afternoon - I'm behind another firewall and using > Google !) > > Oracle enterprise : 8.1.7.3 (there's nothing I can do about this I'm > afraid !) > HP-UX 11.11 > 8 CPUs and load average is 0.5 over 45 days. > > > Sessions hang on library cache load lock, all are executing the same > sql statement which makes a call to UTL_TCP.flush followed by > utl_tcp.close_connectioon (see below). This is intermittants but very > irritating for my users. > > I've Googled these groups and Metalinked till I'm blue in the face - > nothing. :o( Well, there's this old thing, which will probably tell you what you already know: spool tuning_stats.txt ttitle 'SYSTEM STATISTICS FOR ORACLE7' select 'LIBRARY CACHE STATISTICS:' from dual; ttitle off select 'PINS - # of times an item in the library cache was executed - '|| sum(pins), 'RELOADS - # of library cache misses on execution steps - '|| sum (reloads)||chr(10)||chr(10), 'RELOADS / PINS * 100 = '||round((sum(reloads) / sum(pins) * 100),2)||'%' from v$librarycache / prompt Increase memory until RELOADS is near 0 but watch out for prompt Paging/swapping prompt To increase library cache, increase SHARED_POOL_SIZE prompt prompt ** NOTE: Increasing SHARED_POOL_SIZE will increase the SGA size. prompt prompt Library Cache Misses indicate that the Shared Pool is not big prompt enough to hold the shared SQL area for all concurrently open cursors. prompt If you have no Library Cache misses (PINS = 0), you may get a small prompt increase in performance by setting CURSOR_SPACE_FOR_TIME = TRUE which prompt prevents ORACLE from deallocating a shared SQL area while an prompt application prompt cursor associated with it is open. prompt prompt For Multi-threaded server, add 1K to SHARED_POOL_SIZE per user. prompt prompt------------------------------------------------------------------------ > > > Gruesome details. > > User logs into an application - it sometimes hangs. > User calls me up to advise me of the hang, in case I don't already > know. > I log into application - it hangs for me too. (I'm on a different > network etc). > I use SQLPlus and get an instant connection - my session, and that of > my user(s) stay hung. Hence problem must be in the application. > > My script to interrogate V$SESSION_WAIT & V$SESSION shows the following > : > > SQL> @sessionwait > > SID USERNAME OSUSER SQL_HASH_VALUE EVENT > > ------------ --------------- --------------- -------------- > ----------------------- > 29 WV20 www2 1359669396 library > cache load lock > 68 WV20 landrews 1359669396 library > cache load lock > 70 WV20 www2 1359669396 library > cache load lock > 79 WV20 www2 1359669396 library > cache load lock > 90 WV20 www2 1359669396 library > cache load lock > 114 WV20 www2 1359669396 library > cache load lock > > 6 rows selected. > > www2 is a web interface (written in Java) to the application via > Apache. > > The application also seems to login three sessions to the database for > every actual execution of the application. Hmm. Wild guess - the thing is shooting itself in the foot requesting something in the shared pool that it itself is causing to be reloaded. > > > Everyone is waiting on the same sql_hash_value. This is a simple > statement : > > SQL> @sqlbyhash_8i > > Enter value for hash_value: 1359669396 > > SQL_TEXT > ---------------------------------------------------------------- > begin -- if there are errors, there's nothing we can do :( b > egin fdi_norm.Tidy; exception when others then null; > end; end; > > Now, I've tracked the above code to one place, in a logoff trigger for > a specific schema that we are not (directly) logging into - it is > possible that the application is logging into it in the background > though. The comments helped in tracing the code ! > > The code being executed, FDI_NORM.TIDY, can and is executed from > elsewhere though. I've never caught it hanging when called from > elsewhere, it's always from the above statement which is in the > trigger. > > The actual code being executed from the trigger looks like this : > > procedure Tidy is > begin > utl_tcp.flush(c); > utl_tcp.close_connection(c); > Connected := false; > SessionID := -1; > exception > when OTHERS then null; > end Tidy; > > I have both FDI_NORM and UTL_TCP pinned - it makes no difference. (By > the way, FDI_NORM has nothing to do with me, only the names are the > same - this code is from a third party application.) TCP stuff often has timeouts. Do a netstat -a |grep FIN_WAIT . If you see lots of FIN_WAIT2, google for that, you may need to play with hp-ux timeout settings. When that happens, it's like a slow drain in the bathtub for it to go away. And you can then blame the networking folk. Check to be sure what you think is pinned really is. > > Now I've tested this procedure in isolation when the system is running > fine - it takes 2.5 seconds on the first run and zero seconds on all > subsequent runs during this session. > > When the system is hanging, it takes around 4 minutes to run and times > out. The connection is to another server here and that server is > running happily as is the network - we've had it checked by out > networking guys & gals. > jg -- @home.com is bogus. "It exempts - it is everything what it could bleed This is why it never goes will be White - naked naked to the night Alone looking for for some game little Has another girl who wants to conduct the world Any hour or place and when it enters in its head You knows it is there for being You want this It you are acquiring this Chambers of molly you iram to change its mind It you are acquiring Its pistol Slow - It is burning in its soul With whispers in its ear Is certain I will give this in any way Has little me arrives outside daqui You will plead - you low in its knees For have little another taste well and when you think that they leave to it inside This Is when it to diminish You want this It you are acquiring this Chambers of molly you iram to change its mind It you are acquiring Its pistol You want this It you are acquiring this Chambers of molly you iram to change its mind It you are acquiring Its pistol" - Bablefish translation of Portugese version of Molly's Chambers.
From: norman.dunbar on 30 Jun 2005 05:44 Morning Daniel, it's a third party app and not validated for 8174. We are in the processes of migrating a number of apps to 9i even as I type, but we have a number of legacy apps still on 7.3.4 and 8.1.7.0-8.1.7.4 and even one at 8.0.4 ! Cheers, Norm.
From: DA Morgan on 30 Jun 2005 10:09 norman.dunbar(a)lfs.co.uk wrote: > Morning Daniel, > > it's a third party app and not validated for 8174. We are in the > processes of migrating a number of apps to 9i even as I type, but we > have a number of legacy apps still on 7.3.4 and 8.1.7.0-8.1.7.4 and > even one at 8.0.4 ! > > Cheers, > Norm. You have my sympathy. But I've never heard of anyone validating to 8.1.7.3 and not 8.1.7.4. The usual validation is to 8iR3 meaning 8.1.7 without specifying the final digit. It might be useful to name names. In other words publish the names of any vendor that is forcing you to stay with Oracle 7.x or 8.0.x so that others will know that this vendor does not give a damn about its customers and hopefully will stay clear of future purhases from them. -- Daniel A. Morgan http://www.psoug.org damorgan(a)x.washington.edu (replace x with u to respond)
|
Next
|
Last
Pages: 1 2 Prev: ORA-1113 signalled during: alter database open Next: Oracle 10 RAC to 10G RAC standby |