From: Nelly Boy on 26 Mar 2010 13:57

Thanks for the responses so far.

Here's the info from the messages log from the start of the panic to the start of the reboot:

Mar 24 08:31:31 hostname unix: [ID 836849 kern.notice]
Mar 24 08:31:31 hostname ^Mpanic[cpu0]/thread=30005388c00:
Mar 24 08:31:31 hostname unix: [ID 920532 kern.notice] page_unlock: page 310071f14c0 is not locked
Mar 24 08:31:31 hostname unix: [ID 100000 kern.notice]
Mar 24 08:31:31 hostname genunix: [ID 723222 kern.notice] 000002a10089d2b0 unix:page_unlock+d8 (1041c878, 310071f14c0, ffbea000, 1, 3000016aa18, ffbe0001)
Mar 24 08:31:31 hostname genunix: [ID 179002 kern.notice] %l0-3: 00000310010ab4c0 0000031001ed8ff8 000000b0f1ad8ff8 000000b0f2c09c08
Mar 24 08:31:31 hostname %l4-7: 000000b0f2c09c08 0000000000000000 0000031003009c08 0000031003009c08
Mar 24 08:31:31 hostname genunix: [ID 723222 kern.notice] 000002a10089d360 unix:page_release+134 (310071f14c0, 1, 2000, 0, 2a10089d4f0, 2000)
Mar 24 08:31:32 hostname genunix: [ID 179002 kern.notice] %l0-3: 000000001045ee30 0000000000000000 0000000000000000 0000000000000000
Mar 24 08:31:32 hostname %l4-7: 00600112a7970000 00000300087d4e18 00000310071f14c0 0000000000000001
Mar 24 08:31:32 hostname genunix: [ID 723222 kern.notice] 000002a10089d410 genunix:anon_private+1b8 (310078394c0, 2000, 30008bf23a0, 310071f14c0, 30008953cb8, 0)
Mar 24 08:31:32 hostname genunix: [ID 179002 kern.notice] %l0-3: 000000001010fe24 000002a10089d610 0000030005dacda0 00000000ffbe8000
Mar 24 08:31:32 hostname %l4-7: 000000000000000f 0000030008c82200 0000000000000000 0000000000000000
Mar 24 08:31:33 hostname genunix: [ID 723222 kern.notice] 000002a10089d520 genunix:segvn_faultpage+7dc (30008953cb8, 30005dacda0, 7, 0, 0, 1)
Mar 24 08:31:33 hostname genunix: [ID 179002 kern.notice] %l0-3: 000003000016aa18 0000000000000002 000003000574a5b8 0000030005e1ecd8
Mar 24 08:31:33 hostname %l4-7: 00000000ffbe8000 ffffffffffffa000 000000000000000f 00000310071f14c0
Mar 24 08:31:33 hostname genunix: [ID 723222 kern.notice] 000002a10089d620 genunix:segvn_fault+860 (0, ffbea000, ffffffffffffa000, 1, 2, ffbe8000)
Mar 24 08:31:33 hostname genunix: [ID 179002 kern.notice] %l0-3: 00000000ffbe8000 0000000000002000 0000030005dacda0 000003000574a5b8
Mar 24 08:31:33 hostname %l4-7: 000002a10089d7a8 0000000000000000 0000030005e1ecd8 000000000000ffff
Mar 24 08:31:34 hostname genunix: [ID 723222 kern.notice] 000002a10089d7f0 genunix:as_fault+3a4 (1, ffbe8000, 300053754f0, 2, 1, 0)
Mar 24 08:31:34 hostname genunix: [ID 179002 kern.notice] %l0-3: 00000000100c5498 000003000016aa18 0000030005d9f810 0000000000002000
Mar 24 08:31:34 hostname %l4-7: 00000000ffbe8000 00000000ffbe8000 0000030005dacda0 0000000000002000
Mar 24 08:31:34 hostname genunix: [ID 723222 kern.notice] 000002a10089d8f0 unix:pagefault+c4 (2, 0, 3000532c050, 30005d9f810, ffbe8000, 0)
Mar 24 08:31:34 hostname genunix: [ID 179002 kern.notice] %l0-3: 0000000010110a68 0000030002d63728 000003000305f318 000002a100013d20
Mar 24 08:31:34 hostname %l4-7: 00000000102a4604 0000000000000000 0000000002ae196c 0000000000000001
Mar 24 08:31:35 hostname genunix: [ID 723222 kern.notice] 000002a10089d9b0 unix:trap+c60 (ffbe88ce, 5, ffbe8000, 10000, 2a10089dba0, 0)
Mar 24 08:31:35 hostname genunix: [ID 179002 kern.notice] %l0-3: 00000000fec74654 0000000000000000 000003000532c050 0000000000000000
Mar 24 08:31:35 hostname %l4-7: 0000000000010033 00000300053754f0 0000000000000001 0000000000000002
Mar 24 08:31:35 hostname unix: [ID 100000 kern.notice]
Mar 24 08:31:35 hostname genunix: [ID 672855 kern.notice] syncing file systems...
Mar 24 08:31:37 hostname md_stripe: [ID 641072 kern.warning] WARNING: md: d13: write error on /dev/dsk/c2t0d0s3
Mar 24 08:31:37 hostname md_stripe: [ID 641072 kern.warning] WARNING: md: d23: write error on /dev/dsk/c2t1d0s3
Mar 24 08:31:38 hostname genunix: [ID 733762 kern.notice] 8
Mar 24 08:31:39 hostname genunix: [ID 733762 kern.notice] 3
Mar 24 08:31:40 hostname genunix: [ID 733762 kern.notice] 1
Mar 24 08:31:53 hostname last message repeated 8 times
Mar 24 08:31:53 hostname genunix: [ID 616637 kern.notice] cannot sync -- giving up
Mar 24 08:31:54 hostname genunix: [ID 353387 kern.notice] dumping to /dev/md/dsk/d1, offset 1677983744
Mar 24 08:33:01 hostname genunix: [ID 409368 kern.notice] ^M100% done: 78076 pages dumped, compression ratio 2.75,
Mar 24 08:33:01 hostname genunix: [ID 851671 kern.notice] dump succeeded
Mar 24 08:34:00 hostname genunix: [ID 540533 kern.notice] ^MSunOS Release 5.8 Version Generic_108528-20 64-bit
Mar 24 08:34:00 hostname genunix: [ID 913632 kern.notice] Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
Here's the panic info from the crash dump:

SolarisCAT(vmcore.0/8U)> panic
panic on cpu 0
panic string: page_unlock: page 310071f14c0 is not locked
==== panic user (LWP_SYS) thread: 0x30005388c00 PID: 478 on CPU: 0 ====
cmd: /u01/oracle/product/9.2.0.1.0/bin/tnslsnr LISTENER -inherit
t_procp: 0x3000532c050
  p_as: 0x30005d9f810 size: 15187968 RSS: 3465216
  hat: 0x3000016aa18 cnum: 0x8ce
  cpusran: 0,1,2,3
t_stk: 0x2a10089daf0 sp: 0x10423081 t_stkbase: 0x2a10089a000
t_pri: 15(TS) pctcpu: 0.101943
t_lwp: 0x300053754f0 machpcb: 0x2a10089daf0
psrset: 0 last CPU: 0
idle: 0 ticks (0 seconds)
start: Sun Mar 21 20:36:08 2010
age: 215723 seconds (2 days 11 hours 55 minutes 23 seconds)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_PANIC - thread initiated a system panic
tpflg: TP_TWAIT - wait to be freed by lwp_wait
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SLOAD - in core
       SULOAD - u-block in core

pc: unix:panicsys+0x44: call unix:setjmp

unix:panicsys+0x44(0x10054540, 0x2a10089d338, 0x10423a50, 0x1, 0x8, , 0x9900001601, , , , , , , , 0x10054540, 0x2a10089d338)
unix:vpanic+0xcc(0x10054540, 0x2a10089d338, 0x0, 0x0, 0x0, 0x0)
unix:panic+0x1c(0x10054540, 0x310071f14c0, 0x20, 0x31003009c38, 0x31003009c3a, 0x2000)
unix:page_unlock+0xd8(0x310071f14c0, , 0xffbea000, 0x1, 0x3000016aa18, 0xffbe0001)
unix:page_release+0x134(0x310071f14c0, 0x1, 0x2000, 0x0, 0x2a10089d4f0, 0x2000)
genunix:anon_private+0x1b8(0x2a10089d610, 0x30005dacda0, 0xffbe8000, 0xf, 0x310071f14c0, 0x0)
genunix:segvn_faultpage+0x7dc(0x3000016aa18, 0x30005dacda0, 0xffbe8000, 0xffffffffffffa000, 0x0, 0x2a10089d7a8)
genunix:segvn_fault+0x860(0x3000016aa18, 0x30005dacda0, 0xffbe8000, 0x2000, 0x1, 0x2)
genunix:as_fault+0x3a4(0x3000016aa18?, 0x30005d9f810, 0xffbe8000, 0x1, 0x1, 0x2?)
unix:pagefault+0xc4(0xffbe8000?, 0x1, 0x2, 0x0, , 0x0)
unix:trap+0xc60(0x2a10089dba0?, 0xffbe8000, 0x10033?, 0xffbe8000?)
unix:user_rtt+0x0()
-- trap data type: 0x10033 (USER + data access protection - page was write protected) rp: 0x2a10089dba0 --
pc: 0xfec74654 (userland) npc: 0xfec74774 (userland)
global: %g1 0xfec74644 %g2 0xffffffffffffffff %g3 0x1a44a0 %g4 0xffbe8cc0 %g5 0 %g6 0 %g7 0
out: %o0 0x126c %o1 0 %o2 0xffbe8664 %o3 0x550 %o4 0x22408 %o5 0xfec74644 %sp 0xffbe8128 %o7 0xfec74644
-- switch to user thread's user stack --

SolarisCAT(vmcore.0/8U)>

Thanks
Nelly Boy
From: Wolfgang Ley on 27 Mar 2010 21:51

Hi,

Nelly Boy wrote:
> Thanks for the responses so far.
>
> Here's the info from the messages log from the start of the panic to
> the start of the reboot:
>
> Mar 24 08:31:31 hostname unix: [ID 836849 kern.notice]
> Mar 24 08:31:31 hostname ^Mpanic[cpu0]/thread=30005388c00:
> Mar 24 08:31:31 hostname unix: [ID 920532 kern.notice] page_unlock:
> page 310071f14c0 is not locked

[...]

>
> unix:panicsys+0x44(0x10054540, 0x2a10089d338, 0x10423a50, 0x1, 0x8, ,
> 0x9900001601, , , , , , , , 0x10054540, 0x2a10089d338)
> unix:vpanic+0xcc(0x10054540, 0x2a10089d338, 0x0, 0x0, 0x0, 0x0)
> unix:panic+0x1c(0x10054540, 0x310071f14c0, 0x20, 0x31003009c38,
> 0x31003009c3a, 0x2000)
> unix:page_unlock+0xd8(0x310071f14c0, , 0xffbea000, 0x1, 0x3000016aa18,
> 0xffbe0001)
> unix:page_release+0x134(0x310071f14c0, 0x1, 0x2000, 0x0,
> 0x2a10089d4f0, 0x2000)
> genunix:anon_private+0x1b8(0x2a10089d610, 0x30005dacda0, 0xffbe8000,
> 0xf, 0x310071f14c0, 0x0)
> genunix:segvn_faultpage+0x7dc(0x3000016aa18, 0x30005dacda0,
> 0xffbe8000, 0xffffffffffffa000, 0x0, 0x2a10089d7a8)
> genunix:segvn_fault+0x860(0x3000016aa18, 0x30005dacda0, 0xffbe8000,
> 0x2000, 0x1, 0x2)
> genunix:as_fault+0x3a4(0x3000016aa18?, 0x30005d9f810, 0xffbe8000, 0x1,
> 0x1, 0x2?)
> unix:pagefault+0xc4(0xffbe8000?, 0x1, 0x2, 0x0, , 0x0)
> unix:trap+0xc60(0x2a10089dba0?, 0xffbe8000, 0x10033?, 0xffbe8000?)
> unix:user_rtt+0x0()

This stack trace is far too generic and therefore not really helpful here. Can you please provide the stack traces of the panics with the mutex panic strings? Thanks.

If the mutex-related panics are no longer available then you may want to enable kernel memory debugging to see whether this reveals more information on the next panic. This can be done by adding the following line to /etc/system and rebooting the system to activate the new setting:

set kmem_flags=0x2f

Bye,
Wolfgang.
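For anyone trying to work out which of several crashes were the mutex ones, here is a minimal Python sketch (not from the thread) that lists the panic string of every panic recorded in a messages log. It assumes the syslog layout shown in the excerpt above, where a "panic[cpuN]/thread=..." marker line is followed by a line carrying the panic string; other releases may log this differently. The function name `panic_strings` and both regular expressions are hypothetical, made up for this illustration.

```python
import re

# Marker line that starts a panic in the log excerpt from this thread,
# e.g. "... ^Mpanic[cpu0]/thread=30005388c00:".
PANIC_MARKER = re.compile(r"panic\[cpu(\d+)\]/thread=([0-9a-f]+):")

# Syslog prefix such as "Mar 24 08:31:31 hostname unix: [ID 920532 kern.notice] ".
# Group 1 captures the timestamp. This pattern is an assumption based on
# the excerpt, not a general syslog parser.
SYSLOG_PREFIX = re.compile(
    r"^(\w{3}\s+\d+\s[\d:]{8})\s\S+\s\S+:\s\[ID \d+ \S+\]\s?"
)


def panic_strings(lines):
    """Return a list of (timestamp, cpu, panic_string) tuples."""
    found = []
    pending_cpu = None  # set after a marker, until its panic string is seen
    for line in lines:
        marker = PANIC_MARKER.search(line)
        if marker:
            pending_cpu = int(marker.group(1))
            continue
        if pending_cpu is None:
            continue
        prefix = SYSLOG_PREFIX.match(line)
        if not prefix:
            continue
        message = line[prefix.end():].strip()
        if message:  # skip the empty kern.notice separator lines
            found.append((prefix.group(1), pending_cpu, message))
            pending_cpu = None
    return found
```

Fed the lines of the messages file (on Solaris normally /var/adm/messages), it returns one tuple per panic; entries whose string starts with "mutex_" would then point at the crash dumps worth opening first.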
From: Nelly Boy on 28 Mar 2010 15:35

Here's the stack trace from the mutex_destroy panic.

Thanks
Nelly Boy

SolarisCAT(vmcore.3/8U)> panic
panic on cpu 2
panic string: mutex_destroy: bad mutex, lp=30003077de0 owner=2a10061dd20 thread=2a100605d20
==== panic kernel thread: 0x2a100605d20 PID: 0 on CPU: 2 ====
cmd: sched
t_procp: 0x10423e20(proc_sched)
  p_as: 0x10423d30(kas)
t_stk: 0x2a100605b10 sp: 0x10423081 t_stkbase: 0x2a100602000
t_pri: 60(SYS) pctcpu: 0.000000
t_lwp: 0x0 psrset: 0 last CPU: 2
idle: 3 ticks (0.03 seconds)
start: Wed Mar 24 08:34:13 2010
age: 22039 seconds (6 hours 7 minutes 19 seconds)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_TALLOCSTK - thread structure allocated from stk
      T_PANIC - thread initiated a system panic
tpflg: none set
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SSYS - system resident process
       SLOAD - in core
       SLOCK - process cannot be swapped
       SULOAD - u-block in core

pc: unix:panicsys+0x44: call unix:setjmp
startpc: ce:ce_drain_fifo+0x0: save %sp, -0xc0, %sp

unix:panicsys+0x44(0x100544a8, 0x2a100605288, 0x10423a50, 0x1, 0x30004c54f40, , 0x4400001603, , , , , , , , 0x100544a8, 0x2a100605288)
unix:vpanic+0xcc(0x100544a8, 0x2a100605288, 0x2, 0x2, 0x30003076008, 0x30004c54f40)
unix:panic+0x1c(0x100544a8, 0x10415ee8, 0x30003077de0, 0x2a10061dd20, 0x2a100605d20, 0x30002d59f00)
unix:mutex_panic+0x5c(0x10415ee8, 0x30003077de0, 0x1, 0x1, 0x3000007cd80, 0x3000007ceb0)
unix:mutex_destroy(0x30003077de0) - frame recycled
ip:ire_inactive+0xec(0x30003077cc8?, , , 0x0, 0x3000a8fda40, 0x0)
ip:ire_refrele(0x30003077cc8) - frame recycled
ip:icmp_pkt_err_ok+0x230(, , , , 0x0, 0x0)
ip:icmp_unreachable+0x30(0x30003060818?, 0x30004c54f40?, , , 0x0, 0x0)
ip:ip_fanout_send_icmp(0x30003060738, , 0x5, 0x1047644c, 0x3, 0x3) - frame recycled
ip:ip_fanout_udp+0xe50(0x30003060738, 0x30004c54f40, 0x3000305fc28, 0x3000437c050, 0x10472920?)
ip:ip_rput_local+0x16c0(0x30003060738?, 0x30004c54f40, 0x3000437c050, 0x30003076008, 0x0)
ip:ip_rput+0x12c4(0x148?, 0x300070744c0)
unix:putnext+0x1cc(0x30002db4990, 0x300070744c0?)
ce:ce_putnext_sap+0x2f4(0x30003067b10, , 0x30004c54f40, 0x30002d3d668, 0x1, 0x0)
ce:ce_send_up+0x900(0x30003067b10, 0x30004c54f40, 0x0, , , 0x0)
ce:ce_drain_fifo+0x40(0x300044a9a68, 0x0, 0x10423e20, 0x10423e20, 0x2, 0x0)
unix:thread_start+0x4()
-- end of kernel thread's stack --

SolarisCAT(vmcore.3/8U)>
From: Richard B. Gilbert on 28 Mar 2010 20:27

Wolfgang Ley wrote:
> Hi,
>
> Nelly Boy wrote:
>> Hi
>>
>> We have a 480R server running Solaris 8 (108528-20) and over the last
>> few months the server has been crashing.
>>
>
> Apart from that: consider to move to a more recent Solaris release.
> Solaris 8 is really old.
>

The age of the software is probably not the issue here. Solaris 8 is quite old but many of us here are older still! ;-)

The real issue with running S8 is that you can't get support from Sun. If the OP upgrades to S10 and has the same problem, he will at least be able to get support.
From: Chris Ridd on 29 Mar 2010 02:13

On 2010-03-28 18:08:26 +0100, Canuck57 said:

> If he has been running smoothly for 7 years, patching isn't the root
> cause. Nor would I recommend just patching up unless the more precise
> cause was known to be fixed by a patch.
>
> No doubt something changed in SW somewhere. In at least the way it is
> used or whatever.

Given that Oracle was running on the panicking CPU according to the crash dump output sent on 26 March... has Oracle changed?

--
Chris