Panic strings [Solaris]

From: Nelly Boy on 26 Mar 2010 13:57

Thanks for the responses so far.

Here's the info from the messages log, from the start of the panic to
the start of the reboot:

Mar 24 08:31:31 hostname unix: [ID 836849 kern.notice]
Mar 24 08:31:31 hostname panic[cpu0]/thread=30005388c00:
Mar 24 08:31:31 hostname unix: [ID 920532 kern.notice] page_unlock: page 310071f14c0 is not locked
Mar 24 08:31:31 hostname unix: [ID 100000 kern.notice]
Mar 24 08:31:31 hostname genunix: [ID 723222 kern.notice] 000002a10089d2b0 unix:page_unlock+d8 (1041c878, 310071f14c0, ffbea000, 1, 3000016aa18, ffbe0001)
Mar 24 08:31:31 hostname genunix: [ID 179002 kern.notice] %l0-3: 00000310010ab4c0 0000031001ed8ff8 000000b0f1ad8ff8 000000b0f2c09c08
Mar 24 08:31:31 hostname %l4-7: 000000b0f2c09c08 0000000000000000 0000031003009c08 0000031003009c08
Mar 24 08:31:31 hostname genunix: [ID 723222 kern.notice] 000002a10089d360 unix:page_release+134 (310071f14c0, 1, 2000, 0, 2a10089d4f0, 2000)
Mar 24 08:31:32 hostname genunix: [ID 179002 kern.notice] %l0-3: 000000001045ee30 0000000000000000 0000000000000000 0000000000000000
Mar 24 08:31:32 hostname %l4-7: 00600112a7970000 00000300087d4e18 00000310071f14c0 0000000000000001
Mar 24 08:31:32 hostname genunix: [ID 723222 kern.notice] 000002a10089d410 genunix:anon_private+1b8 (310078394c0, 2000, 30008bf23a0, 310071f14c0, 30008953cb8, 0)
Mar 24 08:31:32 hostname genunix: [ID 179002 kern.notice] %l0-3: 000000001010fe24 000002a10089d610 0000030005dacda0 00000000ffbe8000
Mar 24 08:31:32 hostname %l4-7: 000000000000000f 0000030008c82200 0000000000000000 0000000000000000
Mar 24 08:31:33 hostname genunix: [ID 723222 kern.notice] 000002a10089d520 genunix:segvn_faultpage+7dc (30008953cb8, 30005dacda0, 7, 0, 0, 1)
Mar 24 08:31:33 hostname genunix: [ID 179002 kern.notice] %l0-3: 000003000016aa18 0000000000000002 000003000574a5b8 0000030005e1ecd8
Mar 24 08:31:33 hostname %l4-7: 00000000ffbe8000 ffffffffffffa000 000000000000000f 00000310071f14c0
Mar 24 08:31:33 hostname genunix: [ID 723222 kern.notice] 000002a10089d620 genunix:segvn_fault+860 (0, ffbea000, ffffffffffffa000, 1, 2, ffbe8000)
Mar 24 08:31:33 hostname genunix: [ID 179002 kern.notice] %l0-3: 00000000ffbe8000 0000000000002000 0000030005dacda0 000003000574a5b8
Mar 24 08:31:33 hostname %l4-7: 000002a10089d7a8 0000000000000000 0000030005e1ecd8 000000000000ffff
Mar 24 08:31:34 hostname genunix: [ID 723222 kern.notice] 000002a10089d7f0 genunix:as_fault+3a4 (1, ffbe8000, 300053754f0, 2, 1, 0)
Mar 24 08:31:34 hostname genunix: [ID 179002 kern.notice] %l0-3: 00000000100c5498 000003000016aa18 0000030005d9f810 0000000000002000
Mar 24 08:31:34 hostname %l4-7: 00000000ffbe8000 00000000ffbe8000 0000030005dacda0 0000000000002000
Mar 24 08:31:34 hostname genunix: [ID 723222 kern.notice] 000002a10089d8f0 unix:pagefault+c4 (2, 0, 3000532c050, 30005d9f810, ffbe8000, 0)
Mar 24 08:31:34 hostname genunix: [ID 179002 kern.notice] %l0-3: 0000000010110a68 0000030002d63728 000003000305f318 000002a100013d20
Mar 24 08:31:34 hostname %l4-7: 00000000102a4604 0000000000000000 0000000002ae196c 0000000000000001
Mar 24 08:31:35 hostname genunix: [ID 723222 kern.notice] 000002a10089d9b0 unix:trap+c60 (ffbe88ce, 5, ffbe8000, 10000, 2a10089dba0, 0)
Mar 24 08:31:35 hostname genunix: [ID 179002 kern.notice] %l0-3: 00000000fec74654 0000000000000000 000003000532c050 0000000000000000
Mar 24 08:31:35 hostname %l4-7: 0000000000010033 00000300053754f0 0000000000000001 0000000000000002
Mar 24 08:31:35 hostname unix: [ID 100000 kern.notice]
Mar 24 08:31:35 hostname genunix: [ID 672855 kern.notice] syncing file systems...
Mar 24 08:31:37 hostname md_stripe: [ID 641072 kern.warning] WARNING: md: d13: write error on /dev/dsk/c2t0d0s3
Mar 24 08:31:37 hostname md_stripe: [ID 641072 kern.warning] WARNING: md: d23: write error on /dev/dsk/c2t1d0s3
Mar 24 08:31:38 hostname genunix: [ID 733762 kern.notice] 8
Mar 24 08:31:39 hostname genunix: [ID 733762 kern.notice] 3
Mar 24 08:31:40 hostname genunix: [ID 733762 kern.notice] 1
Mar 24 08:31:53 hostname last message repeated 8 times
Mar 24 08:31:53 hostname genunix: [ID 616637 kern.notice] cannot sync -- giving up
Mar 24 08:31:54 hostname genunix: [ID 353387 kern.notice] dumping to /dev/md/dsk/d1, offset 1677983744
Mar 24 08:33:01 hostname genunix: [ID 409368 kern.notice] 100% done: 78076 pages dumped, compression ratio 2.75,
Mar 24 08:33:01 hostname genunix: [ID 851671 kern.notice] dump succeeded
Mar 24 08:34:00 hostname genunix: [ID 540533 kern.notice] SunOS Release 5.8 Version Generic_108528-20 64-bit
Mar 24 08:34:00 hostname genunix: [ID 913632 kern.notice] Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
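When a panic stack has to be read back out of /var/adm/messages rather than from a dump, it can help to reduce the log to just the call chain. The sketch below is a hypothetical helper (`FRAME_RE`, `parse_frames` and `call_chain` are illustrative names, not part of any Solaris tool). It assumes each frame is logged as a 16-digit frame pointer followed by `module:function+hexoffset (args)`, as in the log above; the `\s+` in the pattern also tolerates a frame that mail wrapping has split across two lines.

```python
import re

# Assumed frame layout, taken from the log above:
#   <16-hex-digit frame pointer> <module>:<function>+<hex offset> (<args>)
# \s+ lets the pattern match even if wrapping moved the frame pointer
# and the symbol onto separate lines.
FRAME_RE = re.compile(r"\b([0-9a-f]{16})\s+(\w+):(\w+)\+([0-9a-f]+)\s*\(")

# Two frames from the log: the first wrapped as in an email, the second not.
SAMPLE = """\
Mar 24 08:31:31 hostname genunix: [ID 723222 kern.notice]
000002a10089d2b0 unix:page_unlock+d8 (1041c878, 310071f14c0, ffbea000,
1, 3000016aa18, ffbe0001)
Mar 24 08:31:31 hostname genunix: [ID 179002 kern.notice] %l0-3:
00000310010ab4c0 0000031001ed8ff8 000000b0f1ad8ff8 000000b0f2c09c08
Mar 24 08:31:31 hostname genunix: [ID 723222 kern.notice] 000002a10089d360 unix:page_release+134 (310071f14c0, 1, 2000, 0, 2a10089d4f0, 2000)
"""


def parse_frames(log_text):
    """Return (module, function, offset) tuples, innermost frame first."""
    return [
        (m.group(2), m.group(3), int(m.group(4), 16))
        for m in FRAME_RE.finditer(log_text)
    ]


def call_chain(log_text):
    """Join the frames outermost-first, i.e. in the order the calls were made."""
    frames = parse_frames(log_text)
    return " -> ".join(f"{mod}:{fn}" for mod, fn, _ in reversed(frames))


if __name__ == "__main__":
    for mod, fn, off in parse_frames(SAMPLE):
        print(f"{mod}:{fn}+0x{off:x}")
    print(call_chain(SAMPLE))
```

The `%l0-3`/`%l4-7` register dumps are skipped because their hex words are never followed by `module:`; run over the full log above, the chain should read from `unix:trap` down to `unix:page_unlock`.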

Here's the panic info from the crash dump:

SolarisCAT(vmcore.0/8U)> panic
panic on cpu 0
panic string: page_unlock: page 310071f14c0 is not locked
==== panic user (LWP_SYS) thread: 0x30005388c00 PID: 478 on CPU: 0 ====
cmd: /u01/oracle/product/9.2.0.1.0/bin/tnslsnr LISTENER -inherit
t_procp: 0x3000532c050
p_as: 0x30005d9f810 size: 15187968 RSS: 3465216
hat: 0x3000016aa18 cnum: 0x8ce
cpusran: 0,1,2,3
t_stk: 0x2a10089daf0 sp: 0x10423081 t_stkbase: 0x2a10089a000
t_pri: 15(TS) pctcpu: 0.101943
t_lwp: 0x300053754f0 machpcb: 0x2a10089daf0
psrset: 0 last CPU: 0
idle: 0 ticks (0 seconds)
start: Sun Mar 21 20:36:08 2010
age: 215723 seconds (2 days 11 hours 55 minutes 23 seconds)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_PANIC - thread initiated a system panic
tpflg: TP_TWAIT - wait to be freed by lwp_wait
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SLOAD - in core
SULOAD - u-block in core

pc: unix:panicsys+0x44: call unix:setjmp

unix:panicsys+0x44(0x10054540, 0x2a10089d338, 0x10423a50, 0x1, 0x8, , 0x9900001601, , , , , , , , 0x10054540, 0x2a10089d338)
unix:vpanic+0xcc(0x10054540, 0x2a10089d338, 0x0, 0x0, 0x0, 0x0)
unix:panic+0x1c(0x10054540, 0x310071f14c0, 0x20, 0x31003009c38, 0x31003009c3a, 0x2000)
unix:page_unlock+0xd8(0x310071f14c0, , 0xffbea000, 0x1, 0x3000016aa18, 0xffbe0001)
unix:page_release+0x134(0x310071f14c0, 0x1, 0x2000, 0x0, 0x2a10089d4f0, 0x2000)
genunix:anon_private+0x1b8(0x2a10089d610, 0x30005dacda0, 0xffbe8000, 0xf, 0x310071f14c0, 0x0)
genunix:segvn_faultpage+0x7dc(0x3000016aa18, 0x30005dacda0, 0xffbe8000, 0xffffffffffffa000, 0x0, 0x2a10089d7a8)
genunix:segvn_fault+0x860(0x3000016aa18, 0x30005dacda0, 0xffbe8000, 0x2000, 0x1, 0x2)
genunix:as_fault+0x3a4(0x3000016aa18?, 0x30005d9f810, 0xffbe8000, 0x1, 0x1, 0x2?)
unix:pagefault+0xc4(0xffbe8000?, 0x1, 0x2, 0x0, , 0x0)
unix:trap+0xc60(0x2a10089dba0?, 0xffbe8000, 0x10033?, 0xffbe8000?)
unix:user_rtt+0x0()
-- trap data type: 0x10033 (USER + data access protection - page was write protected) rp: 0x2a10089dba0 --
pc: 0xfec74654 (userland)
npc: 0xfec74774 (userland)
global: %g1 0xfec74644
%g2 0xffffffffffffffff %g3 0x1a44a0
%g4 0xffbe8cc0 %g5 0
%g6 0 %g7 0
out: %o0 0x126c %o1 0
%o2 0xffbe8664 %o3 0x550
%o4 0x22408 %o5 0xfec74644
%sp 0xffbe8128 %o7 0xfec74644
-- switch to user thread's user stack --

SolarisCAT(vmcore.0/8U)>

Thanks

Nelly Boy
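The thread summary gives both a start time and an age, which offers a quick way to confirm that a dump belongs to a particular panic in the messages log: assuming SolarisCAT measures the age up to the moment of the dump, start plus age should land on the panic timestamp. A minimal sketch (the function names are illustrative):

```python
from datetime import datetime, timedelta

# SolarisCAT prints thread start times in ctime(3) style, e.g.
# "Sun Mar 21 20:36:08 2010"; %a/%b assume an English (C) locale.
START_FMT = "%a %b %d %H:%M:%S %Y"


def split_age(seconds):
    """Break an age in seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return days, hours, minutes, secs


def estimated_panic_time(start, age_seconds):
    """Thread start time plus its age, as naive local time (DST is ignored)."""
    return datetime.strptime(start, START_FMT) + timedelta(seconds=age_seconds)


if __name__ == "__main__":
    # Values copied from the vmcore.0 thread summary above.
    print(split_age(215723))
    print(estimated_panic_time("Sun Mar 21 20:36:08 2010", 215723))
```

For vmcore.0 this gives (2, 11, 55, 23), matching SolarisCAT's "2 days 11 hours 55 minutes 23 seconds", and 2010-03-24 08:31:31, the same second as the `panic[cpu0]` line in the messages log. Applied to the later mutex_destroy dump (start Wed Mar 24 08:34:13 2010, age 22039 seconds), the same arithmetic gives 14:41:32 on 24 March.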

From: Wolfgang Ley on 27 Mar 2010 21:51

Hi,

Nelly Boy wrote:
> Thanks for the responses so far.
>
> Here's the info from the messages log, from the start of the panic to
> the start of the reboot:
>
> Mar 24 08:31:31 hostname unix: [ID 836849 kern.notice]
> Mar 24 08:31:31 hostname ^Mpanic[cpu0]/thread=30005388c00:
> Mar 24 08:31:31 hostname unix: [ID 920532 kern.notice] page_unlock:
> page 310071f14c0 is not locked
[...]
>
> unix:panicsys+0x44(0x10054540, 0x2a10089d338, 0x10423a50, 0x1, 0x8, ,
> 0x9900001601, , , , , , , , 0x10054540, 0x2a10089d338)
> unix:vpanic+0xcc(0x10054540, 0x2a10089d338, 0x0, 0x0, 0x0, 0x0)
> unix:panic+0x1c(0x10054540, 0x310071f14c0, 0x20, 0x31003009c38,
> 0x31003009c3a, 0x2000)
> unix:page_unlock+0xd8(0x310071f14c0, , 0xffbea000, 0x1, 0x3000016aa18,
> 0xffbe0001)
> unix:page_release+0x134(0x310071f14c0, 0x1, 0x2000, 0x0,
> 0x2a10089d4f0, 0x2000)
> genunix:anon_private+0x1b8(0x2a10089d610, 0x30005dacda0, 0xffbe8000,
> 0xf, 0x310071f14c0, 0x0)
> genunix:segvn_faultpage+0x7dc(0x3000016aa18, 0x30005dacda0,
> 0xffbe8000, 0xffffffffffffa000, 0x0, 0x2a10089d7a8)
> genunix:segvn_fault+0x860(0x3000016aa18, 0x30005dacda0, 0xffbe8000,
> 0x2000, 0x1, 0x2)
> genunix:as_fault+0x3a4(0x3000016aa18?, 0x30005d9f810, 0xffbe8000, 0x1,
> 0x1, 0x2?)
> unix:pagefault+0xc4(0xffbe8000?, 0x1, 0x2, 0x0, , 0x0)
> unix:trap+0xc60(0x2a10089dba0?, 0xffbe8000, 0x10033?, 0xffbe8000?)
> unix:user_rtt+0x0()

This stack trace is far too generic to be really helpful here.
Can you please provide the stack traces of the panics with the mutex
panic strings? Thanks.

If the mutex-related panics are no longer available, you may want to
enable kernel memory debugging to see whether it reveals more
information on the next panic. This can be done by adding the
following line to /etc/system and rebooting the system to activate
the new setting:

set kmem_flags=0x2f

Bye,
Wolfgang.
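To see what `kmem_flags=0x2f` turns on, the value can be split into its individual bits. The sketch below assumes the KMF_* bit assignments from the OpenSolaris `<sys/kmem_impl.h>` header; they are not confirmed against Solaris 8 here, so check the header or documentation for your release. Under that assumption, 0x2f = 0x20 | 0x08 | 0x04 | 0x02 | 0x01, i.e. auditing, deadbeef and redzone checking, freed-buffer content logging, and no per-CPU magazines. `decode_kmem_flags` is a hypothetical helper name.

```python
# KMF_* bit values as defined in the OpenSolaris <sys/kmem_impl.h> header.
# ASSUMPTION: Solaris 8 uses the same assignments; verify for your release.
KMF_BITS = {
    0x01: ("KMF_AUDIT", "transaction auditing"),
    0x02: ("KMF_DEADBEEF", "deadbeef checking of freed buffers"),
    0x04: ("KMF_REDZONE", "redzone checking"),
    0x08: ("KMF_CONTENTS", "freed-buffer content logging"),
    0x10: ("KMF_STICKY", "override /etc/system settings"),
    0x20: ("KMF_NOMAGAZINE", "disable per-CPU magazines"),
    0x40: ("KMF_FIREWALL", "place buffers before unmapped pages"),
}


def decode_kmem_flags(value):
    """Return the flag names set in value, plus any bits not listed above."""
    names = [name for bit, (name, _) in sorted(KMF_BITS.items()) if value & bit]
    known = 0
    for bit in KMF_BITS:
        known |= bit
    leftover = value & ~known
    if leftover:
        names.append(f"unknown(0x{leftover:x})")
    return names


if __name__ == "__main__":
    # The value from "set kmem_flags=0x2f" above.
    for name in decode_kmem_flags(0x2F):
        bit = next(b for b, (n, _) in KMF_BITS.items() if n == name)
        print(f"0x{bit:02x} {name}: {KMF_BITS[bit][1]}")
```

Bits outside the table are reported as `unknown(...)` rather than guessed at.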

From: Nelly Boy on 28 Mar 2010 15:35

Here's the stack trace from the mutex_destroy panic.

Thanks

Nelly Boy

SolarisCAT(vmcore.3/8U)> panic
panic on cpu 2
panic string: mutex_destroy: bad mutex, lp=30003077de0 owner=2a10061dd20 thread=2a100605d20
==== panic kernel thread: 0x2a100605d20 PID: 0 on CPU: 2 ====
cmd: sched
t_procp: 0x10423e20(proc_sched)
p_as: 0x10423d30(kas)
t_stk: 0x2a100605b10 sp: 0x10423081 t_stkbase: 0x2a100602000
t_pri: 60(SYS) pctcpu: 0.000000
t_lwp: 0x0 psrset: 0 last CPU: 2
idle: 3 ticks (0.03 seconds)
start: Wed Mar 24 08:34:13 2010
age: 22039 seconds (6 hours 7 minutes 19 seconds)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_TALLOCSTK - thread structure allocated from stk
T_PANIC - thread initiated a system panic
tpflg: none set
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SSYS - system resident process
SLOAD - in core
SLOCK - process cannot be swapped
SULOAD - u-block in core

pc: unix:panicsys+0x44: call unix:setjmp
startpc: ce:ce_drain_fifo+0x0: save %sp, -0xc0, %sp

unix:panicsys+0x44(0x100544a8, 0x2a100605288, 0x10423a50, 0x1, 0x30004c54f40, , 0x4400001603, , , , , , , , 0x100544a8, 0x2a100605288)
unix:vpanic+0xcc(0x100544a8, 0x2a100605288, 0x2, 0x2, 0x30003076008, 0x30004c54f40)
unix:panic+0x1c(0x100544a8, 0x10415ee8, 0x30003077de0, 0x2a10061dd20, 0x2a100605d20, 0x30002d59f00)
unix:mutex_panic+0x5c(0x10415ee8, 0x30003077de0, 0x1, 0x1, 0x3000007cd80, 0x3000007ceb0)
unix:mutex_destroy(0x30003077de0) - frame recycled
ip:ire_inactive+0xec(0x30003077cc8?, , , 0x0, 0x3000a8fda40, 0x0)
ip:ire_refrele(0x30003077cc8) - frame recycled
ip:icmp_pkt_err_ok+0x230(, , , , 0x0, 0x0)
ip:icmp_unreachable+0x30(0x30003060818?, 0x30004c54f40?, , , 0x0, 0x0)
ip:ip_fanout_send_icmp(0x30003060738, , 0x5, 0x1047644c, 0x3, 0x3) - frame recycled
ip:ip_fanout_udp+0xe50(0x30003060738, 0x30004c54f40, 0x3000305fc28, 0x3000437c050, 0x10472920?)
ip:ip_rput_local+0x16c0(0x30003060738?, 0x30004c54f40, 0x3000437c050, 0x30003076008, 0x0)
ip:ip_rput+0x12c4(0x148?, 0x300070744c0)
unix:putnext+0x1cc(0x30002db4990, 0x300070744c0?)
ce:ce_putnext_sap+0x2f4(0x30003067b10, , 0x30004c54f40, 0x30002d3d668, 0x1, 0x0)
ce:ce_send_up+0x900(0x30003067b10, 0x30004c54f40, 0x0, , , 0x0)
ce:ce_drain_fifo+0x40(0x300044a9a68, 0x0, 0x10423e20, 0x10423e20, 0x2, 0x0)
unix:thread_start+0x4()
-- end of kernel thread's stack --

SolarisCAT(vmcore.3/8U)>
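The `mutex_destroy` panic string packs three addresses: the lock (`lp`), the owner decoded from the lock word, and the panicking thread. A minimal sketch for pulling them apart follows; `parse_mutex_panic` and `PANIC_STRING` are hypothetical names. In this dump the owner (2a10061dd20) is not the panicking thread (2a100605d20). Assuming Solaris 8's mutex code behaves like the later OpenSolaris source, where `mutex_destroy` stamps a destroyed lock as dead together with the destroying thread and reports "bad mutex" for a lock word in that (or an otherwise unrecognizable) state, the lock that ip:ire_inactive was tearing down had either already been destroyed by another thread or had its memory overwritten. The dump alone does not say which, and both are the kind of problem the kmem_flags debugging suggested above is meant to expose.

```python
import re

# Each key=value pair in the panic string; values are printed as bare hex.
FIELD_RE = re.compile(r"(\w+)=([0-9a-f]+)\b")

# Panic string from vmcore.3, re-joined onto one line.
PANIC_STRING = ("mutex_destroy: bad mutex, lp=30003077de0 "
                "owner=2a10061dd20 thread=2a100605d20")


def parse_mutex_panic(panic_string):
    """Split 'func: message, k=v ...' into (func, message, {k: int})."""
    func, _, rest = panic_string.partition(":")
    message = rest.split(",", 1)[0].strip()
    fields = {key: int(val, 16) for key, val in FIELD_RE.findall(rest)}
    return func.strip(), message, fields


if __name__ == "__main__":
    func, message, fields = parse_mutex_panic(PANIC_STRING)
    print(f"{func}: {message}")
    for key, val in fields.items():
        print(f"  {key:6} 0x{val:x}")
    # True would mean the panicking thread itself was recorded as owner.
    print("owner == thread:", fields["owner"] == fields["thread"])
```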

From: Richard B. Gilbert on 28 Mar 2010 20:27

Wolfgang Ley wrote:
> Hi,
>
> Nelly Boy wrote:
>> Hi
>>
>> We have a 480R server running Solaris 8 (108528-20) and over the last
>> few months the server has been crashing.
>>
>
> Apart from that: consider moving to a more recent Solaris release.
> Solaris 8 is really old.
>

The age of the software is probably not the issue here. Solaris 8 is
quite old but many of us here are older still! ;-)

The real issue with running S8 is that you can't get support from Sun.

If the OP upgrades to S10 and has the same problem, he will at least be
able to get support.

From: Chris Ridd on 29 Mar 2010 02:13

On 2010-03-28 18:08:26 +0100, Canuck57 said:

> If he has been running smoothly for 7 years, patching isn't the root
> cause. Nor would I recommend just patching up unless the more precise
> cause was known to be fixed by a patch.
>
> No doubt something changed in the software somewhere, at least in the
> way it is used, or whatever.

Given that the Oracle listener (tnslsnr) was running on the panicking
CPU according to the crash dump output posted on 26 March... has
Oracle changed?
--
Chris
