How to troubleshoot 30 min Solaris10u8 hang? [Solaris]

Prev: Kernel parms like MAXUPROC on AIX
Next: Jumpstart Sol10 with Sol8 jumpstart server

From: Andrew Gabriel on 25 Feb 2010 15:53

In article <53cbf384-60f5-4e77-8940-979b1dbf3c17(a)u20g2000yqu.googlegroups.com>,
Doug <dy2t7t(a)gmail.com> writes:
> Thanks for your suggestions so far.
>
> I do run zpool scrub periodically, but it was not running when the
> system hung. It usually takes around 12 hours to scrub around 12TB of
> disk data on a relatively quiescent system. The load average is
> between 4-5 when it is scrubbing.
>
> I was running "prstat -Z" on the system when it hung. It has 5
> zones. The process running the sort was from a non-global zone and
> the last thing printed by prstat before the hang was that it was using
> about 4GB of RSS. I am pretty sure it was /usr/bin/sort, which is a
> 32-bit binary, using that memory. I didn't see any temp files in /var/
> tmp nor any messages that any filesystem filled up.
>
> When the system did start responding again after 20 minutes, the load
> average reported by prstat was >2000. It seems that >2000 processes

Actually, it means 2000 runnable _threads_. That could be 2000
single-threaded processes, or a single process with 2000 threads, or
something in between.
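
To see which of those cases applies, one option is to look at the
per-process thread (LWP) counts. The following is only a rough sketch,
assuming a Python 3 interpreter is available and that
"ps -e -o pid,nlwp,comm" prints one "pid nlwp command" row per process.
The function names and the sample rows are made up for illustration.
Note that it counts all threads, not just runnable ones, so it only
shows where the threads are, not which of them contribute to the load
average.

```python
import subprocess

# Hypothetical sample in the shape of "ps -e -o pid,nlwp,comm" output.
# The PIDs, counts, and commands are invented for illustration.
SAMPLE = """\
  PID NLWP COMMAND
  101    1 /usr/bin/sort
  202 1990 /opt/app/server
  303    9 /usr/lib/ssh/sshd
"""

def read_ps():
    """Run ps and return its text output (assumes the nlwp column is supported)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,nlwp,comm"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout

def parse_ps(text):
    """Parse 'pid nlwp command' rows into (pid, nlwp, command) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything that is not a data row.
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows

def total_threads(rows):
    """Sum the thread counts over all processes."""
    return sum(nlwp for _, nlwp, _ in rows)

def top_by_threads(rows, n=10):
    """Return the n processes with the most threads, largest first."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

rows = parse_ps(SAMPLE)
print("processes:", len(rows), "threads:", total_threads(rows))
for pid, nlwp, comm in top_by_threads(rows, 3):
    print(pid, nlwp, comm)
```

On a live system, parse_ps(read_ps()) would replace the sample text. If
one process dominates the list, the "single process with many threads"
case is the likelier explanation; a long flat list points to many
single-threaded processes.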

> would normally need service if the system hung for 20 minutes.
> I'm frustrated that there were no messages left behind as to what
> caused the hang, though.

--
Andrew Gabriel
[email address is not usable -- followup in the newsgroup]
