From: Andrew Gabriel on 25 Feb 2010 15:53

In article <53cbf384-60f5-4e77-8940-979b1dbf3c17(a)u20g2000yqu.googlegroups.com>,
Doug <dy2t7t(a)gmail.com> writes:
> Thanks for your suggestions so far.
>
> I do run zpool scrub periodically, but it was not running when the
> system hung. It usually takes around 12 hours to scrub around 12TB of
> disk data on a relatively quiescent system. The load average is
> between 4 and 5 when it is scrubbing.
>
> I was running "prstat -Z" on the system when it hung. It has 5
> zones. The process running the sort was from a non-global zone and
> the last thing printed by prstat before the hang was that it was using
> about 4GB of RSS. I am pretty sure it was /usr/bin/sort, which is a
> 32-bit binary, using that memory. I didn't see any temp files in
> /var/tmp nor any messages that any filesystem filled up.
>
> When the system did start responding again after 20 minutes, the load
> average reported by prstat was >2000. It seems that >2000 processes

Actually, it means 2000 runnable _threads_. That could be 2000
single-threaded processes, or a single process with 2000 threads,
or something in between.

> would normally need service if the system hung for 20 minutes.
> I'm frustrated that there were no messages left behind as to what
> caused the hang, though.

-- 
Andrew Gabriel
[email address is not usable -- followup in the newsgroup]