From: Egrama on 28 Jan 2010 10:27

On Jan 22, 4:47 pm, "Richard B. Gilbert" <rgilber...(a)comcast.net> wrote:
> Egrama wrote:
> > Hi guys,
> >
> > On a T5220 and a T5240 I noticed this strange behavior: sometimes the
> > machine is freezing for a couple of seconds and then continues
> > working as if nothing happened. I noticed this because we are running
> > some realtime application and 2 seconds delays in processing
> > triggers alarms.
> > The machine CPU load is around 30% and also the memory.
>
> Remember that the "CPU load" is an average over time. There is nothing
> in that "30%" that precludes the CPU from being 100% busy for a few seconds.
>
> ISTR something about "real time" priorities. I never had cause to use
> them but: see http://www.princeton.edu/~unix/Solaris/troubleshoot/schedule.html
>
> or
>
> Google!

You are right about the CPU load being an average, but I have seen many overloaded Sun servers with a CPU average close to 100% and a load average a few times the number of processors. The ssh connection was a bit sluggish, but nothing like freezing.

Reading the posts below, I think that it might be network related. I cannot reproduce the incident, so I cannot verify by using the serial console instead of an ssh session.
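To illustrate the point about averages, here is a minimal Python sketch. The per-second utilization samples are made up for the example (they are not measurements from the T5220 or T5240), and the helper name `summarize` is hypothetical. It only shows that a window averaging about 30% can still contain two fully saturated seconds.

```python
from statistics import mean


def summarize(samples):
    """Return (average, peak) for per-second CPU utilization samples in [0, 1]."""
    return mean(samples), max(samples)


# Hypothetical one-minute window: 58 seconds at 28% busy, 2 seconds at 100% busy.
samples = [0.28] * 58 + [1.0] * 2

average, peak = summarize(samples)
print(f"average={average:.2f} peak={peak:.2f}")
```

A monitoring tool that reports only the one-minute average would show roughly 30% for this window, so the average alone cannot rule out a short burst.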
From: Martha Starkey on 29 Jan 2010 08:54

On 01/22/10 13:20, Drazen Kacar wrote:
> Egrama wrote:
>> Hi guys,
>>
>> On a T5220 and a T5240 I noticed this strange behaviour: sometimes the
>> machine is freezing for a couple of seconds and then continues
>> working as if nothing happened. I noticed this because we are running
>> some realtime application and 2 seconds delays in processing
>> triggers alarms.
>> The machine CPU load is around 30% and also the memory.
>> I would say this is not a system related problem, but I noticed the
>> problem first hand when my terminal just hung and then the application
>> alerts came.
>> Has anybody experienced anything similar? I have no errors whatsoever
>> in the system logs.....
>
> I've seen something similar, but for a somewhat longer time period.
> Another box announced the same IP address, so switch sent all network
> packets there. It looked like the machine was hung, although it was
> working perfectly fine.

That sounds like the "Broadcom ARP poisoning" issue with certain NIC drivers acting in "teamed mode". If you have Broadcom NICs on Windows servers running in teamed mode, you may need updated drivers:

http://blogs.sun.com/swas/entry/solaris_10_8_07_broadcom
http://blogs.sun.com/swas/entry/update_to_the_broadcom_pc

Here's Dell's updated driver page:

http://support.dell.com/support/topics/global.aspx/support/dsn/en/document?c=us&dl=false&l=en&s=gen&docid=49F4FB5AA612CFF6E040A68F5A28020D&doclang=en&cs

> You can confirm this by being logged in on the console via lights out
> management. That will still work while network connections appear
> unresponsive. Although your 2 seconds seem too short to catch this
> behaviour reliably.
>
> Is all your monitoring network based?
>
> OTOH, perhaps your real time application spawned more threads than there
> are CPUs, so some of them have to wait until another real-time thread goes
> to sleep. The OS will not preempt RT class. How many CPUs do you have and
> how many threads does the application have?
From: ChrisS on 6 Feb 2010 11:02

On Jan 22, 5:39 am, Egrama <egr...(a)gmail.com> wrote:
> Hi guys,
>
> On a T5220 and a T5240 I noticed this strange behaviour: sometimes the
> machine is freezing for a couple of seconds and then continues
> working as if nothing happened. I noticed this because we are running
> some realtime application and 2 seconds delays in processing
> triggers alarms.
> The machine CPU load is around 30% and also the memory.
> I would say this is not a system related problem, but I noticed the
> problem first hand when my terminal just hung and then the application
> alerts came.
> Has anybody experienced anything similar? I have no errors whatsoever
> in the system logs.....
> Any idea how to investigate this without a major performance impact ?
>
> Thanks,
> Emil

Assuming they are both Enterprise servers and not the Netra T5220, the latest firmware is 139439-08 (7.2.7.b). I've loaded this onto two of my new SE T5220s without issues.

Is diagnostics turned on Max in ILOM? If, can you power off the host and turn it back on? Watch the console carefully (start SP/console, via ILOM). I'm not sure if diags get logged somewhere.

Someone suggested it might be NIC problems. Connect to the host's console using the Service Processor (either NetMgr NIC or serial/tip). If you still see "pausing" going on, it "may not" be the NIC(s) of the host (assuming you were remoted into the terminal before).

Is the application using NICs to process thru? Is there anything in /var/adm/messages that would suggest a problem? If so, are you dumping data via NFS or some other means?

Just food for thought. Good luck.
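One low-overhead way to check whether the host itself pauses, as opposed to the network path to it, could be a small watchdog that sleeps in short steps and records to a local file whenever it wakes up much later than requested. The Python sketch below is an illustration under stated assumptions, not something posted in the thread: the names `lateness` and `watch`, the 0.1 s interval, the 0.5 s threshold and the `stall.log` path are all hypothetical choices.

```python
import time


def lateness(prev, now, interval):
    """Seconds by which a wake-up overshot the requested sleep interval."""
    return now - prev - interval


def watch(interval=0.1, threshold=0.5, duration=60.0, log_path="stall.log"):
    """Sleep in short steps and log any wake-up that arrives later than threshold.

    All defaults are illustrative assumptions; tune them to the delay that
    matters for the application (2 seconds in the original report).
    """
    end = time.monotonic() + duration
    last = time.monotonic()
    with open(log_path, "a") as log:
        while last < end:
            time.sleep(interval)
            now = time.monotonic()
            late = lateness(last, now, interval)
            if late > threshold:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                log.write(f"{stamp} woke up {late:.2f}s late\n")
                log.flush()  # write immediately so the entry survives a later hang
            last = now
```

Calling, for example, `watch(duration=3600)` on the host keeps the check running for an hour. If an ssh session hangs and the local log has no entry for that moment, the host kept running and the network is the more likely suspect. An entry, on the other hand, only shows that this process was not scheduled on time, which could also happen if real-time threads were occupying all CPUs.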