From: Egrama on
On Jan 22, 4:47 pm, "Richard B. Gilbert" <rgilber...(a)comcast.net>
wrote:
> Egrama wrote:
> > Hi guys,
>
> > On a T5220 and a T5240 I noticed this strange behavior: sometimes the
> > machine is freezing for a couple of seconds and then continues
> > working as if nothing happened. I noticed this because we are running
> > some realtime application and 2 seconds delays in processing
> > triggers alarms.
> > The machine CPU load is around 30% and also the memory.
>
> Remember that the "CPU load" is an average over time.  There is nothing
> in that "30%" that precludes the CPU from being 100% busy for a few seconds.
>
> ISTR something about "real time" priorities.  I never had cause to use
> them but:  see http://www.princeton.edu/~unix/Solaris/troubleshoot/schedule.html
>
> or
>
> Google!

You are right about the CPU being an average, but I have seen many
overloaded Sun servers with CPU average close to 100% and load average
a few times the number of processors.
The ssh connection was a bit sluggish, but nothing like freezing.
Reading the posts below, I think that it might be network related. I
cannot reproduce the incident, so I cannot verify by using the serial
console instead of an ssh session.
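To put rough numbers on the point about averages quoted above, here is a minimal sketch. The per-second samples are made up for illustration, not measured on these machines; it only shows that a short burst at 100% barely moves a one-minute average.

```python
def window_average(samples):
    """Mean utilisation (percent) over a list of per-second samples."""
    return sum(samples) / len(samples)

# Hypothetical 60-second window (assumed values, not real measurements):
# 58 seconds at 25% busy and 2 seconds pegged at 100%.
samples = [25.0] * 58 + [100.0] * 2

avg = window_average(samples)
peak = max(samples)
print(f"average {avg:.1f}%  peak {peak:.0f}%")
```

So an average near 30% does not rule out a couple of fully busy seconds, although it does not prove that this is what happened either.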
From: Martha Starkey on
On 01/22/10 13:20, Drazen Kacar wrote:
> Egrama wrote:
>> Hi guys,
>>
>> On a T5220 and a T5240 I noticed this strange behaviour: sometimes the
>> machine is freezing for a couple of seconds and then continues
>> working as if nothing happened. I noticed this because we are running
>> some realtime application and 2 seconds delays in processing
>> triggers alarms.
>> The machine CPU load is around 30% and also the memory.
>> I would say this is not a system related problem, but I noticed the
>> problem first hand when my terminal just hung and then the application
>> alerts came.
>> Has anybody experienced anything similar? I have no errors whatsoever
>> in the system logs.....
>
> I've seen something similar, but for a somewhat longer time period.
> Another box announced the same IP address, so switch sent all network
> packets there. It looked like the machine was hung, although it was
> working perfectly fine.

That sounds like the "Broadcom ARP poisoning" issue with certain NIC
drivers acting in "teamed mode". If you have Broadcom NICs on Windows
servers in teamed mode, you may need updated drivers:

http://blogs.sun.com/swas/entry/solaris_10_8_07_broadcom

http://blogs.sun.com/swas/entry/update_to_the_broadcom_pc

Here's Dell's updated driver page:

http://support.dell.com/support/topics/global.aspx/support/dsn/en/document?c=us&dl=false&l=en&s=gen&docid=49F4FB5AA612CFF6E040A68F5A28020D&doclang=en&cs
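One way to look for the duplicate-address situation described above is to check whether the same IP address ever shows up with more than one MAC address. The sketch below assumes you have already collected (ip, mac) pairs somehow, for example from repeated ARP table dumps on a neighbouring host; the function name and the sample addresses are hypothetical.

```python
from collections import defaultdict

def find_ip_conflicts(observations):
    """Return {ip: sorted list of MACs} for IPs seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalise case so the same MAC is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Made-up observations for illustration only.
observations = [
    ("192.0.2.10", "00:14:4f:aa:bb:01"),
    ("192.0.2.11", "00:14:4f:aa:bb:02"),
    ("192.0.2.10", "00:10:18:CC:DD:03"),
]

conflicts = find_ip_conflicts(observations)
print(conflicts)
```

An IP that appears in the result is worth a closer look. A legitimate failover or NIC replacement can also change the MAC, so treat a hit as a hint rather than proof.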


>
> You can confirm this by being logged in on the console via lights out
> management. That will still work while network connections appear
> unresponsive. Although your 2 seconds seem too short to catch this
> behaviour reliably.
>
> Is all your monitoring network based?
>
> OTOH, perhaps your real time application spawned more threads than there
> are CPUs, so some of them have to wait until another real-time thread goes
> to sleep. The OS will not preempt RT class. How many CPUs do you have and
> how many threads does the application have?
>
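The thread-count theory quoted above can be illustrated with a very simplified model: all threads become runnable at the same moment, each runs to completion without being preempted, and they are dispatched in list order. That ignores priorities and time slicing, so it is only a sketch under those assumptions; the CPU count and burst lengths are made up.

```python
import heapq

def rt_wait_times(n_cpus, bursts):
    """Wait time of each thread when all become runnable at t=0.

    Assumes run-to-completion dispatch in list order (no preemption).
    """
    # Each heap entry is the time at which one CPU becomes free.
    free_at = [0.0] * n_cpus
    heapq.heapify(free_at)
    waits = []
    for burst in bursts:
        start = heapq.heappop(free_at)      # earliest CPU to become free
        waits.append(start)                 # the thread waited until then
        heapq.heappush(free_at, start + burst)
    return waits

# Hypothetical case: 4 CPUs, four long-running threads and two short ones.
waits = rt_wait_times(n_cpus=4, bursts=[2.0, 2.0, 2.0, 2.0, 0.1, 0.1])
print(waits)
```

In this model the two short threads cannot start until a long one finishes, which would look like a stall of about the length of the long burst.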
From: ChrisS on
On Jan 22, 5:39 am, Egrama <egr...(a)gmail.com> wrote:
> Hi guys,
>
> On a T5220 and a T5240 I noticed this strange behaviour: sometimes the
> machine is freezing for a couple of seconds and then continues
> working as if nothing happened. I noticed this because we are running
> some realtime application and 2 seconds delays in processing
> triggers alarms.
> The machine CPU load is around 30% and also the memory.
> I would say this is not a system related problem, but I noticed the
> problem first hand when my terminal just hung and then the application
> alerts came.
> Has anybody experienced anything similar? I have no errors whatsoever
> in the system logs.....
> Any idea how to investigate this without a major performance impact ?
>
> Thanks,
> Emil

Assuming they are both Enterprise servers and not the Netra T5220, the
latest firmware is 139439-08 (7.2.7.b). I've loaded this onto two of
my new SE T5220s without issues. Are diagnostics set to Max in
ILOM? If so, can you power off the host and turn it back on? Watch the
console carefully (start /SP/console, via ILOM). I'm not sure if
diags get logged somewhere.

Someone suggested it might be NIC problems. Connect to the host's
console using the Service Processor (either the NetMgr NIC or
serial/tip). If you still see "pausing" going on, it "may not" be the
NIC(s) of the host (assuming you were remoted into the terminal before).
Is the application using NICs to process through? Is there anything in
/var/adm/messages that would suggest a problem? If so, are you dumping
data via NFS or some other means?

Just food for thought. Good luck.
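For investigating without much performance impact, one cheap option is a local heartbeat: a process on the host itself that sleeps for a short interval and records a timestamp each time it wakes. If the ssh session freezes but the local heartbeat shows no gap, that points toward the network rather than the host. This is a minimal sketch; the function names, interval, and threshold are all assumptions to adjust.

```python
import time

def find_gaps(timestamps, interval, threshold):
    """Return (start, length) for each gap between consecutive
    timestamps that is longer than interval + threshold seconds."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta > interval + threshold:
            gaps.append((prev, delta))
    return gaps

def record_heartbeats(count, interval):
    """Sleep `interval` seconds `count` times, recording monotonic time."""
    stamps = [time.monotonic()]
    for _ in range(count):
        time.sleep(interval)
        stamps.append(time.monotonic())
    return stamps

if __name__ == "__main__":
    # Short demo run; a real run would use a much larger count.
    stamps = record_heartbeats(count=10, interval=0.05)
    for start, length in find_gaps(stamps, interval=0.05, threshold=0.5):
        print(f"stall of {length:.2f}s starting at t={start:.2f}")
```

Note that a heartbeat in the default scheduling class can itself be delayed by a busy system, so a gap here shows that something on the host stalled, not what caused it.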