From: D.M. Procida on 24 Apr 2010 13:59

I have a server that periodically stops responding for several minutes;
a few minutes later it comes back.

It's one of three identical virtual servers on the same machine
(identical in the sense that they were cloned from each other).

When I say it stops responding, I mean to ssh, web requests and other
remote connections. Existing connections are lost.

The other two virtual servers remain fine.

It seems to me it might be really hanging, or alternatively just not
receiving requests because of something to do with virtualisation
(perhaps the virtual network).

How can I best work out what it's doing?

Daniele
From: unruh on 24 Apr 2010 15:04

On 2010-04-24, D.M. Procida <real-not-anti-spam-address(a)apple-juice.co.uk> wrote:
> I have a server that periodically stops responding for several minutes;
> a few minutes later it comes back.
>
> It's one of three identical virtual servers on the same machine
> (identical in the sense that they were cloned from each other).
>
> When I say it stops responding - I mean to ssh, web requests and other
> remote connections. Existing connections are lost.
>
> The other two virtual servers remain fine.
>
> It seems to me it might be really hanging, or alternatively just not
> receiving requests because of something to do with virtualisation
> (perhaps the virtual network).
>
> How can I best work out what it's doing?

Examine the logs; start with /var/log/messages.

Are the existing connections active during that time? That is, if you
have sshed in, can you continue typing into that terminal, and does it
echo the characters and run the commands? If so, it is not a network
problem.

Why in the world do you have three identical virtual servers on that
same machine?

>
> Daniele
From: D.M. Procida on 24 Apr 2010 15:18

unruh <unruh(a)wormhole.physics.ubc.ca> wrote:

> On 2010-04-24, D.M. Procida <real-not-anti-spam-address(a)apple-juice.co.uk> wrote:
> > I have a server that periodically stops responding for several minutes;
> > a few minutes later it comes back.
> >
> > It's one of three identical virtual servers on the same machine
> > (identical in the sense that they were cloned from each other).
> >
> > When I say it stops responding - I mean to ssh, web requests and other
> > remote connections. Existing connections are lost.
> >
> > The other two virtual servers remain fine.
> >
> > It seems to me it might be really hanging, or alternatively just not
> > receiving requests because of something to do with virtualisation
> > (perhaps the virtual network).
> >
> > How can I best work out what it's doing?
>
> Examine the logs-- start with /var/log/messages.

There's nothing in messages; the most recent entry is from several
hours ago (rsyslogd was HUPed).

> Are the existing connections active during that time-- ie if you have
> sshed in, can you continue typing into that terminal and it returns the
> characters, and runs the commands? If so it is not a network problem.

As I said, existing connections are lost. Having said that, on this
occasion the server has just come back after several minutes and the
ssh terminal session I had open has come back too, whereas usually the
sessions are lost completely.

> Why in the world do you have three identical virtual servers on that
> same machine.

One is the real live web server, one is the development server, and one
is regularly re-cloned from the live server so that any new stuff that
seems ready for deployment can be tested against it.

Daniele
From: Chris Davies on 24 Apr 2010 17:59

D.M. Procida <real-not-anti-spam-address(a)apple-juice.co.uk> wrote:
> I have a server that periodically stops responding for several minutes;
> a few minutes later it comes back.

> It's one of three identical virtual servers on the same machine
> (identical in the sense that they were cloned from each other).

What virtualisation technology? VMs on our ESX server (VMware) hang
for a good minute while they're being backed up - long enough to lose
ssh connections. Our VMware sysadmin got round this by moving the backup
slot to 5am (12 hours earlier than previously).

> It seems to me it might be really hanging, or alternatively just not
> receiving requests because of something to do with virtualisation
> (perhaps the virtual network).

How often is this "periodically"? (Daily, hourly, more often?) If
it's really quite frequent, check that the VMs' Ethernet MAC addresses
are unique, particularly as you say they were cloned from each other.

Chris
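The MAC address check suggested above could be sketched roughly as follows. This is only an illustration, not something posted in the thread: it assumes Linux guests (it reads /sys/class/net/<interface>/address), and the function names and the VM names and addresses in the usage note are made up for the example.

```python
from collections import defaultdict
from pathlib import Path


def local_macs(sys_net="/sys/class/net"):
    """Return {interface: MAC} for this Linux guest, skipping loopback.

    Assumes the sysfs layout /sys/class/net/<iface>/address; returns an
    empty dict if that directory does not exist.
    """
    macs = {}
    root = Path(sys_net)
    if not root.is_dir():
        return macs
    for iface in sorted(root.iterdir()):
        addr_file = iface / "address"
        if iface.name == "lo" or not addr_file.is_file():
            continue
        macs[iface.name] = addr_file.read_text().strip()
    return macs


def normalise(mac):
    """Lower-case a MAC and use ':' separators so spellings compare equal."""
    return mac.strip().lower().replace("-", ":")


def duplicate_macs(macs_by_vm):
    """Given {vm_name: [mac, ...]}, return {mac: [vm names]} for every
    MAC address that appears on more than one VM."""
    owners = defaultdict(set)
    for vm, macs in macs_by_vm.items():
        for mac in macs:
            owners[normalise(mac)].add(vm)
    return {mac: sorted(vms) for mac, vms in owners.items() if len(vms) > 1}
```

One way to use it: run `local_macs()` on each guest, collect the results by hand into a dict such as `{"live": [...], "dev": [...], "staging": [...]}` (hypothetical names), and pass that to `duplicate_macs()`. An empty result means no two guests share an address.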
From: D.M. Procida on 24 Apr 2010 18:33

Chris Davies <chris-usenet(a)roaima.co.uk> wrote:

> D.M. Procida <real-not-anti-spam-address(a)apple-juice.co.uk> wrote:
> > I have a server that periodically stops responding for several minutes;
> > a few minutes later it comes back.
>
> > It's one of three identical virtual servers on the same machine
> > (identical in the sense that they were cloned from each other).
>
> What virtualisation technology? VMs on our ESX server (VMware) hang
> for a good minute while they're being backed up - long enough to lose
> ssh connections. Our VMware Sysadmin got round this by moving the backup
> slot to 5am (12 hours earlier than previously).

It is indeed ESX. But it's not backups that are causing it, unless
someone changed something yesterday without telling me.

> > It seems to me it might be really hanging, or alternatively just not
> > receiving requests because of something to do with virtualisation
> > (perhaps the virtual network).
>
> How often is this "periodically"? (Daily, hourly, more often?) If
> it's really quite frequent, check that the VMs' ethernet MAC addresses
> are unique. Particularly as you say they were cloned from each other.

It seems to happen every 30 minutes or so. It first happened yesterday,
out of the blue; everything has been working happily for months.

I have a little Python script running now:

    from datetime import datetime
    import time

    for x in range(1000000):
        # one timestamp per second; a gap would mean the guest itself stalled
        print(" "[:int(str(x)[-1])], datetime.now())
        time.sleep(1)

and I can see that it doesn't miss a beat, even when things stop
working (when they start again, the output catches up). So that shows
that the machine itself is still ticking away; the problem is
connecting to it.

Daniele
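Since the heartbeat on the guest keeps ticking, a complementary check is to probe the server from another machine and record exactly when it becomes unreachable, so the outage windows can be lined up against the guest's own timestamps. The sketch below is an assumption-laden illustration rather than anything from the thread: the function names are invented, port 22 is simply the ssh port mentioned earlier, and the host name in the usage note is a placeholder.

```python
import socket
import time
from datetime import datetime


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outages(samples):
    """Given time-ordered (timestamp, ok) pairs, return a list of
    (first_failure, last_failure) tuples, one per run of failed probes."""
    spans = []
    start = None
    last_fail = None
    for stamp, ok in samples:
        if not ok:
            if start is None:
                start = stamp
            last_fail = stamp
        elif start is not None:
            spans.append((start, last_fail))
            start = None
    if start is not None:
        # the run of failures was still in progress when sampling stopped
        spans.append((start, last_fail))
    return spans


def watch(host, port=22, interval=1.0, count=60):
    """Probe host:port `count` times, printing each result, and return
    the outage windows seen during the run."""
    samples = []
    for _ in range(count):
        now = datetime.now()
        ok = port_open(host, port)
        samples.append((now, ok))
        print(now, "up" if ok else "DOWN")
        time.sleep(interval)
    return outages(samples)
```

Running something like `watch("server.example", count=3600)` from a machine outside the ESX host (the host name is hypothetical) for an hour should capture at least one of the roughly half-hourly dropouts. If the guest's heartbeat shows no gap during a reported window, that points at the network path rather than the guest itself.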