From: gb345 on 13 Mar 2010 20:39

Our lab's computer cluster consists of about 35 nodes (running Ubuntu), all attached to a single large fileserver. Every couple of months, an access overload on the fileserver incapacitates the whole system, and a reboot is required.

To make matters worse, we don't have a way to determine the user(s) whose process or processes were responsible for hammering the fileserver. So we can't even tell if we're dealing with malicious attacks or not.

What tools/utilities exist for keeping an up-to-the-minute log of each user's IO load on the fileserver?

Thanks in advance,

GB
From: Greg Russell on 14 Mar 2010 01:21

"gb345" <gb345(a)invalid.com> wrote in message news:hnheo3$hoh$1(a)reader1.panix.com...

....
> To make matters worse, we don't have a way to determine the user(s)
> whose process or processes were responsible for hammering the
> fileserver. So we can't even tell if we're dealing with malicious
> attacks or not.
>
> What tools/utilities exist for keeping an up-to-the-minute log of
> each user's IO load on the fileserver?

"Up-to-the-minute" might require an every-minute cron invocation of "top -b -H -n1 | head > /tmp/cron.top" (batch mode, -b, is needed because cron provides no terminal) or some such thing.
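
A snapshot from top ranks processes by CPU rather than by I/O, so it only hints at who is hammering the fileserver. As a rough sketch of per-user accounting on a compute node, the Python below sums the rchar and wchar counters from /proc/<pid>/io by process owner. Assumptions not in the thread: a Linux /proc, root privileges (other users' io files are normally unreadable), and illustrative function names. The counters are cumulative per process and include all read/write system calls, not only NFS traffic, so load over an interval is the difference between two snapshots, and processes that exit between snapshots are missed.

```python
import os
import pwd


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def add_sample(totals, user, counters):
    """Add one process's rchar/wchar counters to that user's running totals."""
    entry = totals.setdefault(user, {"rchar": 0, "wchar": 0})
    entry["rchar"] += counters.get("rchar", 0)
    entry["wchar"] += counters.get("wchar", 0)


def snapshot(proc_root="/proc"):
    """Return {user: {"rchar": bytes, "wchar": bytes}} for readable processes."""
    totals = {}
    if not os.path.isdir(proc_root):
        return totals
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        pid_dir = os.path.join(proc_root, entry)
        try:
            uid = os.stat(pid_dir).st_uid
            with open(os.path.join(pid_dir, "io")) as fh:
                counters = parse_proc_io(fh.read())
        except OSError:
            continue  # process exited, or its io file is not readable
        try:
            user = pwd.getpwuid(uid).pw_name
        except KeyError:
            user = str(uid)  # no passwd entry for this uid
        add_sample(totals, user, counters)
    return totals


def format_report(totals):
    """One line per user, heaviest combined read+write first."""
    rows = sorted(totals.items(),
                  key=lambda kv: kv[1]["rchar"] + kv[1]["wchar"],
                  reverse=True)
    return "\n".join(f"{user} read={c['rchar']} written={c['wchar']}"
                     for user, c in rows)
```

Calling print(format_report(snapshot())) from a per-minute cron job on each node, with output appended to a timestamped log, would give a crude per-user history. This runs on the clients; it does not attribute load on the NFS server itself, where requests are typically handled by kernel nfsd threads rather than by user processes.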
From: J G Miller on 14 Mar 2010 09:38

On Sun, 14 Mar 2010 01:39:15 +0000, gb345 wrote:

> To make matters worse, we don't have a way to determine the user(s)
> whose process or processes were responsible for hammering the
> fileserver.

It is impossible to answer your question because you provide no information on what services this "file" server is running, or how the files (and what are typical sizes for these files) are being served viz SAMBA, NFS etc.

The most obvious first step would be to check the syslog / messages / daemon.log in /var/log/syslog for events just prior to the crash.

You could also try turning on more debug / verbose options on your network services to see what is happening.
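
Checking the logs for events just prior to the crash can be scripted. The sketch below keeps only the lines inside a time window before a known crash time that mention an NFS-related keyword. It assumes the traditional syslog timestamp ("Mar 13 20:31:02", no year) and an English locale for month names; the keyword list and function names are illustrative guesses, not anything from the thread.

```python
from datetime import datetime, timedelta

# Assumed keywords; adjust to whatever the server's daemons actually log.
KEYWORDS = ("nfs", "rpc", "lockd", "mountd")


def parse_stamp(line, year):
    """Parse a leading 'Mon DD HH:MM:SS' stamp; return None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def events_before(lines, crash_time, minutes=30, keywords=KEYWORDS):
    """Return keyword-matching log lines from the window before crash_time."""
    start = crash_time - timedelta(minutes=minutes)
    hits = []
    for line in lines:
        # Traditional syslog stamps carry no year, so borrow the crash's year.
        stamp = parse_stamp(line, crash_time.year)
        if stamp is None or not (start <= stamp <= crash_time):
            continue
        if any(word in line.lower() for word in keywords):
            hits.append(line.rstrip("\n"))
    return hits
```

A typical call would pass an open log file as lines, for example events_before(open("/var/log/syslog"), datetime(2010, 3, 13, 20, 30)). Because the year is borrowed from the crash time, a window that spans New Year would need extra handling.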
From: Balwinder S Dheeman on 14 Mar 2010 15:33

On 03/14/2010 07:08 PM, J G Miller wrote:
> On Sun, 14 Mar 2010 01:39:15 +0000, gb345 wrote:
>
>> To make matters worse, we don't have a way to determine the user(s)
>> whose process or processes were responsible for hammering the
>> fileserver.
>
> It is impossible to answer your question because you provide no
> information on what services this "file" server is running, or
> how the files (and what are typical sizes for these files) are
> being served viz SAMBA, NFS etc.
>
> The most obvious first step would be to check the syslog / messages
> / daemon.log in /var/log/syslog for events just prior to the crash.
>
> You could also try turning on more debug / verbose options on
> your network services to see what is happening.

I think running a remote log server and capturing all logs from that file server would be more helpful; see http://www.linuxsecurity.com/content/view/117513/171/ for how to do it.

--
Balwinder S "bdheeman" Dheeman        Registered Linux User: #229709
Anu'z Linux(a)HOME (Unix Shoppe)      Machines: #168573, 170593, 259192
Chandigarh, UT, 160062, India         Plan9, T2, Arch/Debian/FreeBSD/XP
Home: http://werc.homelinux.net/      Visit: http://counter.li.org/
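
The point of a remote log server is that the messages survive on another machine when the fileserver locks up. In practice the system syslog daemon on both hosts would normally be configured for this, as the linked article describes. Purely to illustrate the idea, here is a minimal sketch of a UDP receiver that appends whatever datagrams it gets to a file. The class name, helper name, port and log path are all hypothetical, and it does no authentication or syslog parsing.

```python
import socketserver


class SyslogUDPHandler(socketserver.BaseRequestHandler):
    """Append each received datagram to the server's log file."""

    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        message = data.decode("utf-8", errors="replace").rstrip("\n")
        with open(self.server.log_path, "a", encoding="utf-8") as fh:
            fh.write(f"{self.client_address[0]} {message}\n")


def make_server(host, port, log_path):
    """Build a UDP server that logs to log_path (hypothetical helper)."""
    server = socketserver.UDPServer((host, port), SyslogUDPHandler)
    server.log_path = log_path  # read by the handler above
    return server
```

On the logging host one might run make_server("0.0.0.0", 5514, "fileserver.log").serve_forever(), with the fileserver's syslog daemon set to forward to that address. Port 5514 is an arbitrary unprivileged choice; the standard syslog UDP port is 514, which requires root to bind.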
From: gb345 on 15 Mar 2010 11:55

In <1268573896_41(a)vo.lu> J G Miller <miller(a)yoyo.ORG> writes:

>On Sun, 14 Mar 2010 01:39:15 +0000, gb345 wrote:

>> To make matters worse, we don't have a way to determine the user(s)
>> whose process or processes were responsible for hammering the
>> fileserver.

>It is impossible to answer your question because you provide no
>information on what services this "file" server is running, or
>how the files (and what are typical sizes for these files) are
>being served viz SAMBA, NFS etc.

Sorry for the omission. The files are being served via NFS, but I don't know what the typical size of these files is. I suppose I could do a global analysis of all the files on the server to determine their average size, although this may not be a very accurate estimate of the average size of a *served* file. In fact, knowing how to measure this "average served-file size" may give me some ideas on how to monitor the IO load on the server.

>The most obvious first step would be to check the syslog / messages
>/ daemon.log in /var/log/syslog for events just prior to the crash.

>You could also try turning on more debug / verbose options on
>your network services to see what is happening.

Thanks,

GB
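
The "global analysis" of on-disk file sizes mentioned above could be sketched as follows. The code walks a directory tree and reports the count, mean and median size of regular files; the function names are illustrative, and the root path passed in would be whatever directory the server exports (not stated in the thread).

```python
import os
import stat
import statistics


def file_sizes(root):
    """Return the sizes in bytes of all regular files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(info.st_mode):  # skip symlinks, sockets, etc.
                sizes.append(info.st_size)
    return sizes


def summarize(sizes):
    """Return count, mean and median of a list of sizes."""
    if not sizes:
        return {"count": 0, "mean": 0.0, "median": 0.0}
    return {
        "count": len(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
    }
```

The median is reported alongside the mean because a few very large files can dominate the mean. As noted in the post, this describes what is stored, not what is served: a small file read thousands of times contributes as much here as one that is never touched. Walking a large export also generates I/O load of its own, so it is best run at a quiet time.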