NFS writes became extremely slow overnight [Linux Networking]


From: J G Miller on 12 Jul 2010 11:32

On Monday, July 12th, 2010 at 10:09:04h -0500, Ignoramus15939 wrote:

> Any ideas what might cause this?

Only a couple of suggestions to consider:

1) What is the disk usage for the NFS-mounted disk on Server A?
If it is getting full, then that could cause a slowdown for writing.

2) Is there still a process writing a big file to Server A
in progress?

3) Have you checked that all of the necessary daemons are running properly?
You do not indicate if this is NFSv3 (needs statd) or NFSv4
(needs idmapd), so associated required daemons will be different.
If Kerberos is involved additional daemons are required.

As to a quick fix, restart the NFS daemons on Server A (in worst case
reboot the machine) when all of the users have gone home and nobody
is reading/writing to Server A.
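
The first and third checks above could be scripted roughly as follows.
This is only a sketch: the function names check_usage and
missing_rpc_services are made up for illustration, and the 90% threshold
is arbitrary. It assumes `df -P` (POSIX output format) and `rpcinfo -p`
are available on Server A, and both functions read that output on stdin.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any NFS package.

# Read `df -P` output on stdin; print one "FULL <mount> <use>%" line per
# filesystem at or above the threshold (default 90), or "OK" if none is.
check_usage() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {
            use = $5; sub(/%/, "", use)      # "50%" -> "50"
            if (use + 0 >= t) { print "FULL " $6 " " use "%"; bad = 1 }
        }
        END { if (!bad) print "OK" }'
}

# Read `rpcinfo -p` output on stdin; print each service name given as an
# argument that does not appear in the "service" column.
missing_rpc_services() {
    input=$(cat)
    for svc in "$@"; do
        printf '%s\n' "$input" |
            awk -v s="$svc" '$5 == s { found = 1 } END { exit !found }' ||
            printf '%s\n' "$svc"
    done
}
```

Possible usage on Server A, where /export is a placeholder for the real
export path:

    df -P /export | check_usage 90
    rpcinfo -p | missing_rpc_services nfs mountd status nlockmgr

Here "status" is the name under which statd registers. idmapd would
need a separate check, for example with ps.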

From: Ignoramus15939 on 12 Jul 2010 11:44

On 2010-07-12, J G Miller <miller(a)yoyo.ORG> wrote:
> On Monday, July 12th, 2010 at 10:09:04h -0500, Ignoramus15939 wrote:
>
>> Any ideas what might cause this?
>
> Only a couple of suggestions to consider:
>
> 1) What is the disk usage for the NFS-mounted disk on Server A?
> If it is getting full, then that could cause a slowdown for writing.

50%

> 2) Is there still a process writing a big file to Server A
> in progress?

I think not, not that I could find, but I will look. Here's the output
of 'top', which looks weird, considering that the load average is 4 and
nothing is really running:

top - 10:42:41 up 484 days, 22:50, 1 user, load average: 3.78, 3.27, 3.12
Tasks: 91 total, 1 running, 90 sleeping, 0 stopped, 0 zombie
Cpu(s):100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2059764k total, 2043948k used, 15816k free, 152092k buffers
Swap: 409616k total, 48k used, 409568k free, 1611376k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    1 root      15   0  6120  688  564 S    0  0.0   0:15.07 init
    2 root      RT   0     0    0    0 S    0  0.0   0:01.21 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.01 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.16 migration/1
    6 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
    7 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    8 root      10  -5     0    0    0 S    0  0.0   0:00.07 events/0
    9 root      10  -5     0    0    0 S    0  0.0   0:00.06 events/1
   10 root      10  -5     0    0    0 S    0  0.0   0:00.00 khelper
   11 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
   16 root      10  -5     0    0    0 S    0  0.0   1:45.50 kblockd/0
   17 root      10  -5     0    0    0 S    0  0.0   0:02.04 kblockd/1
   18 root      15  -5     0    0    0 S    0  0.0   0:00.00 kacpid
  150 root      10  -5     0    0    0 S    0  0.0   0:00.00 khubd
  152 root      10  -5     0    0    0 S    0  0.0   0:00.00 kseriod
  203 root      15   0     0    0    0 S    0  0.0   0:10.27 pdflush
  204 root      15   0     0    0    0 S    0  0.0   1:17.88 pdflush
  205 root      10  -5     0    0    0 S    0  0.0  96:37.46 kswapd0
  206 root      15  -5     0    0    0 S    0  0.0   0:00.00 aio/0
  207 root      15  -5     0    0    0 S    0  0.0   0:00.00 aio/1
 1076 root      10  -5     0    0    0 S    0  0.0   1:16.26 kjournald
 1266 root      11  -4 10596  708  328 S    0  0.0   0:00.08 udevd
 1659 root      16  -5     0    0    0 S    0  0.0   0:00.00 kpsmoused
 1879 root      13  -5     0    0    0 S    0  0.0   0:00.00 kmirrord
 1962 daemon    15   0  4820  496  376 S    0  0.0   0:01.41 portmap
 2198 root      15   0  3728  668  508 S    0  0.0  22:52.14 syslogd
 2204 root      15   0  2660  396  308 S    0  0.0   0:00.00 klogd
 2279 root      18   0  2652  584  476 S    0  0.0   0:00.00 acpid
 2322 Debian-e  15   0 23324 1116  736 S    0  0.1   0:00.24 exim4

linux-nfs(a)vger.kernel.org
> 3) Have you checked that all of the necessary daemons are running properly?
> You do not indicate if this is NFSv3 (needs statd) or NFSv4
> (needs idmapd), so associated required daemons will be different.
> If Kerberos is involved additional daemons are required.

All fstab lines say 'nfs'; it may be NFSv4.
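
The fstab type alone may not settle the version question. On many
systems of this vintage a type of 'nfs' meant v2/v3 and v4 needed
'nfs4', but that is not guaranteed, so it is safer to read the options
the client kernel actually negotiated. A rough sketch, assuming the
usual /proc/mounts layout (device, mount point, type, options); the
function name nfs_versions is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mounts-style lines on stdin and print
# "<mountpoint> <fstype> <version>" for every NFS mount. The version is
# taken from a vers= or nfsvers= option, or "unknown" if neither is set.
nfs_versions() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        ver = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^(nfs)?vers=/) {
                ver = opts[i]
                sub(/^.*=/, "", ver)     # keep only the value after "="
            }
        print $2, $3, ver
    }'
}
```

On the client this would be run as `nfs_versions < /proc/mounts`.
`nfsstat -m` reports similar per-mount information where it is
installed.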

> As to a quick fix, restart the NFS daemons on Server A (in worst case
> reboot the machine) when all of the users have gone home and nobody
> is reading/writing to Server A.

Yes, that's the plan for later today, restart nfs daemon on A and see
if it helps.

Thanks JG.

i

From: Hadron on 12 Jul 2010 11:50

J G Miller <miller(a)yoyo.ORG> writes:

> On Monday, July 12th, 2010 at 10:09:04h -0500, Ignoramus15939 wrote:
>
>> Any ideas what might cause this?
>
> Only a couple of suggestions to consider:
>
> 1) What is the disk usage for the NFS-mounted disk on Server A?
> If it is getting full, then that could cause a slowdown for writing.
>
> 2) Is there still a process writing a big file to Server A
> in progress?
>
> 3) Have you checked that all of the necessary daemons are running properly?
> You do not indicate if this is NFSv3 (needs statd) or NFSv4
> (needs idmapd), so associated required daemons will be different.
> If Kerberos is involved additional daemons are required.
>
> As to a quick fix, restart the NFS daemons on Server A (in worst case
> reboot the machine) when all of the users have gone home and nobody
> is reading/writing to Server A.

Compare the /etc/hosts files on both machines and, in addition, check
that they are using the same DNS servers.
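
A rough way to do the /etc/hosts comparison, assuming a copy of the
other machine's file has been fetched locally first. The function names
and the /tmp/hosts.serverA path are placeholders for illustration only:

```shell
#!/bin/sh
# Hypothetical helpers for comparing two hosts files.

# Strip comments and blank lines, collapse whitespace, and sort, so that
# cosmetic differences do not show up as mismatches.
normalize_hosts() {
    sed -e 's/#.*//' "$1" | awk 'NF { $1 = $1; print }' | sort
}

# Succeed (exit status 0) if the two files differ after normalization.
hosts_differ() {
    a=$(normalize_hosts "$1")
    b=$(normalize_hosts "$2")
    [ "$a" != "$b" ]
}
```

For example:

    hosts_differ /etc/hosts /tmp/hosts.serverA && echo "hosts files differ"

For the DNS side, running `getent hosts <name>` for the same name on
both machines and comparing /etc/resolv.conf shows whether they resolve
names the same way.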

From: Stan Bischof on 12 Jul 2010 12:00

In comp.os.linux.misc Ignoramus15939 <ignoramus15939(a)nospam.15939.invalid> wrote:
> I think not, not that I could find, but I will look. Here's the output
> of 'top', which looks weird, considering that the load average is 4 and
> nothing is really running:
>
> top - 10:42:41 up 484 days, 22:50, 1 user, load average: 3.78, 3.27, 3.12

That is a little weird. I've seen nfs go wild and start spawning
daemons til the system chokes, and I've seen processes hung in IO
that suck down CPU time.

You might look to see how many NFS daemons you have running.

In any case sounds like time to restart NFS- and any other process(es)
that could be hung.

Stan

From: Ignoramus20495 on 12 Jul 2010 13:47

On 2010-07-12, Stan Bischof <stan(a)newserve.worldbadminton.com> wrote:
> In comp.os.linux.misc Ignoramus15939 <ignoramus15939(a)nospam.15939.invalid> wrote:
>> I think not, not that I could find, but I will look. Here's the output
>> of 'top', which looks weird, considering that the load average is 4 and
>> nothing is really running:
>>
>> top - 10:42:41 up 484 days, 22:50, 1 user, load average: 3.78, 3.27, 3.12
>
> That is a little weird. I've seen nfs go wild and start spawning
> daemons til the system chokes, and I've seen processes hung in IO
> that suck down CPU time.
>
> You might look to see how many NFS daemons you have running.
>
> In any case sounds like time to restart NFS- and any other process(es)
> that could be hung.

Server A:~# ps auxw | grep nfsd
root      2998  0.0  0.0     0    0 ?      S<    2009     0:00 [nfsd4]
root      2999  0.1  0.0     0    0 ?      S     2009  1317:25 [nfsd]
root      3000  0.1  0.0     0    0 ?      S     2009  1311:14 [nfsd]
root      3001  0.1  0.0     0    0 ?      S     2009  1299:59 [nfsd]
root      3002  0.1  0.0     0    0 ?      S     2009  1306:12 [nfsd]
root      3003  0.1  0.0     0    0 ?      S     2009  1305:07 [nfsd]
root      3004  0.1  0.0     0    0 ?      S     2009  1302:03 [nfsd]
root      3005  0.1  0.0     0    0 ?      D     2009  1287:22 [nfsd]
root      3006  0.1  0.0     0    0 ?      S     2009  1296:57 [nfsd]
root     25666  0.0  0.0  3936  716 pts/0  S+   12:24     0:00 grep nfsd
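
The listing above shows eight [nfsd] threads, one of which (PID 3005) is
in state D (uninterruptible sleep, usually waiting on I/O). On Linux,
tasks in that state count toward the load average even when they use no
CPU. A small sketch for summarizing this from `ps auxw` output, assuming
the standard 11-column layout with STAT in column 8 and COMMAND in
column 11; the function name nfsd_summary is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read `ps auxw` output on stdin and report how many
# [nfsd] kernel threads exist and how many are in uninterruptible sleep.
nfsd_summary() {
    awk '$11 == "[nfsd]" {
            total++
            if ($8 ~ /^D/) blocked++     # STAT begins with D
         }
         END { printf "nfsd threads: %d, in D state: %d\n", total, blocked }'
}
```

Run as `ps auxw | nfsd_summary`. Because it matches the COMMAND column
exactly, it does not count the grep process or the [nfsd4] thread.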
