From: Marlon on

Hi all,
I'm investigating the following problem:

An application on a server started to fail, and running it under truss
showed it failing like this:

22258: fcntl(3, F_SETLKW, 0xFFBFF300) Err#46 ENOLCK

'man fcntl' (and poking around on the interweb) suggests that a
file-locking table is full.
I then discovered that the same application also fails on another
server (both are Solaris 9), the commonality being that both are clients
of the same NFS server (Solaris 10).
And indeed I was able to demonstrate that the problem goes away if I
redirect the application either to a file on local disk or to a file on
an NFS mount from a different NFS server.
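
For anyone who wants to reproduce this without the application, here is a
minimal sketch of the kind of test I mean, assuming Python is available on
the client. The function name try_lock and the paths in the usage note are
made up for illustration. It takes a blocking exclusive lock with
fcntl.lockf(), which as far as I know CPython implements through the same
fcntl() F_SETLKW call that truss shows, and reports the errno name instead
of crashing.

```python
import errno
import fcntl
import os
import sys


def try_lock(path):
    """Lock and unlock `path`; return None on success, else the errno name."""
    # Create the file if it is missing so the test can run on an empty mount.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocking exclusive lock on the whole file. This waits if another
        # process already holds a conflicting lock on the same file.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return None
    except OSError as exc:
        # For example 'ENOLCK' when no locks are available.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Each argument is a file to test, e.g. one per mount point.
    for path in sys.argv[1:]:
        result = try_lock(path)
        print(path, "OK" if result is None else "FAILED: " + result)
```

Run it with one test file on local disk and one on each NFS mount (for
example /tmp/locktest and /mnt/nfs1/locktest, both hypothetical paths) and
compare which ones fail.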

On the NFS server itself the NFS lockd is running (in fact several are):

# ps -ef|grep lockd
  daemon   377     1  0   Feb 08 ?      190:53 /usr/lib/nfs/lockd
  daemon 19606     1  0   Feb 20 ?        0:00 /usr/lib/nfs/lockd
  daemon 14246     1  0   Jun 11 ?        0:00 /usr/lib/nfs/lockd
    root 15512 11787  0 08:33:31 pts/4    0:00 grep lockd
  daemon 21127     1  0   Sep 17 ?        0:00 /usr/lib/nfs/lockd

So I'm trying to decide what the next step is - I would prefer not to
have to restart the NFS service, and am even less keen on rebooting the
server as a whole (because there's other stuff on there).

The server and the NFS service on it have an uptime of 594 days (but it
IS mission-critical!), so I'm prepared to believe we have a genuine
resource usage issue here.

Would it be useful to restart the lockd - is that indeed possible
without any adverse effect?

Is there a way of 'cleaning out' the file locking table from the command
line?
I've discovered how to increase the size of the lock table, with the
/etc/system entry:

set tune_t_flckrec=1024

(my system has the default 512). But of course that requires a server
reboot.

So does anyone have any clever ideas, or can anyone point out anything
I've missed in this?

Your help would be greatly appreciated!

Cheers,
Marlon