From: Marlon on 25 Sep 2009 03:42

Hi all,

I'm investigating the following problem: an application on a server started to fail, and when using truss I found it failing like this:

    22258: fcntl(3, F_SETLKW, 0xFFBFF300) Err#46 ENOLCK

'man fcntl' (and poking around on the interweb) suggests I have a full file-locking table.

I then discovered that the same application would also fail on another server (both Solaris 9), the commonality being that they're both clients of the same NFS server (Solaris 10). And indeed I was able to demonstrate that the problem goes away if I redirect the application either to a file on local disk or to a file on an NFS mount from a different NFS server.

On the NFS server itself the NFS lockd is running (in fact several are):

    # ps -ef|grep lockd
      daemon   377     1  0   Feb 08 ?      190:53 /usr/lib/nfs/lockd
      daemon 19606     1  0   Feb 20 ?        0:00 /usr/lib/nfs/lockd
      daemon 14246     1  0   Jun 11 ?        0:00 /usr/lib/nfs/lockd
        root 15512 11787  0 08:33:31 pts/4    0:00 grep lockd
      daemon 21127     1  0   Sep 17 ?        0:00 /usr/lib/nfs/lockd

So I'm trying to decide what the next step is - I would prefer not to have to restart the NFS service, and am even less keen on rebooting the server as a whole (because there's other stuff on there). The server and the NFS service on it have an uptime of 594 days (but it IS mission-critical!), so I'm prepared to believe we have a genuine resource usage issue here.

Would it be useful to restart the lockd - is that indeed possible without any adverse effect? Is there a way of 'cleaning out' the file locking table from the command line?

I've discovered how to increase the size of the lock table, with the /etc/system entry:

    set tune_t_flckrec=1024

(my system has the default 512). But of course that requires a server reboot.

So does anyone have any clever ideas, or can anyone point out anything I've missed in this? Your help would be greatly appreciated!

Cheers,
Marlon
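For anyone wanting to repeat the local-disk versus NFS comparison described above without running the real application, here is a minimal sketch of a lock probe. It assumes a Python interpreter is available on the client, which the post does not say. The function name `try_lock` and the probe file paths are made up for illustration. It also issues a non-blocking lock request so the probe cannot hang, whereas the failing application used the blocking F_SETLKW. It is only meant to show whether a lock request on a given path succeeds or comes back with an errno such as ENOLCK.

```python
import errno
import fcntl
import os
import sys


def try_lock(path):
    """Try to take and release an exclusive whole-file lock on path.

    Returns None on success, or the errno name (for example 'ENOLCK')
    if opening succeeded but the lock request failed.
    """
    # Hypothetical probe file: created if missing, never written to.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Assumption: a non-blocking request is enough for diagnosis.
        # The application in the truss output used blocking F_SETLKW.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Each argument is a file path to probe, e.g. one per mount point.
    for path in sys.argv[1:]:
        result = try_lock(path)
        print(path, "OK" if result is None else "FAILED: " + result)
```

Running it with one path on the suspect NFS mount and one on local disk (paths of your choosing) gives a quick side-by-side check of which filesystems still grant locks.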