From: Kevin on 21 Feb 2007 12:52

I have Solaris 9 on my production server, which runs Apache and some other web servers. During peak load I see around 12,000 connections in time_wait. When I run netstat -a | grep -i wait | wc -l, I see around 12,000. However, when I run lsof -i | grep -i wait | wc -l, I see only 100. Is this normal?

Also, how can I use lsof to see which process holds the most connections in a wait state? Using lsof -i or lsof -p does not show the correct connections in time_wait.

I just wish Solaris provided a "netstat -p" option to list processes with netstat; it would make my life so much easier!

Kevin.
From: Rick Jones on 21 Feb 2007 15:27

Kevin <kejoseph(a)hotmail.com> wrote:
> I have Solaris 9 on my production server which runs Apache and some
> other web servers. During peak load I see around 12,000 connections
> in time_wait. When I perform a netstat -a | grep -i wait | wc -l, I
> see around 12,000. However, when I perform a lsof -i | grep -i wait
> | wc -l, I see only 100. Is this normal ?

Your commands will include CLOSE_WAIT, FIN_WAIT_1 and FIN_WAIT_2 in addition to TIME_WAIT. You should probably add a '-n' to that netstat command, as there isn't much point in resolving IPs and port numbers to names just for counting.

> Also, how can I use lsof to see which process is taking up maximum
> connections in wait state ? Using lsof -i or lsof -p does not
> show the correct connections in time_wait.

> I just so wish that Solaris would provide a "netstat -p" option to
> list processes with netstat, would make my life so much easier !

I'm not sure it would. 99 times out of ten, a TCP connection is in TIME_WAIT after both sides have called close(), which means there is no longer any association with a process. About the only time there would still be a chance of an association with a process is if the process used shutdown() and had not yet called close(). That would be the 100th time out of ten.

For a complete description of things like the TCP connection states, the works of the late W. Richard Stevens et al and/or the not late Stallings could be useful.

rick jones

ftp://ftp.cup.hp.com/dist/networking/tools/connhist - might need a little polish to remove bitrot, but it may be of some help

--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
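Rick's point that grep -i wait lumps several states together can be checked by tallying connections per state instead of counting one pattern. The following is only a rough sketch: the function name count_tcp_states is made up for illustration, and it assumes the state is the last field of each netstat line and is written in upper case, which is true of the usual Solaris and Linux output but is not guaranteed everywhere.

```shell
#!/bin/sh
# Tally TCP connections by state from netstat-style text on stdin.
# Assumption: the state is the last whitespace-separated field and is
# written in upper case (ESTABLISHED, TIME_WAIT, CLOSE_WAIT, ...).
# Header and separator lines do not match the pattern and are skipped.
count_tcp_states() {
    awk '$NF ~ /^[A-Z][A-Z_0-9]*$/ { n[$NF]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Typical use (on Solaris, nawk may be a safer choice than awk):
#   netstat -an | count_tcp_states
```

Each output line is a state name followed by its count, so TIME_WAIT can be read separately from CLOSE_WAIT, FIN_WAIT_1 and FIN_WAIT_2.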
From: Darren Dunham on 21 Feb 2007 14:53

Kevin <kejoseph(a)hotmail.com> wrote:
> Also, how can I use lsof to see which process is taking up maximum
> connections in wait state ? Using lsof -i or lsof -p does not
> show the correct connections in time_wait.

I'm not sure there is one. TIME_WAIT is a state for the connection after the process has closed it. So I wouldn't expect a process to be associated with such a state. Even if you could discover the process that used to have the connection, it might not be running any longer.

> I just so wish that Solaris would provide a "netstat -p" option to
> list processes with netstat, would make my life so much easier !

There was some discussion about that on one of the opensolaris forums. I don't recall any specifics.

--
Darren Dunham ddunham(a)taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
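For the states that are still tied to an open descriptor (CLOSE_WAIT, for example), lsof output can be tallied per process. This is a sketch under assumptions, not a tested Solaris 9 recipe: the function name sockets_by_process is hypothetical, and it assumes the lsof build in use prints the TCP state in parentheses as the last field, with the command in field 1 and the PID in field 2.

```shell
#!/bin/sh
# Tally sockets per process for one TCP state from lsof-style text on stdin.
# Assumptions: field 1 is the command, field 2 is the PID, and the state
# is the last field, in parentheses, e.g. "(CLOSE_WAIT)".
sockets_by_process() {
    state=$1
    awk -v want="($state)" '$NF == want { n[$1 "/" $2]++ }
        END { for (p in n) print n[p], p }' | sort -rn
}

# Typical use (awk -v needs nawk or a POSIX awk on Solaris):
#   lsof -i TCP -n -P | sockets_by_process CLOSE_WAIT
```

The busiest process is listed first. Asking for TIME_WAIT this way will usually return little or nothing, which is consistent with the replies above: by then no process owns the connection.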
From: Kevin on 5 Mar 2007 13:12

Rick/Darren,

First off, thanks for your replies. Based on what you are stating, I take it you mean that once a connection is in time_wait, it does not use up a file handle (which is why, as per your argument, lsof does not show it). However, I am having a tough time buying it, because based on what I know even a connection in time_wait or close_wait is a network connection and has to be associated with a process; and if it is associated with a process, it has to take up a file handle. This explains why many times you see a process "run out of file handles" and notice thousands of network connections in time_wait (while there are hardly any regular files in use). Again, in Solaris it's not possible to see this as lsof does not show it, but I have definitely seen these in Linux and Win2k.

But if there is a link you can provide which validates your point, I will be more than willing to go over it.

PS: Why do I think it needs to be associated with a process? Let's take a step back and try to understand why a connection enters TIME_WAIT and does not get closed immediately. It enters TIME_WAIT so as to "allow time for any remaining packets to arrive before the port gets reused" (taken from gottry.com). Now, assuming the network connection is not associated with a process, then if a packet arrives at that port, it would have no way of knowing which process to associate it with.

Thanks again,
Kevin.
From: Frank Cusack on 5 Mar 2007 13:33
On 5 Mar 2007 10:12:39 -0800 "Kevin" <kejoseph(a)hotmail.com> wrote:
> First off thanks for your reply. Based on what you are stating then,
> I take it you mean that once a process is in time_wait, it does not
> use up a file handle (which is why, as per your argument, lsof does
> not show it). However, I am having a tough time buying it because
> based on what I know even a connection in time_wait or close_wait is a
> network connection and has to be associated with a process ; and if it

what you know is wrong :-)

> is associated with a process it has to take up a file handle. This
> explains why many times you see a process "run out of file handles"
> and notice thousands of network connections in time_wait (while there
> are hardly any regular files in use). Again, in Solaris its not
> possible to see this as lsof does not show it, but I have definitely
> seen these in Linux and Win2k.
>
> But, if there is a link you can provide me which validates your point
> I will be more than willing to go over it.

Pick up the Stevens TCP/IP book.

> PS: Why do I think it needs to be associated with a process ? Lets
> take a step back and try to understand why a process enters TIME_WAIT
> and does not get closed immediately. It enters TIME_WAIT so as to
> "allow time for any remaining packets to arrive before the port gets
> reused" (taken from gottry.com). Now, assuming the network connection
> is not associated with a process, then if a packet arrives at that
> port, it would have no way of knowing which process to associate it
> with.

Then the packet is dropped, just exactly the same as if a process is associated with it.

In CLOSE_WAIT the connection moves to TIME_WAIT. This still does not require any process-specific handling.

-frank