From: WC on
I've written a TCL app that receives data from a single TCP source and
distributes this data to multiple TCP receivers using a very simple
ASCII protocol. The server is non-blocking, using TCL's event loop. Most
of the receivers are not under my control and sometimes behave poorly.
This means I don't have access to the code or application, and in some
cases not even to the owners of those applications.

Here is my problem.

TCL has called my writable handler indicating that a channel is ready
for data. I write data to the channel, but at some point the client
stops reading without closing the connection. TCP's flow control kicks
in, and data ends up being buffered in the receiver's TCP input buffer,
my host's TCP output buffer, and finally my application's TCL channel
output buffer.

If at this point I connect to another port and issue a command for my
application to shut down, it hangs. I forced a core dump and noticed
that it's hanging in send(). The man page for TCL's close indicates that
TCL will put the channel into blocking mode and attempt to flush any
remaining data; the interpreter does this for each open channel when
exit is called. However, if the TCP stack is not accepting data, the
application will never be able to exit, or for that matter close those
channels at all. This appears to be a pretty serious bug. I need to
'kill -9' the process to force an exit... very ugly. What seems to be
needed is an option to the close command that discards any data buffered
in the TCL channel's output buffer and closes the channel.

I coded a small extension in C that closes the OS-specific handle for
the channel and then unregisters the channel from the interpreter. This
causes send() to return -1, but the interpreter doesn't care at that
point and shutdown continues successfully.

Anyone else run into this? Am I totally missing something here?

BTW, I'm using TCL 8.4 on Linux and HP-UX, but from a review of the
current 8.5 API it seems this deadlock could still exist.

Any input/ideas are greatly appreciated,
Wayne
From: tom.rmadilo on
On Feb 1, 5:29 pm, WC <wcu...(a)cox.net> wrote:
> I've written a TCL app that receives data from a single TCP source and
> distributes this data to multiple TCP receivers using a very simple
> ASCII protocol.
[snip]

Right, so it sounds like you wrote an application which gets stuck...
probably due to poor coding. It also sounds like you ran it in the
background, so you couldn't control it except via signals. The TCP
connection should still time out if you let it sit long enough.

BTW, a channel becomes readable/writable if an error occurs; it is
something of a blunt indicator. In this case it sounds like the
application is simply waiting around to send or receive data. I'm not
sure how this adds up to a bug.

From: WC on
tom.rmadilo wrote:
> On Feb 1, 5:29 pm, WC <wcu...(a)cox.net> wrote:
>
> Right, so it sounds like you wrote an application which gets stuck...
> probably due to poor coding. It also sounds like you ran it in the
> background, so you couldn't control it except via signals. The TCP
> connection should still time out if you let it sit long enough.
>
> BTW, a channel becomes readable/writable if an error occurs; it is
> something of a blunt indicator. In this case it sounds like the
> application is simply waiting around to send or receive data. I'm not
> sure how this adds up to a bug.
>

Did you even read my post or were you just looking for someone to criticize?

1) Backgrounding does not imply that an application can only be
controlled via signals. In fact, I'm using a control socket on another
port, as stated in my message, to send the app a stop message. But this
is beside the point; I'm not sure why you brought it up.

2) You need to go back and study blocking sockets. If the remote end
stops reading data and the TCP buffers on both ends are full, an
attempt to write more data will block until the remote end either
begins to read, thus draining the buffers, or simply closes the
connection. Neither is happening here. There is no timeout to wait for;
TCP is operating as designed in this case.

3) I know about read/write handlers; I have both installed on these
channels. The write handler is not getting called because the remote
end is not reading, and the read handler is not getting called because
the remote end is neither closing the socket nor sending my application
any data. I know this because netstat on my host shows around 40K in
the TCP write queue and the connection in the ESTABLISHED state.

Perhaps "bug" is too strong a word; TCL appears to be operating as
designed. But there should be a way to close an output channel and
instruct TCL to simply discard any remaining data rather than attempt
to send it, for exactly the reason cited above. It is not a good design
if a remote machine can cause my application to hang while closing a
channel or exiting, simply because the interpreter mandates that it
must flush all data from its queues.

From: David Gravereaux on
Can't you just close them manually? Off hand:

foreach sock [chan names sock*] {
    # enables dump on close
    fconfigure $sock -blocking no
    close $sock
}


From: WC on
David Gravereaux wrote:
> Can't you just close them manually? Off hand:
>
> foreach sock [chan names sock*] {
>     # enables dump on close
>     fconfigure $sock -blocking no
>     close $sock
> }
>
Unfortunately not. If I do this while the application is running and
leave the socket non-blocking, TCL returns from close immediately and
tries to flush the data in the background. So the script layer "thinks"
the channel is closed, but a file descriptor remains allocated to the
interpreter forever. Many opens and closes against the bad server
eventually cause file descriptor starvation in the process.

When the application finally attempts to exit, it hangs, since it is
the interpreter's policy to flush and close all open channels before it
exits. So all those background flushes prevent it from exiting.

If I put the channel in blocking mode instead, I don't even get the
benefit of the interpreter attempting to close the channel in the
background: it hangs in close until the other side reads the data or
terminates the connection. That means none of my other socket handlers
are being serviced, as they are in the non-blocking scenario.
Essentially the application appears locked at this point.

I appreciate the suggestion though!
Thanks.