socket flushing/buffering problem, app hangs on close [TCL]

From: Uwe Klein on 2 Feb 2010 03:49

WC wrote:
> Unfortunately not, if I do this while the application is running and I
> leave the socket non-blocking. TCL will return from close immediately
> and try to flush the data in the background. So the script layer
> "thinks" it's closed but a file descriptor is forever allocated to the
> interpreter. Many opens and closes with the bad server eventually cause
> file descriptor starvation in the process.

N.B.
I once wrote a "self scriptable" (not Tcl ;-) multiplexer in
C for distributing messages (duplicating, logging) between different
processes. (Most processes were run-of-the-mill tty/cmdline-oriented programs.)
Outgoing problems were handled via SIGPIPE. (That would not have caught an
unresponsive client either.)

Limit the problem: don't try to reconnect? A dead client is dead, dead, dead.
Would that work for you?
Limit buffering space?

uwe
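
A minimal sketch of the "limit buffering space" idea, assuming Tcl 8.5 or later for [chan pending] (the 8.4 mentioned in this thread does not have it). The proc name sendBounded and the 256 KB default limit are hypothetical. It only keeps Tcl's own backlog for a stalled client from growing; it does not by itself avoid the blocking close discussed in this thread.

```tcl
# Hypothetical helper: queue data for a client only while the amount of
# output already buffered inside Tcl for that channel is under a limit.
# Assumes Tcl 8.5+ ([chan pending] does not exist in 8.4).
proc sendBounded {chan data {limit 262144}} {
    if {[chan pending output $chan] > $limit} {
        # The client is not draining its data; drop this message.
        return 0
    }
    puts -nonewline $chan $data
    return 1
}
```

The return value lets the caller count dropped messages per client and decide when a receiver should be treated as dead.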

From: PaulWalton on 2 Feb 2010 04:19

On Feb 1, 7:29 pm, WC <wcu...(a)cox.net> wrote:
> I've written a TCL app that receives data from a single TCP source and
> distributes this data to multiple TCP receivers using a very simple
> ASCII protocol. The server is non-blocking using TCL's event loop. Most
> of the receivers are not under my control and sometimes behave poorly.
> This means I don't have access to code/application and in some cases the
> owner of those applications.
>
> Here is my problem.
>
> TCL has called my writable handler indicating that a channel is ready
> for data. I write data to the channel but the client stops reading data
> at some point, but does not close the connection. TCP's flow control
> kicks in and data ends up being buffered in the receiver's TCP input
> buffer, my host's TCP output buffer and finally my application's TCL
> channel output buffer.
>
> If at this point I connect to another port and issue a command for my
> application to shutdown it hangs. I forced a core dump and noticed that
> it's hanging in send(). The man page for TCL's close indicates that TCL
> will put the channel into blocking mode and attempt to flush the channel
> of any remaining data, the interpreter does this for each open channel
> when exit is called. However if the TCP stack is not accepting data the
> application will never be able to exit or close channels without exiting
> for that matter. This appears to be a pretty serious bug. I need to
> 'kill -9' in order to force an exit... very ugly. Seems like what is
> needed is an option to the close command to discard any data buffered in
> the TCL channel's output buffer and close the channel.
>
> I coded a small extension in C that closes the OS specific handle for
> the channel and then unregisters the channel from the interpreter. This
> causes send() to return -1 but the interpreter doesn't care at that
> point and shutdown continues successfully.
>
> Anyone else run into this? Am I totally missing something here?
>
> BTW I'm using TCL 8.4 on Linux and HP-UX, but from a review of the current
> 8.5 API it seems like this deadlock could still exist.
>
> Any input/ideas are greatly appreciated,
> Wayne

Why does this work?

Interp 1:
% socket -server accept 1515
sock5
% proc accept {socket clientAddr clientPort} {
puts "Accepted $socket."
puts $socket "hello"
return
}
% after 60000 exit
after#0
% vwait forever
Accepted sock7.
Accepted sock8.
MacBookPro:~ paul$

Interp 2:
% socket localhost 1515
sock5
% close sock5
% socket localhost 1515
sock5
% close sock5
% exit

I ran 'exit' in Interp 2 before the 'after' was triggered in Interp 1.
As you can see tclsh exits fine for me. Or is there a flaw in this
test? I'm on Mac OS 10.4.

From: WC on 2 Feb 2010 09:56

PaulWalton wrote:

>
> Why does this work?
>
> Interp 1:
> % socket -server accept 1515
> sock5
> % proc accept {socket clientAddr clientPort} {
> ...
> ...
> ...
>
>
> I ran 'exit' in Interp 2 before the 'after' was triggered in Interp 1.
> As you can see tclsh exits fine for me. Or is there a flaw in this
> test? I'm on Mac OS 10.4.

Hi Paul,

Well, you're close, but it is not a valid test. The TCL IO system's
buffers were able to flush before it exited. You sent a very small
amount of data. Though you didn't do a read in your client app, TCL was
able to clear its IO buffers down the TCP stack, so the OS close
succeeds.

My application is streaming data to a number of clients and it is not
unusual for it to build up half a meg of data rather quickly. For the
test to be valid, the receiver's TCP input queue needs to be full, as well
as the sender's TCP output queue. With modern TCP stacks this can be
several hundred K of data. Only then will TCL begin to buffer data in
its interp's IO buffers. That will definitely cause TCL to block when
attempting to clear those buffers.

Thanks,
Wayne
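
A sketch of a test that meets those conditions, assuming Tcl 8.5 or later for [chan pending] and a loopback connection inside a single tclsh. The names accept, srv and queued, and the 64 MB safety cap, are illustrative. The server side keeps writing to a client that never reads until the kernel queues are full and Tcl starts holding data in its own output buffer.

```tcl
# Accept callback: remember the server-side channel.
proc accept {chan addr port} {
    set ::srv $chan
}
set listener [socket -server accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $listener -sockname] 2]
set client [socket 127.0.0.1 $port]   ;# this side never reads
vwait ::srv

fconfigure $srv -blocking 0 -translation binary
set chunk [string repeat x 65536]
set written 0
# Write until Tcl itself has to buffer, i.e. the kernel takes no more.
while {[chan pending output $srv] == 0 && $written < 64*1024*1024} {
    puts -nonewline $srv $chunk
    flush $srv
    incr written [string length $chunk]
}
set queued [chan pending output $srv]
puts "wrote $written bytes; $queued bytes still queued inside Tcl"

# Cleanup so the sketch itself can exit: closing the non-reading client
# first makes the pending writes fail instead of waiting forever.
close $client
catch {close $srv}
close $listener
```

Without the cleanup step, closing $srv in blocking mode or calling exit at the point where the message is printed is where the hang described in this thread would be expected to appear.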

From: WC on 2 Feb 2010 10:50

Uwe Klein wrote:
>
> N.B.
> I once wrote a "self scriptable" (not Tcl ;-) multiplexer in
> C for distributing messages (duplicating, logging) between different
> processes. (Most processes were run-of-the-mill tty/cmdline-oriented
> programs.)
> Outgoing problems were handled via SIGPIPE. (That would not have caught an
> unresponsive client either.)
>
> Limit the problem: don't try to reconnect? A dead client is dead, dead,
> dead.
> Would that work for you?
> Limit buffering space?
>
>
> uwe

LOL, that would work for me... But not for my boss :( We get paid for the
data we send them. Yes, this is an annoying scenario since the problem is
the customer's application. But it is what it is.

I'm attempting to replace a version of this same application that I
wrote in C a few years ago. But it doesn't make a very strong case if I
need to include a C extension with the script in order to terminate
badly behaving clients :( The C application happily frees its back
queue, sets the TCP linger timer to 0 and closes the socket. It then
reconnects and sends the client data... until the client app stops
responding again.

From: Alexandre Ferrieux on 2 Feb 2010 11:23

On Feb 2, 2:29 am, WC <wcu...(a)cox.net> wrote:
>
> [...]
> will put the channel into blocking mode and attempt to flush the channel
> of any remaining data, the interpreter does this for each open channel
> when exit is called. However if the TCP stack is not accepting data the
> application will never be able to exit or close channels without exiting
> for that matter. This appears to be a pretty serious bug. I need to
> 'kill -9' in order to force an exit... very ugly. Seems like what is
> needed is an option to the close command to discard any data buffered in
> the TCL channel's output buffer and close the channel.
>
> I coded a small extension in C that closes the OS specific handle for
> the channel and then unregisters the channel from the interpreter. This
> causes send() to return -1 but the interpreter doesn't care at that
> point and shutdown continues successfully.
>
> Anyone else run into this? Am I totally missing something here?
>
> BTW I'm using TCL 8.4 on Linux and HP-UX, but from a review of the current
> 8.5 API it seems like this deadlock could still exist.
>
> Any input/ideas are greatly appreciated,
> Wayne

You are absolutely right: that's a design flaw. If our hands were
free, we'd fix that instantly. The problem is the existing base of Tcl
apps... So we can only extend, not reform. Something like [chan
unflush], or [chan discard].

You can file a TIP for that; however, in the meantime, you can use
the following workaround:

set ff [open "|cat >@ $sok" w]
# do writes on $ff, reads on $sok
# you can still fconfigure $ff -blocking 0

# now assume it's time to close
exec kill -INT [pid $ff]
catch {close $ff}

Ugly, eh ? Yup. Just one percent simpler than an extension ;-)

-Alex
