From: tom.rmadilo on 24 Feb 2010 04:42

On Feb 23, 11:59 pm, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com> wrote:
> On Feb 23, 11:16 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > > Re-read the manpage. [chan pending input] gives the number of bytes
> > > _already_ buffered. When a fileevent fires, it can be of two kinds:
> > > fd-level when there's no buffered byte and select was called upon, and
> > > buffer-level when there are buffered bytes, and no select came into
> > > play.
>
> > Given my example, I would say that this analysis is incomplete. If a
> > fileevent is triggered, and [chan pending] then returns "0", and no
> > data is read (and the callback returns), the buffer never fills up.
> > You get into an infinite loop.
>
> ??? What the heck are you talking about ???
> If you've found a case where fileevent repeatedly fires a false alarm
> (and not only sporadic ones as it has been reported with TLS), please
> file a bug.

For instance I have logs of several request/response (chunked and not
chunked) at:

http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=tree;f=example-logs

search for "pending client1 = 0". The chunked transfers are the most
enlightening. The buffer is slowly drained, goes to zero, then [chan
pending] returns zero after the next fileevent. I read one char, then
the buffer fills up again.
Here is a short snip:

Debug: htChunkSize char client1 = '1' class = HEXCHAR size = 0
Debug: htChunkSize char client1 = '4' class = HEXCHAR size = 1
Debug: htChunkSize char client1 = '8' class = HEXCHAR size = 20
Debug: htChunkSize char client1 = 'a' class = HEXCHAR size = 328
Debug: htChunkSize char client1 = ' ' class = CR size = 5258
Debug: htChunkSize char client1 = ' ' class = LF size = 5258
Debug: Chunk size client1 = 5258
Debug: htChunkData pending client1 = 3136 start = 5258, remain = 2122
Debug: htChunkData pending client1 = 0 start = 2122, remain = 2121 <= read one char
Debug: htChunkData pending client1 = 1447 start = 2121, remain = 674
Debug: htChunkData pending client1 = 0 start = 674, remain = 673 <= read one char
Debug: htChunkData pending client1 = 1447 start = 673, remain = 0

> Otherwise, stick to the idioms provided and use them.

Not sure what idioms you think I'm not sticking to.

I'm not sure if these are false alarms or not. If I don't read one
char, I get an infinite loop and [chan pending] stays at zero.

I also improved a little bit on the regexp. It is parsing correct
headers into tokens, but incorrect ones get parsed as well:

"abcdefg\"hijk" => {"abcdefg\"hijk"} <= valid start token
"abcdefg\\"hijk" => {"abcdefg\\"} hijk <= invalid start token
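
The "size" values in the htChunkSize log lines of this post are consistent with building the chunk size one hex digit at a time (size = size * 16 + digit), each line showing the size before its character is folded in. The sketch below reproduces that arithmetic for the digits 1, 4, 8, a. The proc name chunkSizeStep is hypothetical and is not taken from htclient; it only illustrates the accumulation, not how htclient actually implements it.

```tcl
# Hypothetical helper (not from htclient): fold one hex digit into a
# running chunk size, as in size = size * 16 + digit.
proc chunkSizeStep {size char} {
    if {![string is xdigit -strict $char]} {
        error "not a hex digit: '$char'"
    }
    # [scan ... %x] converts the single hex digit to its integer value
    scan $char %x digit
    return [expr {$size * 16 + $digit}]
}

set size 0
foreach char {1 4 8 a} {
    # Print the size before the digit is applied, like the log does
    puts "char = '$char' size = $size"
    set size [chunkSizeStep $size $char]
}
puts "Chunk size = $size"
```

The intermediate values are 0, 1, 20 and 328, and the final size is 5258 (hex 148a), matching the log.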
From: Donal K. Fellows on 24 Feb 2010 04:43

On 23 Feb, 17:47, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> Right, my htclient code tokenizes the header field values as well as
> headers. If you can't tokenize the field value, you can't really
> distinguish between valid and invalid headers. Maybe a regexp exists
> which can do this, right now I use an fsm to do the job. It works and
> is guaranteed not to block.

I *think* that the natural way to process HTTP headers is as a
multi-level grammar. First you read lines until you get a proper blank
line (i.e., the end of the headers). Then you go back and split the
block of headers into individual header "lines" (which may be multiple
lines; I believe continuation lines must start with a space or tab).
Then, if desired, you parse the individual header lines (at the very
least, you need to work out what the name of the header line is, but
that should be trivial).

> This is exactly what I do to avoid blocking: I read one char at a time
> for headers or [chan pending] chars for the body.

That's quite messy. With a non-blocking channel (if you're not
non-blocking, it's time to change!) you can just do a [gets $ch lineVar]
when the channel is readable. That will either read a full line (up to
whatever is set as the line separator) and return the length of line
(minus terminator) or return -1. If it returns -1, you've either got
an EOF or you've exhausted the bytes available without being able to
get a line (i.e., [fblocked $ch] will return 1 at that point). If
you've blocked, you can just go back to sleep waiting in the event
loop for some more bytes to arrive. Or if you're being careful, you
can use [chan pending input $ch] to see whether the data that's
accumulated in the input buffers - which must be a single incomplete
line because you didn't read a complete one - has exceeded some
threshold and you're going to kill the connection for being from a
bunch of scumbags.
You probably want to limit the number of complete lines you read too,
for identical reasons.

In short, having *some* buffering and memory allocation is OK. So long
as it doesn't get out of hand, it's easier to let Tcl do all those
bits for you. And trying to do all the parsing in a single level of
FSM[*] is painful.

Thanks to Alexandre for reminding me how to use [chan pending]. :-)

> If we had [chan unputs], I could eliminate about half my fsm code
> (needed to handle <CR><LF>).

You can do it by stacking a transformation on the channel that
delivers the unput-ted characters instead of the ones from the
underlying channel. But that requires 8.6 or an extension (whose name
I forget right now); the transformation API didn't get done in time
for 8.5.

Donal.
[* I keep reading that as "Flying Spaghetti Monster". Seems apt to
me. :-) ]
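
The [gets]-based approach described in this post can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the proc name onReadable, the limits maxLineBytes and maxLines, and the sample header text are all assumptions, and [chan pipe] (Tcl 8.6) is used only as a stand-in for a socket so the example is self-contained.

```tcl
# Hypothetical limits, chosen for illustration only.
set maxLineBytes 8192
set maxLines     100
set lineCount    0
set headerLines  {}

proc onReadable {ch} {
    global maxLineBytes maxLines lineCount headerLines done
    set n [gets $ch line]
    if {$n < 0} {
        if {[eof $ch]} {
            close $ch
            set done eof
        } elseif {[fblocked $ch]} {
            # Only a partial line is buffered: check how much has piled
            # up, then return to the event loop and wait for more bytes.
            if {[chan pending input $ch] > $maxLineBytes} {
                close $ch
                set done "line too long"
            }
        }
        return
    }
    if {$n == 0} {
        # Blank line: end of the header block.
        chan event $ch readable {}
        set done ok
        return
    }
    lappend headerLines $line
    if {[incr lineCount] > $maxLines} {
        close $ch
        set done "too many lines"
    }
}

# Demo: a pipe stands in for the client socket (assumption, Tcl 8.6).
lassign [chan pipe] rd wr
chan configure $wr -translation binary
chan configure $rd -blocking 0 -translation crlf
puts -nonewline $wr "Host: example.com\r\nAccept: */*\r\n\r\n"
flush $wr
chan event $rd readable [list onReadable $rd]
vwait done
puts "$done: $headerLines"
```

Each call handles at most one line, so the handler never loops or blocks; splitting continuation lines and parsing field values would be separate later passes over headerLines.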
From: Alexandre Ferrieux on 24 Feb 2010 10:06

On Feb 24, 10:42 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > If you've found a case where fileevent repeatedly fires a false alarm
> > (and not only sporadic ones as it has been reported with TLS), please
> > file a bug.
>
> For instance I have logs of several request/response (chunked and not
> chunked) at:
>
> http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=tree;f=examp...

Repeat: _file a bug_ at SF.
This implies that you'll first have to make an effort to summarize,
circumscribe, describe, characterize. A brain dump sprinkled with logs
won't do.

> > Otherwise, stick to the idioms provided and use them.
>
> Not sure what idioms you think I'm not sticking to.

The proc skeleton written by Eric that I've just copied.

> I'm not sure if these are false alarms or not. If I don't read one
> char, I get an infinite loop and [chan pending] stays at zero.

Re-read what I've written on the two kinds of fileevents (fd-based or
buffer-based).
Of course if your fileevent handler doesn't read anything, it will
re-fire indefinitely.
And of course [chan pending] says 0 at that time since no byte has
been read yet.
What did you expect ?

> I also improved a little bit on the regexp. It is parsing correct
> headers into tokens, but incorrect ones get parsed as well:
>
> "abcdefg\"hijk" => {"abcdefg\"hijk"} <= valid start token
> "abcdefg\\"hijk" => {"abcdefg\\"} hijk <= invalid start token

Then please provide, in a self-contained manner, the regexp that is
failing.

-Alex
From: Alexandre Ferrieux on 24 Feb 2010 10:08

On Feb 24, 10:43 am, "Donal K. Fellows" <donal.k.fell...(a)manchester.ac.uk> wrote:
> And trying to do all the parsing in a single level of
> FSM[*] is painful.
> [* I keep reading that as "Flying Spaghetti Monster". Seems apt to
> me. :-) ]

Except air density is a bit low on this planet for such objects to
fly, I find it apt too :-)

-Alex
From: tom.rmadilo on 24 Feb 2010 12:11

On Feb 24, 7:06 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com> wrote:
> On Feb 24, 10:42 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > > If you've found a case where fileevent repeatedly fires a false alarm
> > > (and not only sporadic ones as it has been reported with TLS), please
> > > file a bug.
>
> > For instance I have logs of several request/response (chunked and not
> > chunked) at:
>
> > http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=tree;f=examp...
>
> Repeat: _file a bug_ at SF.
> This implies that you'll first have to make an effort to summarize,
> circumscribe, describe, characterize. A brain dump sprinkled with logs
> won't do.

Alex, you are the only one suggesting this is a bug. And if you look
closely, below you suggest it isn't a bug, just a big misunderstanding
on my part. Right now I just call it consistent behavior.

> > > Otherwise, stick to the idioms provided and use them.
>
> > Not sure what idioms you think I'm not sticking to.
>
> The proc skeleton written by Eric that I've just copied.

But I'm not using [gets], so the idiom doesn't apply.
Here is how chunk data is read:

proc ::htclient::htChunkData { client } {

    variable clientChunkRemain
    variable clientContent
    variable clientContentLength

    set sock [getVar $client sock]
    set pending [chan pending input $sock]
    set remain $clientChunkRemain($client)

    if {!$pending} {
        append clientContent($client) [read $sock 1]
        incr clientChunkRemain($client) -1
        incr clientContentLength($client) 1
    } elseif {$remain >= $pending} {
        append clientContent($client) [read $sock $pending]
        incr clientChunkRemain($client) -$pending
        incr clientContentLength($client) $pending
    } else {
        append clientContent($client) [read $sock $remain]
        incr clientChunkRemain($client) -$remain
        incr clientContentLength($client) $remain
    }

    log Debug "htChunkData pending $client = $pending\
        start = $remain,\
        remain = $clientChunkRemain($client)"

    if {!$clientChunkRemain($client)} {
        setVar $client state chunkend
    }
}

Please note the absence of an explicit loop. Each fileevent is handled
separately.

> > I'm not sure if these are false alarms or not. If I don't read one
> > char, I get an infinite loop and [chan pending] stays at zero.
>
> Re-read what I've written on the two kinds of fileevents (fd-based or
> buffer-based).
> Of course if your fileevent handler doesn't read anything, it will
> re-fire indefinitely.

I know that, I was just describing the behavior. If a readable
fileevent were generated by accident and then [chan pending] reports
zero bytes pending, it seems like eventually there would be bytes
available. Instead, if I do [read $chan 0] and return, the next
fileevent, and the next, etc. gives the same result.

> And of course [chan pending] says 0 at that time since no byte has
> been read yet.
> What did you expect ?

Is there something wrong with the manpage for [chan pending]:

chan pending mode channelId
    Depending on whether mode is "input" or "output", returns the
    number of bytes of input or output (respectively) currently
    buffered internally for channelId (especially useful in a readable
    event callback to impose application-specific limits on input line
    lengths to avoid a potential denial-of-service attack where a
    hostile user crafts an extremely long line that exceeds the
    available memory to buffer it). Returns -1 if the channel was not
    opened for the mode in question.

So if [chan pending] returns 0 it means "number of bytes currently
buffered internally" is zero. The buffer is empty. Of course I have no
idea what the actual "readable event" was, but it can't be an error
condition because I'm able to read from the channel.
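
The behavior argued over in this thread can be reproduced in a small experiment. The sketch below is an illustration under assumptions, not code from any of the posters: the proc name probePending and the ten-byte payload are made up, and [chan pipe] (Tcl 8.6) stands in for a socket. It records [chan pending input] when the readable event first fires, reads one byte, and records it again.

```tcl
# Hypothetical probe: compare [chan pending input] before and after the
# first read inside a readable handler.
proc probePending {ch} {
    global before first after finished
    # Data is waiting in the OS, but Tcl has not read any of it yet
    set before [chan pending input $ch]
    # A one-byte read lets Tcl pull the available bytes into its buffer
    set first [read $ch 1]
    set after [chan pending input $ch]
    chan event $ch readable {}
    set finished 1
}

# A pipe stands in for the socket (assumption, Tcl 8.6).
lassign [chan pipe] rd wr
chan configure $wr -translation binary
chan configure $rd -blocking 0 -translation binary
puts -nonewline $wr "0123456789"
flush $wr

chan event $rd readable [list probePending $rd]
vwait finished
puts "before = $before, first = '$first', after = $after"
close $wr
close $rd
```

With this setup the first reading should be 0 and the second should be the rest of the payload, which is the same pattern as "pending client1 = 0" followed by a refilled buffer in the htclient logs: a zero count at an fd-level event means nothing has been moved into Tcl's buffer yet, not that nothing is readable.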