Tcl/Tk 8.5.7 supplied with HTTP/1.0 ? [TCL]

From: tom.rmadilo on 24 Feb 2010 04:42

On Feb 23, 11:59 pm, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> On Feb 23, 11:16 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
>
>
> > > Re-read the manpage. [chan pending input] gives the number of bytes
> > > _already_ buffered. When a fileevent fires, it can be of two kinds: fd-
> > > level when there's no buffered byte and select was called upon, and
> > > buffer-level when there are buffered bytes, and no select came into
> > > play.
>
> > Given my example, I would say that this analysis is incomplete. If a
> > fileevent is triggered, and [chan pending] then returns "0", and no
> > data is read (and the callback returns), the buffer never fills up.
> > You get into an infinite loop.
>
> ??? What the heck are you talking about ???
> If you've found a case where fileevent repeatedly fires a false alarm
> (and not only sporadic ones as it has been reported with TLS), please
> file a bug.

For instance I have logs of several request/response (chunked and not
chunked) at:

http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=tree;f=example-logs

search for "pending client1 = 0". The chunked transfers are the most
enlightening. The buffer is slowly drained, goes to zero, then [chan
pending] returns zero after the next fileevent. I read one char, then
the buffer fills up again.

Here is a short snip:

Debug: htChunkSize char client1 = '1' class = HEXCHAR size = 0
Debug: htChunkSize char client1 = '4' class = HEXCHAR size = 1
Debug: htChunkSize char client1 = '8' class = HEXCHAR size = 20
Debug: htChunkSize char client1 = 'a' class = HEXCHAR size = 328
Debug: htChunkSize char client1 = '\r' class = CR size = 5258
Debug: htChunkSize char client1 = '\n' class = LF size = 5258
Debug: Chunk size client1 = 5258
Debug: htChunkData pending client1 = 3136 start = 5258, remain = 2122
Debug: htChunkData pending client1 = 0 start = 2122, remain = 2121 <= read one char
Debug: htChunkData pending client1 = 1447 start = 2121, remain = 674
Debug: htChunkData pending client1 = 0 start = 674, remain = 673 <= read one char
Debug: htChunkData pending client1 = 1447 start = 673, remain = 0

> Otherwise, stick to the idioms provided and use them.

Not sure what idioms you think I'm not sticking to.

I'm not sure if these are false alarms or not. If I don't read one
char, I get an infinite loop and [chan pending] stays at zero.

I also improved a little bit on the regexp. It is parsing correct
headers into tokens, but incorrect ones get parsed as well:

"abcdefg\"hijk" => {"abcdefg\"hijk"} <= valid start token
"abcdefg\\"hijk" => {"abcdefg\\"} hijk <= invalid start token

From: Donal K. Fellows on 24 Feb 2010 04:43

On 23 Feb, 17:47, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> Right, my htclient code tokenizes the header field values as well as
> headers. If you can't tokenize the field value, you can't really
> distinguish between valid and invalid headers. Maybe a regexp exists
> which can do this, right now I use an fsm to do the job. It works and
> is guaranteed not to block.

I *think* that the natural way to process HTTP headers is as a multi-
level grammar. First you read lines until you get a proper blank line
(i.e., the end of the headers). Then you go back and split the block
of headers into individual header "lines" (which may be multiple
lines; I believe continuation lines must start with a space or tab).
Then, if desired, you parse the individual header lines (at the very
least, you need to work out what the name of the header line is, but
that should be trivial).
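
The second and third levels described above can be sketched in Tcl
roughly as follows. This is only a minimal sketch, not the htclient
code: the proc name splitHeaders is hypothetical, it assumes the whole
header block has already been read (without the terminating blank
line), and it only separates each header name from its raw value
without tokenizing the value.

```tcl
# Hypothetical helper: split a raw header block into a flat
# {name value name value ...} list.
# Assumes $block holds every header line, minus the final blank line.
proc splitHeaders {block} {
    # Normalise CRLF to LF so [split] sees one separator per line.
    set block [string map [list "\r\n" "\n"] $block]

    # Level 2: fold continuation lines (leading space or tab)
    # into the header line they belong to.
    set logical {}
    foreach line [split $block "\n"] {
        if {$line eq ""} continue
        if {[string match "\[ \t\]*" $line] && [llength $logical]} {
            set prev [lindex $logical end]
            lset logical end "$prev [string trim $line]"
        } else {
            lappend logical $line
        }
    }

    # Level 3: work out the name of each header line.
    set result {}
    foreach line $logical {
        set idx [string first ":" $line]
        if {$idx < 0} {
            error "malformed header line: $line"
        }
        set name  [string range $line 0 [expr {$idx - 1}]]
        set value [string range $line [expr {$idx + 1}] end]
        lappend result [string tolower [string trim $name]] [string trim $value]
    }
    return $result
}
```

Header names are lower-cased here only to make later lookups simpler;
that is a choice of this sketch.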

> This is exactly what I do to avoid blocking: I read one char at a time
> for headers or [chan pending] chars for the body.

That's quite messy. With a non-blocking channel (if you're not non-
blocking, it's time to change!) you can just do a [gets $ch lineVar]
when the channel is readable. That will either read a full line (up to
whatever is set as the line separator) and return the length of line
(minus terminator) or return -1. If it returns -1, you've either got
an EOF or you've exhausted the bytes available without being able to
get a line (i.e., [fblocked $ch] will return 1 at that point). If
you've blocked, you can just go back to sleep waiting in the event
loop for some more bytes to arrive. Or if you're being careful, you
can use [chan pending input $ch] to see whether the data that's
accumulated in the input buffers - which must be a single incomplete
line because you didn't read a complete one - has exceeded some
threshold and you're going to kill the connection for being from a
bunch of scumbags. You probably want to limit the number of complete
lines you read too, for identical reasons.
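
A minimal sketch of the [gets] idiom described above. The names
readHeaderLine, onLine and maxLine are hypothetical, and the 8192-byte
limit is an arbitrary placeholder; the channel is assumed to be in
non-blocking mode already.

```tcl
# Hypothetical readable-event handler following the [gets] idiom.
# $onLine is a command prefix called with each complete line;
# $maxLine caps the bytes buffered for one incomplete line.
proc readHeaderLine {ch onLine {maxLine 8192}} {
    if {[gets $ch line] >= 0} {
        # A complete line arrived (terminator already stripped).
        {*}$onLine $line
    } elseif {[eof $ch]} {
        close $ch
    } elseif {[fblocked $ch]} {
        # Only a partial line is buffered so far.
        if {[chan pending input $ch] > $maxLine} {
            close $ch   ;# line too long: drop the connection
        }
        # Otherwise return and wait for the next readable event.
    }
}

# Usage, assuming $sock is an already-connected socket and
# handleLine is a proc taking one argument:
#   fconfigure $sock -blocking 0 -translation crlf
#   fileevent $sock readable [list readHeaderLine $sock handleLine]
```

Each readable event handles at most one line; a cap on the number of
complete lines would go in the (hypothetical) handleLine callback.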

In short, having *some* buffering and memory allocation is OK. So long
as it doesn't get out of hand, it's easier to let Tcl do all those
bits for you. And trying to do all the parsing in a single level of
FSM[*] is painful.

Thanks to Alexandre for reminding me how to use [chan pending]. :-)

> If we had [chan unputs], I could eliminate about half my fsm code
> (needed to handle <CR><LF>).

You can do it by stacking a transformation on the channel that
delivers the unput-ted characters instead of the ones from the
underlying channel. But that requires 8.6 or an extension (whose name
I forget right now); the transformation API didn't get done in time
for 8.5.

Donal.
[* I keep reading that as "Flying Spaghetti Monster". Seems apt to
me. :-) ]

From: Alexandre Ferrieux on 24 Feb 2010 10:06

On Feb 24, 10:42 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > If you've found a case where fileevent repeatedly fires a false alarm
> > (and not only sporadic ones as it has been reported with TLS), please
> > file a bug.
>
> For instance I have logs of several request/response (chunked and not
> chunked) at:
>
> http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=tree;f=examp...

Repeat: _file a bug_ at SF.
This implies that you'll first have to make an effort to summarize,
circumscribe, describe, characterize. A brain dump sprinkled with logs
won't do.

> > Otherwise, stick to the idioms provided and use them.
>
> Not sure what idioms you think I'm not sticking to.

The proc skeleton written by Eric that I've just copied.

> I'm not sure if these are false alarms or not. If I don't read one
> char, I get an infinite loop and [chan pending] stays at zero.

Re-read what I've written on the two kinds of fileevents (fd-based or
buffer-based).
Of course if your fileevent handler doesn't read anything, it will
re-fire indefinitely.
And of course [chan pending] says 0 at that time since no byte has
been read yet.
What did you expect ?
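
A small sketch that should reproduce this outside of any HTTP code. It
assumes Tcl 8.6, because it uses [chan pipe] as a stand-in for the
socket; the variable names are arbitrary.

```tcl
# Requires Tcl 8.6 for [chan pipe]; the pipe stands in for a socket.
lassign [chan pipe] rd wr
fconfigure $rd -blocking 0 -translation binary
fconfigure $wr -translation binary
puts -nonewline $wr "hello world"
flush $wr

# Data is waiting in the OS pipe, but Tcl has not read any of it yet,
# so Tcl's own input buffer is still empty.
set before [chan pending input $rd]

# Reading one byte makes Tcl pull the available data into its buffer.
set first [read $rd 1]
set after [chan pending input $rd]

puts "before=$before first=$first after=$after"
close $wr
close $rd
```

Here before is 0 even though a readable event would fire (the fd-level
kind), and after should report the bytes left in Tcl's buffer once the
first read has filled it.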

> I also improved a little bit on the regexp. It is parsing correct
> headers into tokens, but incorrect ones get parsed as well:
>
> "abcdefg\"hijk" => {"abcdefg\"hijk"} <= valid start token
> "abcdefg\\"hijk" => {"abcdefg\\"} hijk <= invalid start token

Then please provide, in a self-contained manner, the regexp that is
failing.

-Alex

From: Alexandre Ferrieux on 24 Feb 2010 10:08

On Feb 24, 10:43 am, "Donal K. Fellows"
<donal.k.fell...(a)manchester.ac.uk> wrote:
> And trying to do all the parsing in a single level of
> FSM[*] is painful.
> [* I keep reading that as "Flying Spaghetti Monster". Seems apt to
> me. :-) ]

Except air density is a bit low on this planet for such objects to
fly, I find it apt too :-)

-Alex

From: tom.rmadilo on 24 Feb 2010 12:11

On Feb 24, 7:06 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> On Feb 24, 10:42 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
>
>
> > > If you've found a case where fileevent repeatedly fires a false alarm
> > > (and not only sporadic ones as it has been reported with TLS), please
> > > file a bug.
>
> > For instance I have logs of several request/response (chunked and not
> > chunked) at:
>
> >http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=tree;f=examp...
>
> Repeat: _file a bug_ at SF.
> This implies that you'll first have to make an effort to summarize,
> circumscribe, describe, characterize. A brain dump sprinkled with logs
> won't do.

Alex, you are the only one suggesting this is a bug. And if you look
closely, below you suggest it isn't a bug, just a big misunderstanding
on my part. Right now I just call it consistent behavior.

> > > Otherwise, stick to the idioms provided and use them.
>
> > Not sure what idioms you think I'm not sticking to.
>
> The proc skeleton written by Eric that I've just copied.

But I'm not using [gets], so the idiom doesn't apply. Here is how
chunk data is read:

proc ::htclient::htChunkData { client } {

    variable clientChunkRemain
    variable clientContent
    variable clientContentLength

    set sock [getVar $client sock]
    set pending [chan pending input $sock]

    set remain $clientChunkRemain($client)

    if {!$pending} {
        append clientContent($client) [read $sock 1]
        incr clientChunkRemain($client) -1
        incr clientContentLength($client) 1

    } elseif {$remain >= $pending} {
        append clientContent($client) [read $sock $pending]
        incr clientChunkRemain($client) -$pending
        incr clientContentLength($client) $pending
    } else {
        append clientContent($client) [read $sock $remain]
        incr clientChunkRemain($client) -$remain
        incr clientContentLength($client) $remain
    }

    log Debug "htChunkData pending $client = $pending\
        start = $remain,\
        remain = $clientChunkRemain($client)"

    if {!$clientChunkRemain($client)} {
        setVar $client state chunkend
    }
}

Please note the absence of an explicit loop. Each fileevent is handled
separately.
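
For comparison, a sketch of the same step without consulting [chan
pending]. On a non-blocking channel, [read $sock $n] returns at most
$n characters and does not wait for more, so the three branches
collapse into one. This is not the htclient code: readChunkData and
the state array are hypothetical stand-ins for the per-client
variables above, and the channel is assumed to be non-blocking with
binary translation.

```tcl
# Hypothetical variant of the handler above.
# state(remain)  = bytes still expected in the current chunk
# state(content) = data accumulated so far
proc readChunkData {sock stateVar} {
    upvar #0 $stateVar state

    # Non-blocking [read] returns whatever is available, up to the limit.
    set data [read $sock $state(remain)]
    set n [string length $data]
    append state(content) $data
    incr state(remain) -$n

    if {$n == 0 && [eof $sock]} {
        return eof
    }
    # Report whether the chunk is complete or more data is needed.
    return [expr {$state(remain) == 0 ? "chunkend" : "more"}]
}
```

As in the original, there is no explicit loop: one call per readable
event, and the return value tells the caller whether to change state.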

> > I'm not sure if these are false alarms or not. If I don't read one
> > char, I get an infinite loop and [chan pending] stays at zero.
>
> Re-read what I've written on the two kinds of fileevents (fd-based or
> buffer-based).
> Of course if your fileevent handler doesn't read anything, it will
> re-fire indefinitely.

I know that, I was just describing the behavior. If a readable
fileevent were generated by accident and then [chan pending] reports
zero bytes pending, it seems like eventually there would be bytes
available. Instead, if I do [read $chan 0] and return, the next
fileevent, and the next, etc. gives the same result.

> And of course [chan pending] says 0 at that time since no byte has
> been read yet.
> What did you expect ?

Is there something wrong with the manpage for [chan pending]:

    chan pending mode channelId
        Depending on whether mode is "input" or "output", returns the
        number of bytes of input or output (respectively) currently
        buffered internally for channelId (especially useful in a
        readable event callback to impose application-specific limits
        on input line lengths to avoid a potential denial-of-service
        attack where a hostile user crafts an extremely long line that
        exceeds the available memory to buffer it). Returns -1 if the
        channel was not opened for the mode in question.

So if [chan pending] returns 0 it means "number of bytes currently
buffered internally" is zero. The buffer is empty.

Of course I have no idea what the actual "readable event" was, but it
can't be an error condition because I'm able to read from the
channel.
