From: Alexandre Ferrieux on 3 Nov 2009 17:22

On Nov 3, 6:37 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> On Nov 3, 12:11 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com> wrote:
> > On Nov 2, 11:32 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> >
> > > Why not show me? I've provided code and a framework which should make
> > > it easy to plug in your chunked transfer implementation.
> >
> > Yup, here ya go.
> >
> > # assuming socket in non-blocking mode and fileevent readable
> > # on function below
> >
> > proc ::htclient::htChunkSize { client } {
> >     variable clientChunkRemain
> >     set sock [getVar $client sock]
> >     if {[gets $sock line]<0} {
> >         if {![eof $sock]} return
> >         htError $client msg "unexpected EOF between chunks"
> >     }
> >     if {![regexp -nocase {^[0-9A-F]+\r$} $line]} {
> >         htError $client msg "illegal chunk size syntax"
> >     }
> >     scan $line %x clientChunkRemain($client)
> >     if {!$clientChunkRemain($client)} {
> >         setVar $client state done
> >     } else {
> >         log Debug "Chunk size $client = $clientChunkRemain($client)"
> >         setVar $client state chunkdata
> >     }
> > }
>
> Okay, I setup a test case using your code:
>
> http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=commit;h=4c449
>
> Depending on the url and the number of simultaneous downloads, your
> version is sometimes consistently faster 1-5%, in other cases, the
> original code is 1-5% faster.

That's not the point. The point is that the gets version is much
smaller, and also much more "natural" in that it uses the "normal"
primitives for line I/O and hex scanning instead of reinventing
everything. This matters a lot for maintenance, especially by
_others_.

-Alex
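The proc above only consumes the chunk-size line and then flips the client
state to chunkdata. As a rough sketch (not code from either poster), a
matching handler for that state in the same non-blocking, fileevent-driven
style could look like this; the helpers getVar/setVar and the chunkdata state
name come from the snippet above, while the "body" slot, the "chunksize"
state name, and the trailing-CRLF handling are assumptions:

    proc ::htclient::htChunkData { client } {
        variable clientChunkRemain
        set sock [getVar $client sock]
        # On a non-blocking channel [read] returns whatever is currently
        # buffered, up to the requested count.
        set data [read $sock $clientChunkRemain($client)]
        # Append to an assumed "body" accumulator managed by the framework.
        setVar $client body [getVar $client body]$data
        incr clientChunkRemain($client) -[string length $data]
        if {$clientChunkRemain($client) == 0} {
            # The CRLF closing the chunk still has to be consumed; a robust
            # version would give these two bytes their own state, since
            # they may not have arrived yet.
            read $sock 2
            setVar $client state chunksize
        }
    }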
From: tom.rmadilo on 3 Nov 2009 18:46

On Nov 3, 2:22 pm, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com> wrote:
> On Nov 3, 6:37 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> > [...]
> >
> > Depending on the url and the number of simultaneous downloads, your
> > version is sometimes consistently faster 1-5%, in other cases, the
> > original code is 1-5% faster.
>
> That's not the point. The point is that the gets version is much
> smaller, and also much more "natural" in that it uses the "normal"
> primitives for line I/O and hex scanning instead of reinventing
> everything. This matters a lot for maintenance, especially by
> _others_.

You're joking right? You actually have trouble understanding this?

HTTP isn't a line oriented protocol. In fact, the chunkSize code isn't
even complete: both of our versions will fail with extensions, which
have to be parsed, like headers (even if they are ignored). Here is
the full definition:

  Chunked-Body   = *chunk
                   last-chunk
                   trailer-part
                   CRLF

  chunk          = chunk-size *WSP [ chunk-ext ] CRLF
                   chunk-data CRLF
  chunk-size     = 1*HEXDIG
  last-chunk     = 1*("0") *WSP [ chunk-ext ] CRLF

  chunk-ext      = *( ";" *WSP chunk-ext-name [ "=" chunk-ext-val ] *WSP )
  chunk-ext-name = token
  chunk-ext-val  = token / quoted-str-nf
  chunk-data     = 1*OCTET ; a sequence of chunk-size octets
  trailer-part   = *( entity-header CRLF )

  quoted-str-nf  = DQUOTE *( qdtext-nf / quoted-pair ) DQUOTE
                 ; like quoted-string, but disallowing line folding
  qdtext-nf      = WSP / %x21 / %x23-5B / %x5D-7E / obs-text
                 ; WSP / <VCHAR except DQUOTE and "\"> / obs-text

Somehow you think CRLF means "line I/O". I'm just not seeing anything
here indicating a "line". I see a protocol where character
interpretation changes with each char read.
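To make the grammar concrete, here is a small chunked body written out as a
Tcl string literal. This example is not from the thread and the extension
name is made up; the point is that the second chunk carries a chunk-ext,
which is exactly the case neither version above parses yet:

    # Two data chunks, then the last-chunk of size 0, an empty
    # trailer-part, and the final CRLF.
    set body "4\r\nWiki\r\n5;foo=bar\r\npedia\r\n0\r\n\r\n"
    # "4"          chunk-size (hex): the next 4 octets are chunk-data
    # "Wiki"       chunk-data, followed by CRLF
    # "5;foo=bar"  chunk-size plus a chunk-ext (name "foo", value "bar")
    # "pedia"      chunk-data, followed by CRLF
    # "0"          last-chunk, empty trailer-part, terminating CRLF
    # Decoded entity body: "Wikipedia"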
From: Alexandre Ferrieux on 3 Nov 2009 19:19

On Nov 4, 12:46 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> > > Depending on the url and the number of simultaneous downloads, your
> > > version is sometimes consistently faster 1-5%, in other cases, the
> > > original code is 1-5% faster.
> >
> > That's not the point. The point is that the gets version is much
> > smaller, and also much more "natural" in that it uses the "normal"
> > primitives for line I/O and hex scanning instead of reinventing
> > everything. This matters a lot for maintenance, especially by
> > _others_.
>
> You're joking right? You actually have trouble understanding this?

Are you having trouble staying polite? Don't drag me in that area...

> HTTP isn't a line oriented protocol. In fact, the chunkSize code isn't
> even complete: both of our versions will fail with extensions, which
> have to be parsed, like headers (even if they are ignored). Here is
> the full definition:
> [...]
> Somehow you think CRLF means "line I/O".

Stop lecturing me, will you? You're reinventing just about any wheel
that comes your way; I may suffer from the NIH syndrome from time to
time, but you're beating me by several orders of magnitude...

I'm merely suggesting that [gets] offers a nice way of avoiding
reading char-by-char. It will segment our input on LFs (since we're
using -translation binary of course), which loses absolutely no
information. It only helps managing groups of bytes larger than one.
Once you get something ended by an LF, you're free to go back on the
string and parse it according to the full RFC2616 syntax. My code
omitted part of that syntax (chunk-ext), but that is fixable with a
simple regexp. If that obvious extrapolation is beyond you, get lost.

-Alex
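For what it's worth, the "simple regexp" fix alluded to here could look
something like the sketch below; it is not Alex's actual code. It captures
the hex size and merely tolerates (rather than validates) an optional
extension list, relying on the fact that with -translation binary [gets]
splits on LF and leaves the CR at the end of the line:

    # chunk-size, optional *WSP and chunk-ext, then the CR left by [gets]
    if {![regexp -nocase {^([0-9A-F]+)[ \t]*(?:;[^\r]*)?\r$} $line -> hex]} {
        htError $client msg "illegal chunk size syntax"
    }
    scan $hex %x clientChunkRemain($client)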
From: tom.rmadilo on 3 Nov 2009 20:08

On Nov 3, 4:19 pm, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com> wrote:
> [...]
> Stop lecturing me, will you? You're reinventing just about any wheel
> that comes your way; I may suffer from the NIH syndrome from time to
> time, but you're beating me by several orders of magnitude...

What wheel am I reinventing? Basically you don't understand, or wish
to gloss over the details so that your simple ten-line proc is all
that is needed for this job. Even my longer proc is not complete, but
it is closer.

> I'm merely suggesting that [gets] offers a nice way of avoiding
> reading char-by-char. It will segment our input on LFs (since we're
> using -translation binary of course), which loses absolutely no
> information. It only helps managing groups of bytes larger than one.

Groups which eventually have to be divided into single bytes. You save
nothing by making an extra copy of the data.

> Once you get something ended by an LF, you're free to go back on the
> string and parse it according to the full RFC2616 syntax.

I'm not sure why you want to avoid what you have to do eventually. You
are using [gets] then just omitting what I am doing to correctly
interpret the chunk-size, chunk-ext info. Then you claim this is
easier to maintain. Of course it is easier to maintain.

> My code omitted part of that syntax (chunk-ext), but that is fixable
> with a simple regexp. If that obvious extrapolation is beyond you,
> get lost.

So my code is longer because it does more. Your code would be
completely different if you had to go back and find the chunk-size.
My code would be essentially the same, with an option to continue to
parse chunk-ext.

Using [gets] just creates a new buffer to parse. I'm parsing from the
channel buffer; you create a new buffer to parse from, one where you
also have to maintain a position index. Why you think this saves
anything in terms of code size or logic is beyond me. I get free
buffer and index management; you create the need to manage your own
buffer. Totally crazy. Now everyone who wants to "maintain" this code
must understand your decision to manage your own buffer.

Simply, you have read in too much information. Since I have also
written cookie and cookie2 parsing code, I can easily compare the
simplicity of this method to the managed buffer mess.
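The disagreement boils down to two I/O styles. As a neutral illustration
(not code from either poster, deliberately written as simple blocking
helpers with no error handling, unlike the thread's event-driven code),
reading a chunk-size the two ways looks roughly like this:

    # (a) Byte-at-a-time: interpret each octet straight off the channel
    #     buffer, driving a hand-written state machine.
    proc sizeByteAtATime {sock} {
        set hex ""
        while {1} {
            set c [read $sock 1]
            if {$c eq "" || $c eq "\n"} break
            if {$c ne "\r"} { append hex $c }
        }
        # %x stops at the first non-hex character, so a trailing
        # chunk-ext is silently ignored here.
        scan $hex %x size
        return $size
    }

    # (b) Line-at-a-time: let [gets] copy everything up to the next LF
    #     into a Tcl string, then parse that string in one shot.
    proc sizeLineAtATime {sock} {
        gets $sock line
        regexp -nocase {^([0-9A-F]+)} $line -> hex
        scan $hex %x size
        return $size
    }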
From: Alexandre Ferrieux on 4 Nov 2009 02:43

On Nov 4, 2:08 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> > Once you get something ended by an LF, you're free to go back on the
> > string and parse it according to the full RFC2616 syntax.
>
> I'm not sure why you want to avoid what you have to do eventually. You
> are using [gets] then just omitting what I am doing to correctly
> interpret the chunk-size, chunk-ext info.

You being intentionally dense or what? Of course [gets] doesn't parse
what's in the line. [gets;regexp] does.

> Then you claim this is easier to maintain.

Yes, I claim that a regexp is easier to read/maintain than the
equivalent char-by-char automaton written in Tcl. Moreover it gets
faster for complex automata.

> Using [gets] just creates a new buffer to parse. I'm parsing from the
> channel buffer; you create a new buffer to parse from, one where you
> also have to maintain a position index.

What position index? Meet Mr re_syntax...

> Totally crazy.

Patience exhausted. EOT.

-Alex