From: Alexandre Ferrieux on
On Nov 3, 6:37 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> On Nov 3, 12:11 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
> wrote:
>
>
>
>
>
> > On Nov 2, 11:32 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > > Why not show me? I've provided code and a framework which should make
> > > it easy to plug in your chunked transfer implementation.
>
> > Yup, here ya go.
>
> >  # assuming socket in non-blocking mode and fileevent readable
> >  # on function below
>
> >  proc ::htclient::htChunkSize { client } {
> >     variable clientChunkRemain
> >     set sock [getVar $client sock]
> >     if {[gets $sock line]<0} {
> >        if {![eof $sock]} return
> >        htError $client msg "unexpected EOF between chunks"
> >     }
> >     if {![regexp -nocase {^[0-9A-F]+\r$} $line]} {
> >        htError $client msg "illegal chunk size syntax"
> >     }
> >     scan $line %x clientChunkRemain($client)
> >     if {!$clientChunkRemain($client)} {
> >        setVar $client state done
> >     } else {
> >        log Debug "Chunk size $client = $clientChunkRemain($client)"
> >        setVar $client state chunkdata
> >     }
> >   }
>
> Okay, I setup a test case using your code:
>
> http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=commit;h=4c449
>
> Depending on the URL and the number of simultaneous downloads, your
> version is sometimes consistently 1-5% faster; in other cases, the
> original code is 1-5% faster.

That's not the point. The point is that the gets version is much
smaller, and also much more "natural" in that it uses the "normal"
primitives for line I/O and hex scanning instead of reinventing
everything. This matters a lot for maintenance, especially by
_others_.

-Alex
From: tom.rmadilo on
On Nov 3, 2:22 pm, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> On Nov 3, 6:37 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
>
>
> > On Nov 3, 12:11 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
> > wrote:
>
> > > On Nov 2, 11:32 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > > > Why not show me? I've provided code and a framework which should make
> > > > it easy to plug in your chunked transfer implementation.
>
> > > Yup, here ya go.
>
> > >  # assuming socket in non-blocking mode and fileevent readable
> > >  # on function below
>
> > >  proc ::htclient::htChunkSize { client } {
> > >     variable clientChunkRemain
> > >     set sock [getVar $client sock]
> > >     if {[gets $sock line]<0} {
> > >        if {![eof $sock]} return
> > >        htError $client msg "unexpected EOF between chunks"
> > >     }
> > >     if {![regexp -nocase {^[0-9A-F]+\r$} $line]} {
> > >        htError $client msg "illegal chunk size syntax"
> > >     }
> > >     scan $line %x clientChunkRemain($client)
> > >     if {!$clientChunkRemain($client)} {
> > >        setVar $client state done
> > >     } else {
> > >        log Debug "Chunk size $client = $clientChunkRemain($client)"
> > >        setVar $client state chunkdata
> > >     }
> > >   }
>
> > Okay, I setup a test case using your code:
>
> > http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=commit;h=4c449
>
> > Depending on the URL and the number of simultaneous downloads, your
> > version is sometimes consistently 1-5% faster; in other cases, the
> > original code is 1-5% faster.
>
> That's not the point. The point is that the gets version is much
> smaller, and also much more "natural" in that it uses the "normal"
> primitives for line I/O and hex scanning instead of reinventing
> everything. This matters a lot for maintenance, especially by
> _others_.

You're joking, right? You actually have trouble understanding this?
HTTP isn't a line-oriented protocol. In fact, the chunkSize code isn't
even complete: both of our versions will fail on extensions, which
have to be parsed just like headers (even if they are then ignored).
Here is the full definition:

Chunked-Body   = *chunk
                 last-chunk
                 trailer-part
                 CRLF

chunk          = chunk-size *WSP [ chunk-ext ] CRLF
                 chunk-data CRLF
chunk-size     = 1*HEXDIG
last-chunk     = 1*("0") *WSP [ chunk-ext ] CRLF

chunk-ext      = *( ";" *WSP chunk-ext-name [ "=" chunk-ext-val ] *WSP )
chunk-ext-name = token
chunk-ext-val  = token / quoted-str-nf
chunk-data     = 1*OCTET ; a sequence of chunk-size octets
trailer-part   = *( entity-header CRLF )

quoted-str-nf  = DQUOTE *( qdtext-nf / quoted-pair ) DQUOTE
                 ; like quoted-string, but disallowing line folding
qdtext-nf      = WSP / %x21 / %x23-5B / %x5D-7E / obs-text
                 ; WSP / <VCHAR except DQUOTE and "\"> / obs-text

Somehow you think CRLF means "line I/O".

I'm just not seeing anything here indicating a "line". I see a
protocol where character interpretation changes with each char read.
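To make the grammar concrete: once a full header line is in hand, the
whole chunk header (chunk-size, optional *WSP, optional chunk-ext,
CRLF) can still be matched in one pass. A minimal sketch in Tcl; the
proc name is illustrative, quoted-pair escapes inside quoted extension
values are not handled, and error handling is reduced to [error]:

```tcl
# Sketch: match one chunk header per the ABNF above.
# Assumes $line is a full header line with the trailing CR
# still attached (i.e. the channel uses -translation binary).
proc parseChunkHeader {line} {
    set re {^([0-9A-Fa-f]+)[ \t]*((?:;[ \t]*[^;=\r"]+(?:=(?:[^;\r"]+|"[^"]*"))?[ \t]*)*)\r$}
    if {![regexp $re $line -> hex ext]} {
        error "illegal chunk header: $line"
    }
    scan $hex %x size
    return [list $size $ext]   ;# size in bytes, raw extension text
}
```

A last-chunk line like "0\r" yields size 0 with an empty extension
string, so the caller can still branch on the size alone.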

From: Alexandre Ferrieux on
On Nov 4, 12:46 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> >
> > > Depending on the URL and the number of simultaneous downloads, your
> > > version is sometimes consistently 1-5% faster; in other cases, the
> > > original code is 1-5% faster.
>
> > That's not the point. The point is that the gets version is much
> > smaller, and also much more "natural" in that it uses the "normal"
> > primitives for line I/O and hex scanning instead of reinventing
> > everything. This matters a lot for maintenance, especially by
> > _others_.
>
> You're joking right? You actually have trouble understanding this?

Are you having trouble staying polite? Don't drag me into that area...

> HTTP isn't a line oriented protocol. In fact, the chunkSize code isn't
> even complete, both of our versions will fail with extensions, which
> have to be parsed, like headers (even if they are ignored). Here is
> the full definition:
> [...]
> Somehow you think CRLF means "line I/O".

Stop lecturing me, will you? You're reinventing just about any wheel
that comes your way; I may suffer from NIH syndrome from time to
time, but you're beating me by several orders of magnitude...

I'm merely suggesting that [gets] offers a nice way of avoiding
reading char-by-char. It will segment our input on LFs (since we're
using -translation binary, of course), which loses absolutely no
information. It simply helps manage groups of bytes larger than one.
Once you get something ended by an LF, you're free to go back over the
string and parse it according to the full RFC2616 syntax. My code
omitted part of that syntax (chunk-ext), but that is fixable with a
simple regexp. If that obvious extrapolation is beyond you, get lost.
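For what it's worth, widening the posted htChunkSize check to
tolerate a chunk-ext is indeed a one-pattern change. A hedged sketch
of just the validation step, standalone (the proc name is made up for
illustration, and quoted-string extension values are only passed
through, not decoded):

```tcl
# Sketch: the chunk-size check from htChunkSize, widened so an
# optional ";ext" part between the hex digits and the CR is
# accepted. Returns the size in bytes, or -1 on a syntax error.
proc chunkSizeOf {line} {
    if {![regexp -nocase {^([0-9A-F]+)(;[^\r]*)?\r$} $line -> hex]} {
        return -1
    }
    scan $hex %x n
    return $n
}
```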

-Alex


From: tom.rmadilo on
On Nov 3, 4:19 pm, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> On Nov 4, 12:46 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
>
>
> > > > Depending on the URL and the number of simultaneous downloads, your
> > > > version is sometimes consistently 1-5% faster; in other cases, the
> > > > original code is 1-5% faster.
>
> > > That's not the point. The point is that the gets version is much
> > > smaller, and also much more "natural" in that it uses the "normal"
> > > primitives for line I/O and hex scanning instead of reinventing
> > > everything. This matters a lot for maintenance, especially by
> > > _others_.
>
> > You're joking right? You actually have trouble understanding this?
>
> Are you having trouble staying polite ? Don't drag me in that area...
>
> > HTTP isn't a line oriented protocol. In fact, the chunkSize code isn't
> > even complete, both of our versions will fail with extensions, which
> > have to be parsed, like headers (even if they are ignored). Here is
> > the full definition:
> > [...]
> > Somehow you think CRLF means "line I/O".
>
> Stop lecturing me, will you ? You're reinventing just about any wheel
> that comes your way; I may suffer from the NIH syndrome from time to
> time, but you're beating me by several orders of magnitude...
>

What wheel am I reinventing? Basically, you either don't understand,
or you wish to gloss over the details, so that your simple ten-line
proc looks like all that is needed for this job. Even my longer proc
is not complete, but it is closer.

> I'm merely suggesting that [gets] offers a nice way of avoiding
> reading char-by-char. It will segment our input on LFs (since we're
> using -translation binary of course), which loses absolutely no
> information. It only helps managing groups of bytes larger than one.

Groups which eventually have to be divided into single bytes anyway.
You save nothing by making an extra copy of the data.

> Once you get something ended by an LF, you're free to go back on the
> string and parse it according to the full RFC2616 syntax.

I'm not sure why you want to put off what you have to do eventually.
You are using [gets] and then simply omitting what I am doing to
correctly interpret the chunk-size and chunk-ext info. Then you claim
this is easier to maintain. Of course it is easier to maintain.

> My code
> omitted part of that syntax (chunkext), but that is fixable with a
> simple regexp. If that obvious extrapolation is beyond you, get lost.

So my code is longer because it does more. Your code would be
completely different if you had to go back and find the chunk-size. My
code would be essentially the same, with an option to continue to
parse chunk-ext.

Using [gets] just creates a new buffer to parse. I'm parsing from the
channel buffer; you create a new buffer to parse from, one where you
also have to maintain a position index. Why you think this saves
anything in terms of code size or logic is beyond me. I get buffer
and index management for free; you create the need to manage your own
buffer. Totally crazy. Now everyone who wants to "maintain" this code
must understand your decision to manage your own buffer. Simply put,
you have read in too much information. Since I have also written
cookie and cookie2 parsing code, I can easily compare the simplicity
of this method to the managed-buffer mess.
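For comparison, the char-at-a-time style being defended here reads
roughly like this (a minimal sketch, not the actual htclient code;
blocking reads, and error handling elided):

```tcl
# Sketch: consume the chunk-size hex digits one byte at a time,
# straight off the channel, stopping at the first non-hex byte
# (";", WSP, or CR). Not the actual htclient code.
proc readChunkSize {sock} {
    set hex ""
    while {![eof $sock]} {
        set c [read $sock 1]
        if {![string is xdigit -strict $c]} break
        append hex $c
    }
    scan $hex %x n
    return $n
}
```

Given the same input, this loop and a [gets]-plus-regexp pass recover
the same size; the disagreement is only about where the buffering and
scanning happen.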

From: Alexandre Ferrieux on
On Nov 4, 2:08 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > Once you get something ended by an LF, you're free to go back on the
> > string and parse it according to the full RFC2616 syntax.
>
> I'm not sure why you want to avoid what you have to do eventually. You
> are using [gets] then just omitting what I am doing to correctly
> interpret the chunk-size, chunk-ext info.

Are you being intentionally dense, or what? Of course [gets] doesn't
parse what's in the line. [gets;regexp] does.

> Then you claim this is easier to maintain.

Yes, I claim that a regexp is easier to read and maintain than the
equivalent char-by-char automaton written in Tcl. Moreover, it gets
faster for complex automata.

> Using [gets] just creates a new buffer to parse. I'm parsing from the
> channel buffer, you create a new buffer to parse from, one where you
> also have to maintain a position index.

What position index? Meet Mr. re_syntax...

> Totally crazy.

Patience exhausted. EOT.

-Alex

