htclient: new experimental HTTP Client [TCL]


From: Alexandre Ferrieux on 2 Nov 2009 03:31

On Oct 31, 8:12 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> On Oct 31, 9:44 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
> wrote:
>
> > On Oct 30, 8:07 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> > > How do you do chunked reads and signal when to stop, remove the
> > > <cr><lf>, read the next chunk size, remove the <cr><lf> and start
> > > reading again?
>
> > With a state machine, fileevents, and nonblocking gets.
>
> HTTP data is binary, or maybe it is better to say it is opaque. The
> chunked transfer encoding is specifically byte oriented with a well
> defined structure. Nothing in the standards I have read indicates that
> you should treat the data as line oriented.

Don't expect an RFC to hold your hand as to "how you should treat
data" ;-)

What I'm saying is merely that chunked transfer is an alternating
text/binary syntax, and that implementing it as [gets;read] is the most
natural way. Moreover, it turns out to be _efficient_ thanks to input
buffering.
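For illustration, here is a minimal sketch of the [gets;read] approach on a blocking channel. The proc name readChunked is hypothetical and not part of htclient. It assumes the channel is switched to binary translation, so each line returned by [gets] still ends with the CR of the CRLF pair. It also ignores any chunk extension after a ";" and discards trailer headers.

```tcl
# Hypothetical helper: decode an HTTP chunked body from a blocking channel,
# using [gets] for the size lines and [read] for the chunk payloads.
proc readChunked {chan} {
    # Binary translation: no encoding conversion, and [gets] splits on LF
    # only, so the CR of each CRLF is left at the end of the line.
    fconfigure $chan -translation binary
    set body ""
    while {1} {
        if {[gets $chan line] < 0} {
            error "unexpected EOF before chunk size"
        }
        set line [string trimright $line "\r"]
        # Drop an optional chunk extension (";name=value")
        set sizeField [lindex [split $line ";"] 0]
        if {![regexp -nocase {^[0-9a-f]+$} $sizeField]} {
            error "illegal chunk size syntax: $line"
        }
        scan $sizeField %x size
        # A zero-size chunk marks the end of the body
        if {$size == 0} break
        set data [read $chan $size]
        if {[string length $data] != $size} {
            error "unexpected EOF inside chunk"
        }
        append body $data
        # Every chunk payload is followed by CRLF
        if {[read $chan 2] ne "\r\n"} {
            error "missing CRLF after chunk data"
        }
    }
    # Skip optional trailer lines up to the final empty line
    while {[gets $chan line] >= 0 && [string trimright $line "\r"] ne ""} {}
    return $body
}

# Demo: write a chunked body to a temporary file and decode it
set f [file tempfile path]
fconfigure $f -translation binary
puts -nonewline $f "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
flush $f
seek $f 0
set result [readChunked $f]
close $f
file delete $path
puts $result   ;# prints Wikipedia
```

Because both [gets] and [read] pull from Tcl's channel input buffer, alternating between line reads and byte reads should not by itself cost extra system calls.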

> I've also pointed out several times that since gets can fail, you have
> to handle error conditions: again this just adds new, unnecessary
> states to the machine.

Uh??? You may point something out as many times as you like; it still
lacks arguments...
Are you unfamiliar with error propagation in Tcl, or are you just
testing my resilience to random nonsense?

-Alex

From: tom.rmadilo on 2 Nov 2009 17:32

On Nov 2, 12:31 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> On Oct 31, 8:12 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > On Oct 31, 9:44 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
> > wrote:
>
> > > On Oct 30, 8:07 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> > > > How do you do chunked reads and signal when to stop, remove the
> > > > <cr><lf>, read the next chunk size, remove the <cr><lf> and start
> > > > reading again?
>
> > > With a state machine, fileevents, and nonblocking gets.
>
> > HTTP data is binary, or maybe it is better to say it is opaque. The
> > chunked transfer encoding is specifically byte oriented with a well
> > defined structure. Nothing in the standards I have read indicates that
> > you should treat the data as line oriented.
>
> Don't expect an RFC to hold your hand as to "how you should treat
> data" ;-)
>
> What I'm saying is merely that chunked transfer is an alternating text/
> binary syntax, and that implementing it as [gets;read] is the most
> natural way. Moreover, it turns out to be _efficient_ thanks to input
> buffering.

Why not show me? I've provided code and a framework which should make
it easy to plug in your chunked transfer implementation.

In addition, if you keep my chunk-data code to read the chunk data, you
only need to replace this proc:

proc ::htclient::htChunkSize { client } {

    variable clientChunkRemain
    variable hexCharMap

    set sock [getVar $client sock]

    # Read (at most) one byte and classify it via the hexCharMap table
    set char [read $sock 1]
    set class $hexCharMap($char)

    log Debug "htChunkSize char $client = '$char' class =\
        $class size = $clientChunkRemain($client)"

    switch -exact -- $class {

        "HEXCHAR" {
            if {![getVar $client inCR]} {
                # Accumulate the hex digit into the running chunk size
                set val $clientChunkRemain($client)
                set cval [format %i 0x$char]
                set val [expr {$val * 16 + $cval}]
                set clientChunkRemain($client) $val
            } else {
                htError $client msg "unexpected HEXCHAR in CR mode"
            }
        }
        "CR" {
            if {![getVar $client inCR]} {
                setVar $client inCR 1
            } else {
                htError $client msg "unexpected CR in Chunk Size"
            }
        }
        "LF" {
            if {[getVar $client inCR]} {
                setVar $client inCR 0
                # A chunk size of zero marks the last chunk
                if {!$clientChunkRemain($client)} {
                    setVar $client state done
                } else {
                    log Debug "Chunk size $client = $clientChunkRemain($client)"
                    setVar $client state chunkdata
                }
            } else {
                htError $client msg "unexpected LF in Chunk Size"
            }
        }
        "EMPTY" {
            # Nothing was read: wait for the next readable event
        }
        "NONHEX" {
            htError $client lastchar $char msg "unexpected NONHEX in Chunk Size"
        }
    }
}
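The proc above depends on a hexCharMap array that is not shown in the post. The sketch below guesses at how such a table could be built from the class names used in the switch, then runs the same byte-at-a-time state machine over a string instead of a socket. The ::demo namespace and the parseChunkSize proc are hypothetical, and mapping the empty string to EMPTY is an assumption about how the table handles [read $sock 1] returning "" when no byte is available on a non-blocking socket.

```tcl
namespace eval ::demo {
    variable hexCharMap
    # Hypothetical table: default every byte value to NONHEX...
    for {set i 0} {$i < 256} {incr i} {
        set hexCharMap([format %c $i]) NONHEX
    }
    # ...then mark hex digits, CR and LF
    foreach c [split 0123456789abcdefABCDEF ""] {
        set hexCharMap($c) HEXCHAR
    }
    set hexCharMap(\r) CR
    set hexCharMap(\n) LF
    # Assumed: an empty read (no byte available yet) classifies as EMPTY
    set hexCharMap() EMPTY
}

# Hypothetical: parse one "<hex>CRLF" size line byte by byte, mirroring the
# HEXCHAR/CR/LF/NONHEX transitions of htChunkSize, but over a string.
proc ::demo::parseChunkSize {bytes} {
    variable hexCharMap
    set size 0
    set inCR 0
    foreach char [split $bytes ""] {
        switch -exact -- $hexCharMap($char) {
            HEXCHAR {
                if {$inCR} {error "unexpected HEXCHAR in CR mode"}
                scan $char %x digit
                set size [expr {$size * 16 + $digit}]
            }
            CR {
                if {$inCR} {error "unexpected CR in Chunk Size"}
                set inCR 1
            }
            LF {
                if {!$inCR} {error "unexpected LF in Chunk Size"}
                return $size
            }
            default {
                error "unexpected NONHEX in Chunk Size"
            }
        }
    }
    error "incomplete chunk size line"
}

puts [::demo::parseChunkSize "1a\r\n"]   ;# prints 26
```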

From: tom.rmadilo on 2 Nov 2009 19:19

On Nov 2, 12:31 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> On Oct 31, 8:12 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> > I've also pointed out several times that since gets can fail, you have
> > to handle error conditions: again this just adds new, unnecessary
> > states to the machine.
>
> Uh??? You may point something out as many times as you like; it still
> lacks arguments...
> Are you unfamiliar with error propagation in Tcl, or are you just
> testing my resilience to random nonsense ?

There are two kinds of errors:
1. errors you can/could predict, and
2. those you can't

The first type of error is the result of bugs or poor programming. It
means you could have detected ahead of time that the next operation
will fail given the current known information or state.

The second type of error is the result of external conditions that
can't be known or predicted in advance, even if they happen often and
can be easily classified.

I'm not interested in the first type of errors, other than to suggest
removing them from code as quickly as possible.

Errors of the second type have to be analyzed in terms of their effect
on the application. If an error results in the loss of a resource, the
damage is serious: the code that reports the error should be isolated
so that resource loss can be minimized, but then the error should
propagate up so that the application can decide what to do.

Otherwise, the error just needs to be reported up (allowed to happen).

Basically this is the Mafia Theory of Error Management: protect but
notify the boss.
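As a sketch of this policy in Tcl 8.6, the hypothetical readHexSize proc below rejects a predictable bad argument before acquiring anything, lets an [open] failure propagate untouched, and uses try/finally so that the channel is always closed ("protect") while any error raised during parsing still reaches the caller ("notify the boss").

```tcl
# Hypothetical helper: read a hex chunk-size line from a file.
proc readHexSize {path} {
    # Type 1 (predictable): reject a bad argument before touching any resource
    if {$path eq ""} {
        error "readHexSize: empty path"
    }
    # Type 2 (external): [open] may fail; nothing to clean up yet, so let it propagate
    set chan [open $path r]
    try {
        # Type 2 (external): the file content may be malformed
        if {[gets $chan line] < 0 || ![regexp -nocase {^[0-9a-f]+$} $line]} {
            error "readHexSize: malformed size line"
        }
        scan $line %x size
        return $size
    } finally {
        # Protect the resource; any error still continues to the caller
        close $chan
    }
}

set f [file tempfile path]
puts $f "1f"
close $f
puts [readHexSize $path]   ;# prints 31
file delete $path
```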

From: Alexandre Ferrieux on 3 Nov 2009 03:11

On Nov 2, 11:32 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> Why not show me? I've provided code and a framework which should make
> it easy to plug in your chunked transfer implementation.

Yup, here ya go.

# assuming socket in non-blocking mode and fileevent readable
# on function below

proc ::htclient::htChunkSize { client } {
    variable clientChunkRemain
    set sock [getVar $client sock]
    if {[gets $sock line]<0} {
        if {![eof $sock]} return
        htError $client msg "unexpected EOF between chunks"
    }
    if {![regexp -nocase {^[0-9A-F]+\r$} $line]} {
        htError $client msg "illegal chunk size syntax"
    }
    scan $line %x clientChunkRemain($client)
    if {!$clientChunkRemain($client)} {
        setVar $client state done
    } else {
        log Debug "Chunk size $client = $clientChunkRemain($client)"
        setVar $client state chunkdata
    }
}

-Alex
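To see how a proc like this fits into the "state machine, fileevents, and nonblocking gets" design, here is a self-contained sketch that decodes a chunked body arriving on a pipe. Everything in it (the ::chunkdemo namespace, the state names size/data/crlf/done, and the Finish helper) is hypothetical and stands in for htclient's getVar/setVar and htError; it needs Tcl 8.6 for [chan pipe]. Unlike htclient, it advances through as many states as the buffered input allows in one callback, and it reports errors through the finished variable rather than raising them inside the event handler.

```tcl
namespace eval ::chunkdemo {
    variable state size    ;# one of: size, data, crlf, done
    variable remain 0      ;# bytes still expected in the current chunk
    variable body ""       ;# decoded payload
    variable finished 0    ;# vwait target: "ok" or an error message
}

# Readable fileevent handler: advance the state machine as far as possible.
proc ::chunkdemo::onReadable {chan} {
    variable state
    variable remain
    variable body
    while {$state ne "done"} {
        switch -- $state {
            size {
                # Non-blocking gets: -1 means no complete line yet, or EOF
                if {[gets $chan line] < 0} {
                    if {[eof $chan]} {return [Finish $chan "EOF between chunks"]}
                    return
                }
                if {![regexp -nocase {^([0-9a-f]+)\r$} $line -> hex]} {
                    return [Finish $chan "illegal chunk size syntax"]
                }
                scan $hex %x remain
                set state [expr {$remain == 0 ? "done" : "data"}]
            }
            data {
                # Non-blocking read returns whatever is available (maybe "")
                set got [read $chan $remain]
                append body $got
                incr remain -[string length $got]
                if {$remain > 0} {
                    if {[eof $chan]} {return [Finish $chan "EOF inside chunk"]}
                    return
                }
                set state crlf
            }
            crlf {
                if {[gets $chan line] < 0} {
                    if {[eof $chan]} {return [Finish $chan "EOF after chunk"]}
                    return
                }
                if {$line ne "\r"} {return [Finish $chan "missing CRLF after chunk"]}
                set state size
            }
        }
    }
    Finish $chan ok
}

# Stop listening and wake up the vwait
proc ::chunkdemo::Finish {chan status} {
    variable finished
    fileevent $chan readable {}
    set finished $status
}

lassign [chan pipe] rd wr
fconfigure $rd -blocking 0 -translation binary
fconfigure $wr -translation binary
fileevent $rd readable [list ::chunkdemo::onReadable $rd]
puts -nonewline $wr "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
close $wr
vwait ::chunkdemo::finished
close $rd
puts "$::chunkdemo::finished: $::chunkdemo::body"   ;# prints ok: Wikipedia
```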

From: tom.rmadilo on 3 Nov 2009 12:37

On Nov 3, 12:11 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> On Nov 2, 11:32 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > Why not show me? I've provided code and a framework which should make
> > it easy to plug in your chunked transfer implementation.
>
> Yup, here ya go.
>
> # assuming socket in non-blocking mode and fileevent readable
> # on function below
>
> proc ::htclient::htChunkSize { client } {
>     variable clientChunkRemain
>     set sock [getVar $client sock]
>     if {[gets $sock line]<0} {
>         if {![eof $sock]} return
>         htError $client msg "unexpected EOF between chunks"
>     }
>     if {![regexp -nocase {^[0-9A-F]+\r$} $line]} {
>         htError $client msg "illegal chunk size syntax"
>     }
>     scan $line %x clientChunkRemain($client)
>     if {!$clientChunkRemain($client)} {
>         setVar $client state done
>     } else {
>         log Debug "Chunk size $client = $clientChunkRemain($client)"
>         setVar $client state chunkdata
>     }
> }

Okay, I set up a test case using your code:

http://www.junom.com/gitweb/gitweb.perl?p=htclient.git;a=commit;h=4c449

Depending on the URL and the number of simultaneous downloads, your
version is sometimes consistently 1-5% faster; in other cases, the
original code is 1-5% faster.

But the variability between test runs (20-25%) is much larger than the
average difference.

One thing is very consistent: both the old and new code are about 100%
faster (twice as fast) than ::http::geturl when grabbing a single copy
of a URL. When grabbing 10 copies, both the old and new code are about
200% faster (three times as fast).
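For anyone repeating these measurements, a small Tcl helper can report the run-to-run spread alongside the mean and makes the "percent faster" arithmetic explicit: 100% faster means half the time, 200% faster means a third of the time. The benchmark and percentFaster procs are hypothetical helpers, not part of htclient or the http package, and the script being timed is only a placeholder.

```tcl
# Hypothetical: run $script $runs times (each run timing $iters iterations)
# and return {mean min max} in microseconds per iteration.
proc benchmark {script runs iters} {
    set samples {}
    for {set i 0} {$i < $runs} {incr i} {
        # [time] returns "N microseconds per iteration"
        lappend samples [lindex [uplevel 1 [list time $script $iters]] 0]
    }
    set sum 0.0
    foreach s $samples {
        set sum [expr {$sum + $s}]
    }
    set sorted [lsort -real $samples]
    return [list [expr {$sum / $runs}] [lindex $sorted 0] [lindex $sorted end]]
}

# Hypothetical: "N% faster" as used above (100% faster = twice as fast)
proc percentFaster {baseTime newTime} {
    expr {($baseTime / double($newTime) - 1.0) * 100.0}
}

lassign [benchmark {string repeat x 1000} 5 100] mean min max
puts [format "mean %.3f us (min %.3f, max %.3f)" $mean $min $max]
puts [percentFaster 2.0 1.0]   ;# prints 100.0
```

If the min-to-max range of each version overlaps the other's mean, a 1-5% difference between them is within the noise.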
