Line oriented protocols vs. [gets] [TCL]


From: Andreas Leitgeb on 28 Jun 2010 06:23

tom.rmadilo <tom.rmadilo(a)gmail.com> wrote:
>> What protocol do you have in mind, that you think tcl's [gets]
>> won't help you with? (http and smtp are actually perfectly well
>> dealt with gets)
> The short explanation is that a [gets] concept of a line doesn't match
> up with a protocol line,

Just to be clear: is it your intention that the following snippet

---begin-snippet---
Foo: blah blah blarg
ramble ramble
---end-snippet---

would be read completely (and returned) by a single [gets] invocation?

If instead you meant that you'd like to have some [getheaderline] for that
job, then that's in principle a far more agreeable position.

Or if instead you have some suggestion for [gets] that would make it a
better building block for your own [getheaderline], without breaking all
currently working scripts, then spell it out.
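
For illustration, here is a minimal sketch of what such a [getheaderline] might look like when built on top of [gets]. It is only one possible reading of the idea: the proc name comes from the post above, but the body, the ::pendingLine lookahead array, and the rule that a continuation line starts with a space or tab (as in mail-style header folding) are assumptions. It also assumes a blocking channel.

```tcl
# Hypothetical helper, not part of Tcl: read one logical header line,
# joining continuation lines that start with a space or tab.
# Assumes a blocking channel; ::pendingLine holds one line of lookahead.
proc getheaderline {chan varName} {
    upvar 1 $varName line
    global pendingLine
    if {[info exists pendingLine($chan)]} {
        # Use the line that the previous call read ahead.
        set line $pendingLine($chan)
        unset pendingLine($chan)
    } elseif {[gets $chan line] < 0} {
        return -1
    }
    # An empty line ends the header block, so do not look past it.
    if {$line eq ""} {
        return 0
    }
    while {[gets $chan next] >= 0} {
        if {![regexp {^[ \t]} $next]} {
            # Not a continuation: keep it for the next call.
            set pendingLine($chan) $next
            break
        }
        append line " " [string trim $next]
    }
    return [string length $line]
}
```

Like [gets], it returns -1 at end of input and otherwise the length of the (unfolded) line, so a loop such as `while {[getheaderline $sock hdr] > 0} {...}` would consume one header block. The one-line lookahead is the price of building on [gets], which cannot know that a line is complete until it has seen the start of the next one.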

From: Arnold Snarb on 30 Jun 2010 12:42

Andreas Leitgeb wrote:
> tom.rmadilo wrote:
>>>> [....]
>> This is exactly my point: if you used Tcl's [gets] to read Tcl source
>> code and transform it into Tcl code, it would not do what you expect.
>
> I've really tried to understand, what you're after, but failed.

That's because you're trying to talk to tom.rmadilo.

Many of us have the same feeling.

--Arnold

From: Larry W. Virden on 1 Jul 2010 09:03

On Jun 26, 7:29 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
> On Jun 25, 11:01 am, Andreas Leitgeb <a...(a)gamma.logic.tuwien.ac.at>
> wrote:

>
> But
> promoting Tcl's [gets] as some kind of miracle magic line interpreting
> primitive is total bullshit.

I've been reading this thread and found it a bit confusing. And,
finally, I see a statement that, if explained, would go a ways towards
understanding the situation.

I take it that your concern relates to the fact that within the Tcl
http library modules, the code uses gets to read sockets when reading
http, and you want to make the point that this is not a safe thing to
do, and that some other command, one that deals with buffer
overflows, continuation lines, etc., would be a better implementation.

Am I close to understanding where you were at the start of this
thread?

Thanks for helping me understand.

From: tom.rmadilo on 1 Jul 2010 15:39

On Jul 1, 6:03 am, "Larry W. Virden" <lvir...(a)gmail.com> wrote:
> On Jun 26, 7:29 pm, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> > On Jun 25, 11:01 am, Andreas Leitgeb <a...(a)gamma.logic.tuwien.ac.at>
> > wrote:
>
> > But
> > promoting Tcl's [gets] as some kind of miracle magic line interpreting
> > primitive is total bullshit.
>
> I've been reading this thread and found it a bit confusing. And,
> finally, I see a statement that, if explained, would go a ways towards
> understanding the situation.
>
> I take it that your concern relates to the fact that within the Tcl
> http library modules, the code uses gets to read sockets when reading
> http, and you want to make the point that this is not a safe thing to
> do, and that some other command, one that deals with buffer
> overflows, continuation lines, etc., would be a better implementation.
>
> Am I close to understanding where you were at the start of this
> thread?
>
> Thanks for helping me understand.

This is basically my point. [gets] in Tcl is relatively safe when
compared to other languages, and works great for many common
applications.

However there are a number of deficiencies in the Tcl I/O API which
make it not very efficient when dealing with a generic protocol. BTW,
I thought I had found a way to deal with many of the problems with my
example htclient, however some have complained that it is inefficient.
I am willing to compromise efficiency for safety, but I would not mind
additional API options.

At a minimum, Tcl's I/O API lacks two features: timed wait on blocking
channels and max byte/char reads on any channel (allowing single call
protection against overflow). Plus the ability to do both: wait for n
bytes, on timeout return the number of bytes received (plus some
pointer to the bytes). Note that when a channel becomes readable, you
can read at least one byte. But how many bytes can you read without
blocking for an additional network I/O operation? Also, Tcl includes
channel errors as readable events, so you have to check for that as
well.

Also, there is a strange mix of byte-based and character-based inputs
in various Tcl APIs. I can't figure out how you could write a valid
program which seeks in a UTF-8 file (at the Tcl script level).

My question is: what could be done? There is no reason to change
[gets]. It is possible that different channel types could be
developed. We now have a generic (abstract interface) channel API. You
could create a new channel type and reuse many of the parts of
existing channel types. We also have stacked channels, although how
this might help is less clear to me.

My idea: why not make it easy to implement generic protocols in Tcl,
while still assuming that the C or C++ version will be faster? We
don't even have the tools for efficient I/O mixed with application
state changes. What we do have is relative immunity from buffer
overflow and many other issues affecting languages such as C, C++,
Java, .NET, etc.

From: Donal K. Fellows on 1 Jul 2010 19:45

On 01/07/2010 20:39, tom.rmadilo wrote:
> At a minimum, Tcl's I/O API lacks two features: timed wait on blocking
> channels

Too bad the OS's own API doesn't allow for that (except by sending a
signal to interrupt, which is a *very* crude method). You need to use a
non-blocking channel and some of the other facilities that you *do* have.

> and max byte/char reads on any channel (allowing single call
> protection against overflow).

You need [chan pending] for that, added in 8.5. That lets you see how
much is currently buffered inside Tcl. Combined with non-blocking
channels and [after] events, that lets you do safe reading of lines with
[gets]. The code to do it is more than I'm willing to write at around
midnight. :-)
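
A rough sketch of the combination described above (non-blocking channel, [chan pending], and an [after] timer) might look like the following. The proc names readLineSafely, lineReadable and finishLine, the callback convention, and the status words are all made up for illustration; only [chan], [gets], [eof] and [after] are Tcl built-ins. It needs Tcl 8.5 or later, and the limit is only approximate because [chan pending] counts bytes while [string length] counts characters.

```tcl
# Hypothetical helpers: read one line with a size cap and a timeout.
# The callback is invoked as: callback status line
# where status is one of ok, eof, toolong, timeout.
proc readLineSafely {chan maxLen timeoutMs callback} {
    chan configure $chan -blocking 0
    set timer [after $timeoutMs [list finishLine $chan "" $callback timeout ""]]
    chan event $chan readable [list lineReadable $chan $maxLen $timer $callback]
}

proc lineReadable {chan maxLen timer callback} {
    if {[gets $chan line] >= 0} {
        # A complete line arrived; still enforce the cap on it.
        if {[string length $line] > $maxLen} {
            finishLine $chan $timer $callback toolong ""
        } else {
            finishLine $chan $timer $callback ok $line
        }
    } elseif {[eof $chan]} {
        finishLine $chan $timer $callback eof ""
    } elseif {[chan pending input $chan] > $maxLen} {
        # No newline yet, but too much is already buffered inside Tcl.
        finishLine $chan $timer $callback toolong ""
    }
    # Otherwise only a partial line has arrived; wait for the next event.
}

proc finishLine {chan timer callback status line} {
    after cancel $timer
    chan event $chan readable {}
    uplevel #0 [linsert $callback end $status $line]
}
```

A caller would pass something like `readLineSafely $sock 8192 5000 gotLine` and run the event loop; what to do on toolong or timeout (typically closing the socket) is left to the callback.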

> Plus the ability to do both: wait for n bytes, on timeout return the
> number of bytes received (plus some pointer to the bytes). Note that
> when a channel becomes readable, you can read at least one byte.

Actually, when a channel becomes readable you know that a [read] of one
byte will not block but not that a byte is available; a closed channel
is the other main source of such events. When the channel is
non-blocking, you know that you'll always only get bytes or characters
from the data that is available (which might or might not involve a call
to the OS, depending on what is actually buffered).

> But how many bytes can you read without blocking for an additional
> network I/O operation? Also, Tcl includes channel errors as readable
> events, so you have to check for that as well.

Actually, it all works rather well (especially if you use 8.6's
coroutines to hide the details, cf. http://wiki.tcl.tk/22231). You can
cap the amount that you buffer in non-blocking mode (i.e., to no more
than a fixed amount more than some limit you can decide) and you can
handle timeouts any way you want.
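
As a hedged illustration of the coroutine approach (this is not the code from the wiki page, just a small sketch of the general idiom), a line reader that looks blocking to its caller but actually yields to the event loop could be written like this. The name coGets is invented here; it requires Tcl 8.6 and must be called from inside a coroutine.

```tcl
# Hypothetical helper: a [gets] look-alike for use inside a coroutine.
# It suspends the coroutine until a full line (or EOF) is available.
proc coGets {chan} {
    chan configure $chan -blocking 0
    # Resume the current coroutine whenever the channel becomes readable.
    chan event $chan readable [list [info coroutine]]
    while {[gets $chan line] < 0} {
        if {[eof $chan]} {
            chan event $chan readable {}
            return -code error "end of file on $chan"
        }
        yield
    }
    chan event $chan readable {}
    return $line
}
```

Protocol code can then be written as straight-line script, e.g. `coroutine session apply {{chan} { set request [coGets $chan] }} $sock`, while the event loop keeps servicing other channels. A size cap or timeout can be layered on in the same way as in an ordinary readable handler.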

> Also, there is a strange mix of byte-based and character-based inputs
> in various Tcl APIs. I can't figure out how you could write a valid
> program which seeks a UTF-8 file (at the Tcl script level).

[seek] and [tell] always work with byte addresses, but they *are* aware
of what's in Tcl's buffers. If you're not going to the start or the end
of the file, you need to get to the point where you want to remember and
use [tell] to remember it so you can [seek] back there again. It tends
to be fairly rare that they're used in text files; they're just not that
useful with variable length records.
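
A small sketch of that pattern on a UTF-8 file: read up to the point of interest, record it with [tell], and later [seek] back to exactly that byte offset. The file contents and variable names are invented for the example, and [file tempfile] assumes Tcl 8.6.

```tcl
# Write two lines containing multi-byte UTF-8 characters to a scratch file.
set f [file tempfile]
chan configure $f -encoding utf-8
puts $f "h\u00e9llo"
puts $f "w\u00f6rld"

# Read the first line, then remember where the second one starts.
seek $f 0
gets $f first
set mark [tell $f]   ;# a byte offset, not a character count

# Read on, then return to the remembered position and read again.
gets $f second
seek $f $mark
gets $f again        ;# the same text as $second
close $f
```

The point is that the script never computes a byte offset itself; it only reuses offsets that [tell] reported at a character boundary, which sidesteps the bytes-versus-characters mismatch.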

> My idea: why not make it easy to implement generic protocols in Tcl,
> while still assuming that the C or C++ version will be faster? We
> don't even have the tools for efficient I/O mixed with application
> state changes. What we do have is relative immunity from buffer
> overflow and many other issues affecting languages such as C, C++,
> Java, .NET, etc.

Have you measured the inefficiency, or is this supposition?

Donal.
