Line oriented protocols vs. [gets] [TCL]


From: Alexandre Ferrieux on 23 Jun 2010 02:49

On Jun 23, 1:56 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> Then offer a similar command for parsing an nntp or http header, but
> [info complete] relies on char-by-char parsing.

Tom, what the heck are you babbling about? "Relies on char-by-char
parsing" can be said of just about any method at the algorithmic
level. But an objective difference between methods is the granularity
of the Tcl IO primitives called. In this code I showed you once again
that [gets] can be used rather than [read 1], since you were asking
for something using [gets]. Hence [info complete] is only called twice
per line, not per char.

As to "then offer...": already done so for HTTP, you just ignored it,
get lost.

-Alex
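
Editorial illustration (not the code Alexandre posted earlier in the thread, which is not reproduced here): a minimal sketch of the general idea of pairing [gets] with [info complete], so that the I/O primitive is called once per line rather than once per character. The proc name readCommand and the scratch file name are assumptions made up for this example.

```tcl
# Read one complete Tcl command from a channel, one [gets] call per line.
# Hypothetical helper, written for illustration only.
proc readCommand {chan} {
    set buf ""
    while {[gets $chan line] >= 0} {
        append buf $line \n
        # [info complete] is evaluated once per line here, never per character.
        if {[info complete $buf]} {
            return $buf
        }
    }
    return $buf   ;# EOF: return whatever was collected, possibly incomplete
}

# Demo on a scratch file (hypothetical name) holding a multi-line command.
set path [file join [pwd] readcommand_demo.txt]
set out [open $path w]
puts $out "set x \{"
puts $out "    a b"
puts $out "\}"
puts $out "puts done"
close $out

set in [open $path r]
set first  [readCommand $in]   ;# the three-line "set x {...}" command
set second [readCommand $in]   ;# the one-line "puts done" command
close $in
file delete $path
```

The first call keeps appending lines until the braces balance; the second call returns after a single line.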

From: Andreas Leitgeb on 24 Jun 2010 10:10

tom.rmadilo <tom.rmadilo(a)gmail.com> wrote:
> The question here is what defines a "line". In Tcl this is simple...or
> at least it seems simple.
> In a Tcl source file a line ends with <CR> or <LF> or <CR><LF>.

Maybe you just "forgot" to [fconfigure] the channel appropriately?

> In fact, if the last non-<CR>?<LF> sequence is a \, then the following
><CR>?<LF> is removed. Of course it is only removed in the
> interpretation of the source code, the next line is appended, after
> removing additional whitespace to the previous line.

This is (imho) some ugly legacy of Tcl that follows from the old days,
where saving those few bytes already on reading it into memory was a win,
as it saved some work from the interpreter.

For any reasonable line-oriented protocol, you quite surely wouldn't
want to mimic that particular backslash-newline-whitespace handling,
and even less so at the "gets" level.

Most fortunately, [gets] and the I/O-subsystem don't do that bungle.

> Of course [gets] also suffer because it only considers the current
> contents of the input buffer. If the current buffer ends with <CR> or
> <LF>, then [gets] will assume that a complete line is in the buffer.
> If the next char is <LF> and the previous buffer ending is <CR>, is
> the <LF> a new line?

If I remember correctly, Tcl keeps track of that trailing <CR>, and if
the next thing it sees is an <LF>, it silently discards that <LF>
(in "auto" translation mode only, of course).
That's just the right way of dealing with it.

> Anyway, basically [gets] is a non-lookahead parser. Hardly useful as a
> protocol primitive.

Very useful if used correctly, namely on a correctly configured channel.
Lookahead means that it might not return the data as soon as the line
is complete.
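
Editorial illustration of the "correctly configured channel" point: a small sketch that reads the same bytes twice, once with -translation auto and once with -translation crlf. The helper name readLines, the scratch file name and the sample data are assumptions for this example; a file stands in for a socket so the script is self-contained.

```tcl
# Hypothetical helper: read all lines of a file under a given translation mode.
proc readLines {path translation} {
    set in [open $path r]
    fconfigure $in -translation $translation
    set lines {}
    while {[gets $in line] >= 0} {
        lappend lines $line
    }
    close $in
    return $lines
}

# Write raw bytes (no translation) so the file holds exactly these characters:
# two CRLF-terminated lines, the second containing a bare CR.
set path [file join [pwd] gets_translation_demo.txt]
set out [open $path w]
fconfigure $out -translation binary
puts -nonewline $out "HELO example\r\nbare\rCR inside\r\n"
close $out

set autoLines [readLines $path auto]   ;# CR, LF and CRLF all end a line
set crlfLines [readLines $path crlf]   ;# only CRLF ends a line
file delete $path
```

In auto mode the bare <CR> ends a line, so three lines come back ("HELO example", "bare", "CR inside"). In crlf mode only <CR><LF> ends a line, so two lines come back and the bare <CR> stays inside the second one as data.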

From: tom.rmadilo on 24 Jun 2010 15:42

On Jun 24, 7:10 am, Andreas Leitgeb <a...(a)gamma.logic.tuwien.ac.at>
wrote:
> tom.rmadilo <tom.rmad...(a)gmail.com> wrote:
> > The question here is what defines a "line". In Tcl this is simple...or
> > at least it seems simple.
> > In a Tcl source file a line ends with <CR> or <LF> or <CR><LF>.
>
> Maybe you just "forgot" to [fconfigure] the channel appropriately?
>
> > In fact, if the last non-<CR>?<LF> sequence is a \, then the following
> ><CR>?<LF> is removed. Of course it is only removed in the
> > interpretation of the source code, the next line is appended, after
> > removing additional whitespace to the previous line.
>
> This is (imho) some ugly legacy of Tcl that follows from the old days,
> where saving those few bytes already on reading it into memory was a win,
> as it saved some work from the interpreter.

This is exactly my point: if you used Tcl's [gets] to read Tcl source
code and transform it into Tcl code, it would not do what you expect.
Because even Tcl source code does not subscribe to the definition of a
"line" as defined by [gets].

> For any reasonable line-oriented protocol, you quite surely wouldn't
> want to mimic that particular backslash-newline-whitespace handling,
> and even less so at the "gets" level.

Reasonable line-oriented protocol? No such beast exists, because
nobody can agree on what ends a protocol line or on how to represent
that same terminator as data within a protocol line. (How do you include
a newline as line data in a line-oriented protocol?)

> Most fortunately, [gets] and the I/O-subsystem don't do that bungle.
>
> > Of course [gets] also suffer because it only considers the current
> > contents of the input buffer. If the current buffer ends with <CR> or
> > <LF>, then [gets] will assume that a complete line is in the buffer.
> > If the next char is <LF> and the previous buffer ending is <CR>, is
> > the <LF> a new line?
>
> If I remember correctly, Tcl keeps track of that trailing <CR>, and if
> the next thing it sees is an <LF>, it silently discards that <LF>
> (in "auto" translation mode only, of course).
> That's just the right way of dealing with it.
>
> > Anyway, basically [gets] is a non-lookahead parser. Hardly useful as a
> > protocol primitive.
>
> Very useful if used correctly, namely on a correctly configured channel.
> Lookahead means that it might not return the data as soon as the line
> is complete.

This is impossible due to the data quoting rules used in a particular
protocol. HTTP is horrible, email and nntp seem to require every line
to end in <CR><LF>. Email requires that no line can contain <CR>, <LF>
or <NUL>.

Anyway, there is a big difference between what [gets] and any
particular protocol consider as a "line". Even [gets] and [source] do
not agree, why expect [gets] and some ancient protocol to agree?

From: Andreas Leitgeb on 25 Jun 2010 14:01

tom.rmadilo <tom.rmadilo(a)gmail.com> wrote:
>>> <CR>?<LF> is removed. Of course it is only removed in the
>>> interpretation of the source code, the next line is appended, after
>>> removing additional whitespace to the previous line.
>> This is (imho) some ugly legacy of Tcl that follows from the old days,
>> where saving those few bytes already on reading it into memory was a win,
>> as it saved some work from the interpreter.
> This is exactly my point: if you used Tcl's [gets] to read Tcl source
> code and transform it into Tcl code, it would not do what you expect.

I've really tried to understand what you're after, but failed.

What do Tcl's continuation lines have to do with how other line
protocols can be handled?

It's like saying that, because C doesn't care all that much about
newlines in C source, its gets() function is inconsistent with
semicolon-terminated statements, and thus incapable of handling
line protocols. I'm sure that's *not* what you meant, but I
hope that this feedback helps you express it more
clearly.

The <CR>/<LF> magic it does is not specific to [gets]. It does
in practice what almost everyone wants it to do, and those who
don't like that magic can turn it off using [fconfigure].

What protocol do you have in mind that you think Tcl's [gets]
won't help you with? (HTTP and SMTP are actually handled perfectly
well with [gets].)

From: tom.rmadilo on 26 Jun 2010 19:29

On Jun 25, 11:01 am, Andreas Leitgeb <a...(a)gamma.logic.tuwien.ac.at>
wrote:

> What protocol do you have in mind that you think Tcl's [gets]
> won't help you with? (HTTP and SMTP are actually handled perfectly
> well with [gets].)

The short explanation is that a [gets] concept of a line doesn't match
up with a protocol line, assuming the protocol even deals with lines.

When you read the HTTP RFCs, you don't see much talk of lines. You
have messages and header fields (oh yeah, and a request line). But each
of these could extend over multiple lines according to [gets].
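
Editorial illustration of a header field spanning several [gets] lines: RFC 2616 allowed a field value to be folded onto following lines that begin with a space or tab. The sketch below reads on a crlf-configured channel and joins such continuation lines back onto the previous field. The proc name readHeaderFields, the scratch file name and the sample header data are assumptions for this example, and this is only one possible way to handle folding.

```tcl
# Hypothetical helper: collect raw "Name: value" header fields,
# joining folded continuation lines onto the preceding field.
proc readHeaderFields {chan} {
    set fields {}
    # [gets] returns 0 for the blank line ending the headers, -1 at EOF.
    while {[gets $chan line] > 0} {
        if {[regexp {^[ \t]} $line] && [llength $fields] > 0} {
            # Continuation line: append its trimmed text to the last field.
            set joined "[lindex $fields end] [string trim $line]"
            lset fields end $joined
        } else {
            lappend fields $line
        }
    }
    return $fields
}

# Demo data written as raw bytes: the X-Long field is folded over three lines.
set path [file join [pwd] header_fold_demo.txt]
set out [open $path w]
fconfigure $out -translation binary
puts -nonewline $out "Host: example.org\r\nX-Long: first\r\n second\r\n\tthird\r\n\r\n"
close $out

set in [open $path r]
fconfigure $in -translation crlf
set fields [readHeaderFields $in]
close $in
file delete $path
```

Four non-empty [gets] lines become two header fields here, which is the mismatch between "line" and "field" being argued about.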

Assume for a minute that HTTP is a line-oriented protocol. Why is the
message body defined as *<OCTET>?
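
Editorial illustration of the line/octet split: one common approach is to read the header part with [gets] on a crlf-configured channel, then switch the same channel to binary translation and [read] the body by byte count. The scratch file name and the sample message are assumptions for this example, and a file stands in for a socket so the sketch is self-contained.

```tcl
# Demo message written as raw bytes: one header, a blank line, then a
# 5-byte body that itself contains CR LF, followed by unrelated bytes.
set path [file join [pwd] http_body_demo.txt]
set out [open $path w]
fconfigure $out -translation binary
puts -nonewline $out "Content-Length: 5\r\n\r\na\r\nbcTRAILING"
close $out

set in [open $path r]

# Header part: line-oriented, so [gets] with CRLF line endings.
fconfigure $in -translation crlf
set length 0
while {[gets $in line] > 0} {
    if {[regexp -nocase {^Content-Length:\s*(\d+)\s*$} $line -> n]} {
        set length $n
    }
}

# Body part: a run of octets, so no translation and a counted [read].
fconfigure $in -translation binary
set body [read $in $length]

close $in
file delete $path
```

The body keeps its embedded <CR><LF> untouched because it is never passed through [gets].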

Anyway, I'm not trying to convince you. If you don't understand the
issues involved in programming an HTTP application so that it avoids
well known security and data preserving problems, I can't teach you.

If you think [gets] works for you, who am I to criticize? But
promoting Tcl's [gets] as some kind of miracle magic line interpreting
primitive is total bullshit. It is much less than a miracle, and
something closer to a disaster. The only thing saving it from total
disgrace is Tcl's buffer management. While C's gets() will write past
whatever buffer it is given, Tcl's [gets] can at worst consume all
available memory, since it takes no length limit the way fgets() does.

And once you have a line from [gets] you are still stuck with the job
of parsing/interpreting the line. But now you have copied data into
memory only to then examine it by some algorithm and copy parts of the
data into new memory consuming structures.

I have not investigated the comparative efficiency of parsing buffered
channel data and a string. I know that the parsing code is much
simpler using a channel buffer, but what about the speed?

Anyway, I'm just interested in writing simple code. Using [gets] seems
like taking a random slice of data and doing some fix-ups prior to
moving on to the real algorithm.

To each his own, I guess. But nobody has ever accused me of writing
slow code. My experimental htclient is faster than http::geturl and
has additional useful features (like parallel download).

Unless you actually work on and develop client/server protocol code,
most or all of this might seem like a pointless discussion. Something
which works 99% of the time is pretty good. Sometimes the standards
need to be a little bit higher and the level of discrimination between
alternative algorithms a little more precise.
