Prev: lreplace behaviour change in tcl 8
Next: Expect/TCL Configuration Issue - Form POST submit not working
From: Alexandre Ferrieux on 23 Jun 2010 02:49

On Jun 23, 1:56 am, "tom.rmadilo" <tom.rmad...(a)gmail.com> wrote:
>
> Then offer a similar command for parsing an nntp or http header, but
> [info complete] relies on char-by-char parsing.

Tom, what the heck are you babbling about?

"Relies on char-by-char parsing" can be said of just about any method at the algorithmic level. But an objective difference between methods is the granularity of the Tcl IO primitives called. In this code I showed you once again that [gets] can be used rather than [read 1], since you were asking for something using [gets]. Hence [info complete] is only called twice per line, not per char.

As to "then offer...": already done so for HTTP, you just ignored it, get lost.

-Alex
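The code Alex refers to is not reproduced on this page. As a rough illustration of the granularity argument only, here is a minimal sketch that collects one complete Tcl command from a channel with one [gets] per line instead of one [read 1] per character. The proc name readCommand is hypothetical, the channel is assumed to be in blocking mode, and this version calls [info complete] once per line, which may differ from what Alex posted.

```tcl
# Hypothetical helper (not from the thread): read one complete Tcl command.
# Assumes a blocking channel; in non-blocking mode [gets] can return -1
# without EOF, which this sketch does not handle.
proc readCommand {chan} {
    set cmd ""
    while {[gets $chan line] >= 0} {
        # [gets] strips the end-of-line, so put a newline back.
        append cmd $line \n
        # [info complete] checks for unclosed braces, quotes and brackets.
        if {[info complete $cmd]} {
            return $cmd
        }
    }
    # EOF reached: return whatever was collected (possibly incomplete).
    return $cmd
}
```

Calling it repeatedly on the same channel yields one command per call; the I/O cost is per line, whatever the cost of the completeness check itself.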
From: Andreas Leitgeb on 24 Jun 2010 10:10

tom.rmadilo <tom.rmadilo(a)gmail.com> wrote:
> The question here is what defines a "line". In Tcl this is simple...or
> at least it seems simple.
> In a Tcl source file a line ends with <CR> or <LF> or <CR><LF>.

Maybe, you just "forgot" to [fconfigure] the channel appropriately?

> In fact, if the last non-<CR>?<LF> sequence is a \, then the following
> <CR>?<LF> is removed. Of course it is only removed in the
> interpretation of the source code, the next line is appended, after
> removing additional whitespace to the previous line.

This is (imho) some ugly legacy of Tcl that follows from the old days, where saving those few bytes already on reading it into memory was a win, as it saved some work from the interpreter.

For any reasonable line-oriented protocol, you quite surely wouldn't want to mimic that particular backslash-newline-whitespace handling, and even less so at the "gets"-level.

Most fortunately, [gets] and the I/O-subsystem don't do that bungle.

> Of course [gets] also suffers because it only considers the current
> contents of the input buffer. If the current buffer ends with <CR> or
> <LF>, then [gets] will assume that a complete line is in the buffer.
> If the next char is <LF> and the previous buffer ending is <CR>, is
> the <LF> a new line?

I seem to remember that tcl will "remember" that single <CR>, and if next time it sees the <LF> and remembers the last seen <CR> it will silently discard the <LF>. (in auto-linefeed-mode only, of course) That's just the right way of dealing with it.

> Anyway, basically [gets] is a non-lookahead parser. Hardly useful as a
> protocol primitive.

Very useful, if used correctly - namely on a correctly configured channel. Lookahead means that it might not return the data as soon as the line is complete.
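To make the "correctly configured channel" point concrete, here is a small sketch of how the -translation option changes what [gets] treats as a line. The proc name linesWith is hypothetical; fconfigure, gets and the auto/lf/crlf translation modes are standard Tcl.

```tcl
# Hypothetical helper (not from the thread): read every line from a channel
# under a given input translation mode and return them as a list.
proc linesWith {chan translation} {
    fconfigure $chan -translation $translation
    set lines {}
    while {[gets $chan line] >= 0} {
        lappend lines $line
    }
    return $lines
}
```

For the bytes a<CR><LF>b<CR><LF>, the auto mode gives the lines "a" and "b", while the lf mode gives "a<CR>" and "b<CR>", because only <LF> ends a line there and the <CR> stays in the data.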
From: tom.rmadilo on 24 Jun 2010 15:42

On Jun 24, 7:10 am, Andreas Leitgeb <a...(a)gamma.logic.tuwien.ac.at> wrote:
> tom.rmadilo <tom.rmad...(a)gmail.com> wrote:
> > The question here is what defines a "line". In Tcl this is simple...or
> > at least it seems simple.
> > In a Tcl source file a line ends with <CR> or <LF> or <CR><LF>.
>
> Maybe, you just "forgot" to [fconfigure] the channel appropriately?
>
> > In fact, if the last non-<CR>?<LF> sequence is a \, then the following
> > <CR>?<LF> is removed. Of course it is only removed in the
> > interpretation of the source code, the next line is appended, after
> > removing additional whitespace to the previous line.
>
> This is (imho) some ugly legacy of Tcl that follows from the old days,
> where saving those few bytes already on reading it into memory was a win,
> as it saved some work from the interpreter.

This is exactly my point: if you used Tcl's [gets] to read Tcl source code and transform it into Tcl code, it would not do what you expect. Because even Tcl source code does not subscribe to the definition of a "line" as defined by [gets].

> For any reasonable line-oriented protocol, you quite surely wouldn't
> want to mimic that particular backslash-newline-whitespace handling,
> and even less so at the "gets"-level.

Reasonable line-oriented protocol? No such beast exists, because nobody can agree on what ends a protocol line and how to represent the same data within a protocol line. (How do you include a newline as line data in a line-oriented protocol?)

> Most fortunately, [gets] and the I/O-subsystem don't do that bungle.
>
> > Of course [gets] also suffers because it only considers the current
> > contents of the input buffer. If the current buffer ends with <CR> or
> > <LF>, then [gets] will assume that a complete line is in the buffer.
> > If the next char is <LF> and the previous buffer ending is <CR>, is
> > the <LF> a new line?
>
> I seem to remember that tcl will "remember" that single <CR>, and if
> next time it sees the <LF> and remembers the last seen <CR> it will
> silently discard the <LF>. (in auto-linefeed-mode only, of course)
> That's just the right way of dealing with it.
>
> > Anyway, basically [gets] is a non-lookahead parser. Hardly useful as a
> > protocol primitive.
>
> Very useful, if used correctly - namely on a correctly configured channel.
> Lookahead means that it might not return the data as soon as the line
> is complete.

This is impossible due to the data quoting rules used in a particular protocol. HTTP is horrible; email and nntp seem to require every line to end in <CR><LF>. Email requires that no line can contain <CR>, <LF> or <NUL>.

Anyway, there is a big difference between what [gets] and any particular protocol consider as a "line". Even [gets] and [source] do not agree, so why expect [gets] and some ancient protocol to agree?
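One way to reconcile the two positions, sketched here under stated assumptions: put the channel in lf (or binary) translation so [gets] hands back the raw line, then enforce the stricter protocol rule on top of it. The proc name getsStrictCrlf is hypothetical, and the checks reflect only the rules as tom states them above (every line ends in <CR><LF>; no stray <CR> or <NUL> inside a line), not a full reading of any RFC.

```tcl
# Hypothetical helper (not from the thread): a stricter [gets] for protocols
# that require CRLF line endings.
# Assumes the channel was configured with -translation lf (or binary), so the
# <CR> before each <LF> is still present in what [gets] returns.
proc getsStrictCrlf {chan varName} {
    upvar 1 $varName line
    if {[gets $chan raw] < 0} {
        return -1
    }
    # The raw line must end in <CR>; the <LF> was consumed by [gets].
    if {[string index $raw end] ne "\r"} {
        error "protocol line not terminated by CRLF"
    }
    set line [string range $raw 0 end-1]
    # No bare <CR> or <NUL> is allowed inside the line itself.
    if {[string first "\r" $line] >= 0 || [string first "\0" $line] >= 0} {
        error "protocol line contains a bare CR or NUL"
    }
    return [string length $line]
}
```

The return value follows [gets]: the line length on success and -1 when no more lines are available. Whether rejecting a line outright is the right reaction depends on the protocol and on how lenient the application wants to be.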
From: Andreas Leitgeb on 25 Jun 2010 14:01

tom.rmadilo <tom.rmadilo(a)gmail.com> wrote:
>>> <CR>?<LF> is removed. Of course it is only removed in the
>>> interpretation of the source code, the next line is appended, after
>>> removing additional whitespace to the previous line.
>> This is (imho) some ugly legacy of Tcl that follows from the old days,
>> where saving those few bytes already on reading it into memory was a win,
>> as it saved some work from the interpreter.
> This is exactly my point: if you used Tcl's [gets] to read Tcl source
> code and transform it into Tcl code, it would not do what you expect.

I've really tried to understand what you're after, but failed.

What do tcl's continuation lines have to do with how other line-protocols can be handled?

It's like saying that, because C doesn't care all that much for newlines in C-source, its gets() function was inconsistent with semicolon-terminated statements, and thus incapable of handling line-protocols. I'm sure that's *not* what you meant, but I hope that by giving you feedback, you'll be able to re-express it more clearly.

The <CR>/<LF> magic it does is not specific to [gets]. It does in practise what almost everyone wants it to do, and those who don't like that magic can turn it off using [fconfigure].

What protocol do you have in mind, that you think tcl's [gets] won't help you with? (http and smtp are actually perfectly well dealt with gets)
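As a sketch of what handling HTTP with [gets] can look like: read header lines until the empty line, and join folded continuation lines (RFC 2616 allowed a header field to continue on following lines that begin with a space or tab). The proc name readHeaders is hypothetical, the channel is assumed to be blocking and already configured with an input translation of auto or crlf, and nothing here validates field names or values.

```tcl
# Hypothetical helper (not from the thread): collect an HTTP-style header
# block as a list of unfolded "Name: value" strings.
# Assumes the caller already set -translation to auto or crlf.
proc readHeaders {chan} {
    set headers {}
    set current ""
    while {[gets $chan line] >= 0} {
        if {$line eq ""} {
            break   ;# the empty line ends the header block
        }
        set first [string index $line 0]
        if {$current ne "" && ($first eq " " || $first eq "\t")} {
            # Continuation line: fold it into the previous field.
            append current " " [string trim $line]
        } else {
            if {$current ne ""} {
                lappend headers $current
            }
            set current $line
        }
    }
    if {$current ne ""} {
        lappend headers $current
    }
    return $headers
}
```

This shows that a header field spanning several [gets] lines can be reassembled with a few lines of code; it does not settle whether that is simpler than a character-level parser.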
From: tom.rmadilo on 26 Jun 2010 19:29

On Jun 25, 11:01 am, Andreas Leitgeb <a...(a)gamma.logic.tuwien.ac.at> wrote:
> What protocol do you have in mind, that you think tcl's [gets]
> won't help you with? (http and smtp are actually perfectly well
> dealt with gets)

The short explanation is that a [gets] concept of a line doesn't match up with a protocol line, assuming the protocol even deals with lines. When you read the HTTP RFCs, you don't see much talk of lines. You have messages, header fields (oh, yeah, a request line). But each of these could extend over multiple lines according to [gets].

Assume for a minute that HTTP is a line-oriented protocol. Why is the message body defined as *<OCTET>?

Anyway, I'm not trying to convince you. If you don't understand the issues involved in programming an HTTP application so that it avoids well known security and data preserving problems, I can't teach you. If you think [gets] works for you, who am I to criticize.

But promoting Tcl's [gets] as some kind of miracle magic line interpreting primitive is total bullshit. It is much less than a miracle, but something closer to a disaster. The only thing saving it from total disgrace is Tcl's buffer management. While C's gets() will write over whatever it is asked to, Tcl's [gets] can only consume all available memory, similar to fgets().

And once you have a line from [gets] you are still stuck with the job of parsing/interpreting the line. But now you have copied data into memory only to then examine it by some algorithm and copy parts of the data into new memory consuming structures.

I have not investigated the comparative efficiency of parsing buffered channel data and a string. I know that the parsing code is much simpler using a channel buffer, but what about the speed?

Anyway, I'm just interested in writing simple code. Using [gets] seems like taking a random slice of data and doing some fix-ups prior to moving on to the real algorithm. To each his own, I guess.

But nobody has ever accused me of writing slow code. My experimental htclient is faster than http::geturl and has additional useful features (like parallel download).

Unless you actually work on and develop client/server protocol code, most or all of this might seem like a pointless discussion. Something which works 99% of the time is pretty good. Sometimes the standards need to be a little bit higher and the level of discrimination between alternative algorithms a little more precise.
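On the *<OCTET> question, one common pattern is to use line reads only for the header block and then switch the same channel to binary and read the body by byte count. The sketch below assumes the body length is already known (for example from a Content-Length header); the proc name readBody is hypothetical, and chunked transfer coding is not covered.

```tcl
# Hypothetical helper (not from the thread): read a message body as raw
# octets after the header lines have been consumed.
# Assumes a blocking channel and a known body length in bytes.
proc readBody {chan length} {
    # From here on, no end-of-line or encoding translation is wanted.
    fconfigure $chan -translation binary
    set body [read $chan $length]
    if {[string length $body] != $length} {
        error "expected $length octets, got [string length $body]"
    }
    return $body
}
```

A <CR><LF> pair inside the body comes back unchanged, since binary translation disables the line-ending handling that [gets] applied to the headers. Reading the headers in crlf mode, rather than auto, avoids any question of a pending <LF> when the mode is switched.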