From: Mark Morgan on 21 Dec 2009 07:28

Hello,

This is related to my previous post "another newbie question, simple string operation?" My question changed quite a bit over the last few days, so I thought I should reframe the whole thing.

My basic issue is I've got a master text file that has the text of other filenames scattered in it. I'm trying to substitute the contents of the other files into the place where the filename is in the master text file. Everything is in the same directory.

The filename and the surrounding text that will need to be replaced is this (quoted): "{@INPUTFILE example.txt}". The problem I'm having is that I can't get it to substitute all of that text, all of the time. Specifically if the file example.txt contains (quoted): "Example: 300-45823" then it won't replace it. If it has "{Example: 300-45823}" then it will.

Here's the code and I'll explain more afterwards (There's a program called Shorthand for Windows written in TCL and that's where some of the application specific verbiage will be coming from.):

    set file [open $filename r];
    set data [read $file];
    close $file;
    foreach textspiel [regexp -all -inline {((\{@INPUTFILE)\s+\w+(.txt \}))} $data] {
        set intspiel [string length $textspiel]
        if {$intspiel > 15} {
            set txtposition [string first ".txt" $textspiel];
            set textfilenameend [expr {$txtposition + 3}];
            set textfilename [string range $textspiel 11 $textfilenameend];
            set textfilenametrimmed [string trimleft $textfilename];
            # check whether the file exists
            if {[file exists $textfilenametrimmed]} {
                # read in the contents of the file; add them to the map
                set textfile [open $textfilenametrimmed];
                set contents [read $textfile];
                close $textfile;
                regsub -all {$textspiel} $data {$contents} data;
                sh_input msg "" "$textspiel AND $contents";
            } else {
                set filenotfound "{file not found}";
                regsub -all $textspiel $data $filenotfound data;
            }
        }
    }
    set file [open $filename w];
    puts $file $data;
    close $file;

As you can see, I did have to bandaid a few if's in there. I'd be happy if anyone has any general suggestions on this code, too. But, I think it must have to do with the brackets. It seems like when the text file that's going to be substituted into the master file contains bracketed text then the substitution goes forward. If not, regexp finds it and sends it down through the code but regsub won't substitute it.

Thanks in advance for your thoughts/suggestions.

Sincerely,
Mark
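One detail in the posted code that may be worth checking is the brace quoting in the first regsub call. In Tcl, braces suppress variable substitution, so {$textspiel} is passed to regsub as the literal text "$textspiel" rather than the value of the variable. The sketch below uses made-up sample data to show the difference, and uses [string map] as one possible way to do a plain-text replacement; it is an illustration of that one point, not a diagnosis of the whole script.

```tcl
# Hypothetical sample data; these values are for illustration only.
set data      "before {@INPUTFILE example.txt} after"
set textspiel "{@INPUTFILE example.txt}"
set contents  "Example: 300-45823"

# Braces suppress variable substitution, so the pattern here is the
# literal text "$textspiel". regsub reads the leading "$" as an
# end-of-string anchor, so nothing matches and the data is unchanged.
set n [regsub -all {$textspiel} $data {$contents} braced]

# [string map] treats both the key and the value as plain text, so
# neither the braces in the marker nor any "&" or "\" in the file
# contents get special treatment.
set mapped [string map [list $textspiel $contents] $data]

puts "regsub replacements: $n"
puts "string map result:   $mapped"
```

With this sample data the regsub call reports zero replacements, while the [string map] call replaces the marker with the file text.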
From: Donal K. Fellows on 21 Dec 2009 09:06

On 21 Dec, 12:28, Mark Morgan <me.mor...(a)yahoo.com> wrote:
> The filename and the surrounding text that will need to be replaced is
> this (quoted): "{@INPUTFILE example.txt}". The problem I'm having is
> that I can't get it to substitute all of that text, all of the time.
> Specifically if the file example.txt contains (quoted): "Example:
> 300-45823" then it won't replace it. If it has "{Example: 300-45823}"
> then it will.

Starting out by trying to understand the requirements here. You have a file whose contents includes sequences of the form:

    {@INPUTFILE FOOBAR.txt}

and each of those sequences needs to be replaced by the contents of the file with the given name? Assuming that's so, and that there's no other quoting to do, then the method is this:

    proc processTemplate string {
        # This is exactly the replacement to make a string [subst]-safe
        set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
        # Now convert the replacements to embedded commands
        regsub -all {{@INPUTFILE (\w+\.txt)}} $s {[readFromFile \1]} s
        # Process all the substitutions
        return [subst $s]
    }

    # Simple read-a-file helper
    proc readFromFile filename {
        set f [open $filename]
        set d [read $f]
        close $f
        return $d
        # Use this instead if you want recursive template processing:
        # return [processTemplate $d]
    }

The use of [string map], [regsub] and [subst] is not as intuitive as it ought to be. There probably ought to be a -eval or -command option to [regsub] so that the rest of that stuff can be avoided, but it's not been implemented yet (it's slightly tricky to make the syntax work perfectly so that it doesn't clunk, so it's not so far had a high enough priority for the people doing the Tcl implementation to work on).

One thing to note about this code. It's a lot simpler than yours. Tricks like this are why it is useful to ask here (or look on the Wiki: http://wiki.tcl.tk) when you're having problems.

Donal.
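To try the two procs above end to end, something like the following sketch may help. The procs are repeated so the block runs on its own; the scratch directory name (template_demo), the file example.txt and the sample master text are assumptions made up for the demonstration, and the master text deliberately contains a "$" and a "[" to exercise the defanging step.

```tcl
# Donal's two procs, repeated so the sketch runs on its own.
proc processTemplate string {
    set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
    regsub -all {{@INPUTFILE (\w+\.txt)}} $s {[readFromFile \1]} s
    return [subst $s]
}
proc readFromFile filename {
    set f [open $filename]
    set d [read $f]
    close $f
    return $d
}

# Hypothetical scratch directory and file names, for illustration only.
set old [pwd]
set dir [file join $old template_demo]
file mkdir $dir
cd $dir

# The include file holds unbracketed text, as in the original question.
set f [open example.txt w]
puts -nonewline $f "Example: 300-45823"
close $f

# A master string with characters that [subst] would otherwise act on.
set master {Cost is $5 [approx] {@INPUTFILE example.txt} end}
set result [processTemplate $master]
puts $result

# Clean up the scratch directory.
cd $old
file delete -force $dir
```

The "$5" and "[approx]" in the master text should come through untouched, with only the {@INPUTFILE ...} marker replaced by the file contents.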
From: Jonathan Bromley on 21 Dec 2009 10:26

On Mon, 21 Dec 2009 06:06:08 -0800 (PST), "Donal K. Fellows" wrote:

[snip nice solution]

>One thing to note about this code. It's a lot simpler than yours.
>Tricks like this are why it is useful to ask here

I was going to post a somewhat different solution myself but Donal got there first... but his [subst]-based solution raises some really interesting questions for me as a sometime teacher and trainer.

Using [subst] on data is obviously convenient and powerful, but it has always troubled me somewhat. For example:

- Unlike just about everything else in Tcl, [subst] just works the way it works and there's not much you can do to modify its behaviour. That's fine if it does exactly what you need, but I worry about flexibility. Stuff like include-file insertion is quite likely to need detailed, context-dependent intervention: for example, what should happen if the last character of an included file is (or is not) a line break?

- The preparatory wardance
      set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
  frightens me a lot. It's a piece of user code that mirrors the operation of some Tcl internals. Am I alone in finding that somewhat distasteful?

And finally, although the solution is neat and instructive, its relationship to the original requirements is not obvious to anyone who is not highly Tcl-savvy.

None of this is complaint or criticism. Rather, I guess, it's an open invitation to help me readjust my attitudes :-)
--
Jonathan Bromley
From: Donal K. Fellows on 21 Dec 2009 10:42

On 21/12/2009 15:26, Jonathan Bromley wrote:
> Using [subst] on data is obviously convenient and powerful,
> but it has always troubled me somewhat. For example:

It's magical. I can remember being thoroughly startled the first time I saw that sort of thing going on too. :-) But it works, and is both fast and safe.

> - Unlike just about everything else in Tcl, [subst] just
> works the way it works and there's not much you can do
> to modify its behaviour. That's fine if it does exactly
> what you need, but I worry about flexibility. Stuff like
> include-file insertion is quite likely to need detailed,
> context-dependent intervention: for example, what should
> happen if the last character of an included file is
> (or is not) a line break?

Well that's entirely up to how you go about writing both the [regsub] to make the command substitution producing the string to process, and what those command substitutions do. In this case, I'm using a very simple-minded model; I'm sure you can come up with more sophisticated ones. But in summary, there are three steps:

 1. Defang; [string map] makes this easy.
 2. Put in the interesting substitutions.
 3. Splat through [subst].

You can reduce the amount of quoting needed in step #1 by passing extra options in step #3 (e.g., I could have not quoted '$' characters if I'd passed the -novariables option to [subst]). But it's easy enough to handle all three cases.

> - The preparatory wardance
> set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
> frightens me a lot. It's a piece of user code that mirrors
> the operation of some Tcl internals. Am I alone in finding
> that somewhat distasteful?

OK, that just puts a backslash in front of all Tcl's in-double-quotes metacharacters. Really. An alternative would have been:

    regsub -all {[[\\$]} $string {\\&} s

But that's slower and just as magical. :-)

> And finally, although the solution is neat and instructive,
> its relationship to the original requirements is not obvious
> to anyone who is not highly Tcl-savvy.

We know it ought to be more elegant and obvious than this; it's on our todo list. Maybe next year in Tcl 8.7...?

Donal.
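The three steps Donal lists, together with his remark about -novariables, can be sketched as follows. This is only an illustration under stated assumptions: fakeRead is a hypothetical stand-in for a real file read so that the block needs no files on disk, and the sample text is made up to contain a "$", a "[" and a backslash.

```tcl
# Hypothetical stand-in for a file read, so the sketch needs no files.
proc fakeRead name {
    return "<contents of $name>"
}

# Made-up input containing "$", "[" and "\" alongside one marker.
set text {Price: $5, index a[1], path C:\temp, {@INPUTFILE notes.txt}}

# 1. Defang. "$" is left alone here because step 3 passes -novariables;
#    only "[" and "\" still need a protecting backslash.
set s [string map {\[ \\\[ \\ \\\\} $text]

# 2. Put in the interesting substitutions: each marker becomes an
#    embedded command call.
regsub -all {{@INPUTFILE (\w+\.txt)}} $s {[fakeRead \1]} s

# 3. Splat through [subst], with variable substitution switched off.
set out [subst -novariables $s]
puts $out
```

Dropping the "$" entry from the map is safe only because the matching [subst] call disables variable substitution; if the two ever drift apart, a "$" in the data would be treated as a variable reference, which is one argument for keeping the full three-character map as in the original proc.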
From: Jonathan Bromley on 21 Dec 2009 10:56

On Mon, 21 Dec 2009 15:42:25 +0000, "Donal K. Fellows" wrote:

>It's magical. I can remember being thoroughly startled the first time I
>saw that sort of thing going on too. :-) But it works, and is both fast
>and safe.

Understood.

> 1. Defang; [string map] makes this easy.
> 2. Put in the interesting substitutions.
> 3. Splat through [subst].

Nice summary, thanks.

>You can reduce the amount of quoting needed in step #1 by passing extra
>options in step #3 (e.g., I could have not quoted '$' characters if I'd
>passed the -novariables option to [subst]). But it's easy enough to
>handle all three cases.

Right, it seems pointless to do only some of them if one single, simple recipe will handle the whole lot.

>> - The preparatory wardance
>> set s [string map {$ \\$ \[ \\\[ \\ \\\\} $string]
>> frightens me a lot. It's a piece of user code that mirrors
>> the operation of some Tcl internals. Am I alone in finding
>> that somewhat distasteful?
>
>OK, that just puts a backslash in front of all Tcl's in-double-quotes
>metacharacters. Really.

Yes, I'm aware of that. But a beginner surely would have a hard time being confident that the set was complete. So it could easily degenerate into a piece of voodoo, handed down by cut'n'paste from one project to another, until its original purpose was lost.... Actually I would have thought an encapsulation of that would be a useful addition to the repertoire: [string unsubst] ??

>> And finally, although the solution is neat and instructive,
>> its relationship to the original requirements is not obvious
>> to anyone who is not highly Tcl-savvy.
>
>We know it ought to be more elegant and obvious than this; it's on our
>todo list. Maybe next year in Tcl 8.7...?

No, I wasn't criticising Tcl's facilities; I was questioning whether it's good, especially for beginners or occasional users, to apply techniques that are so many steps away from the original spec. Even if it's a tad inefficient, pedestrian step-by-step implementation of such requirements is sometimes a good investment for future comprehensibility.

Thanks for the response.
--
Jonathan Bromley
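For comparison, a pedestrian step-by-step version of the kind Jonathan describes might look like the sketch below. It is not code from anyone in the thread: the proc names readWholeFile and expandIncludes are hypothetical, the pattern tolerates optional whitespace before the closing brace as an assumption about the input, and the "{file not found}" placeholder is borrowed from the original post. It needs Tcl 8.5 or later for lassign and {*}.

```tcl
# Hypothetical helper: read a whole file into a string.
proc readWholeFile filename {
    set f [open $filename]
    set d [read $f]
    close $f
    return $d
}

# Hypothetical step-by-step expander: walk the text from left to right,
# copying plain text and swapping each marker for its file's contents.
proc expandIncludes text {
    # Assumed marker shape; \s* allows a space before the closing brace.
    set pattern {{@INPUTFILE\s+(\w+\.txt)\s*}}
    set out ""
    set pos 0
    # Find each marker in turn, searching from where the last one ended.
    while {[regexp -indices -start $pos $pattern $text whole name]} {
        lassign $whole first last
        # Copy the plain text that precedes the marker.
        append out [string range $text $pos [expr {$first - 1}]]
        # Pull the filename out using the indices of the capture group.
        set filename [string range $text {*}$name]
        if {[file exists $filename]} {
            append out [readWholeFile $filename]
        } else {
            append out "{file not found}"
        }
        set pos [expr {$last + 1}]
    }
    # Copy whatever follows the final marker.
    append out [string range $text $pos end]
    return $out
}

# Usage with a made-up filename that is assumed not to exist.
puts [expandIncludes "see {@INPUTFILE no_such_file_here.txt} here"]
```

Because the file contents are appended directly and never passed through [regsub] or [subst], characters such as "&", "\", "$" and "[" in the included files need no quoting at all, at the cost of a longer and slower loop. A question such as what to do about a trailing line break in an included file could be handled at the point where the contents are appended.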