From: Mark Morgan on 23 Dec 2009 04:18

Thank you again, everyone!

Proceeding with the code laid out by Donal Fellows. And it produced a very nice file, so now I'm running into a few extra stinkers that I hadn't noticed before.

Apparently, I have to deal with a few other varieties of input with this program.

It's not catching the "@INPUTFILE" string when...

#1 ... there's more than one space between "@INPUTFILE" and the name of the textfile.

#2 ... the filename has a space (or spaces) in it.

#3 ... the filename has a dash (or plus sign or other allowed character) in it.

#4 ... the filename has backslashes in it (like a long path)

#5 ... combinations of #1 to #4

Basically in running this and going through the output on a few files, I've learned that people are naming their files whatever is allowable by Windows. I think in my earlier post in another thread I had indicated that there was some limitation to what we'd be looking for.

Could anyone help me with this? I think the regular expression is going to become much more interesting!

I'm sorry for being such a pain. Sorry for not looking at the possible file names beforehand...

I really am thrilled to finally have a glimmer of light at the end of the tunnel.

Thank you everyone!

Mark

From: Jonathan Bromley on 23 Dec 2009 07:20

On Wed, 23 Dec 2009 01:18:28 -0800 (PST), Mark Morgan wrote:

>Apparently, I have to deal with a few other varieties of input with
>this program.

That happens....

>It's not catching the "@INPUTFILE" string when...
>#1 ... there's more than one space between "@INPUTFILE" and the name
>of the textfile.
>
>#2 ... the filename has a space (or spaces) in it.
>
>#3 ... the filename has a dash (or plus sign or other allowed
>character) in it.
>
>#4 ... the filename has backslashes in it (like a long path)
>
>#5 ... combinations of #1 to #4
>
>Basically in running this and going through the output on a few files,
>I've learned that people are naming their files whatever is allowable
>by Windows. I think in my earlier post in another thread I had
>indicated that there was some limitation to what we'd be looking for.
>
>Could anyone help me with this? I think the regular expression is
>going to become much more interesting!

I suspect that's fairly easy. If I can make the assumption that a filename may not contain a right curly brace }, this regular expr should do it:

    {\s*?\@INPUTFILE\s*?(\S.*?)\s*?}

One step at a time (view in monospaced font):

    {           - opening brace
    \s*?        - optional space before @
    @INPUTFILE  - the keyword
    \s+?        - any whitespace after keyword
    (    )      - parens to capture the filename (same as before, right?)
    \S          - first char must be non-space
    .*?         - any other garbage in filename
    \s*?        - optional trailing space, not part of the filename
    }           - closing brace

You'll note that I've put query characters after every * or + repetition operator. That's because I don't want the .* filename match to capture a bazillion characters up to the final } in the file. It's known as "lazy" matching, and it forces .* (etc) to match the shortest possible acceptable string. There are other ways to get the same effect, but lazy matching conveys the sense quite nicely here.

Tcl regexps have complicated rules about mixing lazy and greedy in the same expression, but in this case it's OK to use lazy everywhere, which keeps things simple. In fact you then don't need to apply the ? lazy modifier everywhere, but I usually do so as a reminder to myself.

>I'm sorry for being such a pain. Sorry for not looking at the
>possible file names beforehand...

As you can imagine, I myself have absolutely no recollection of ever starting to code without a truly complete understanding of the requirements......... yeah, right :-)

--
Jonathan Bromley
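
To make the greedy versus lazy point above concrete, here is a minimal sketch rather than anything from the thread's actual program. The sample text and the variable names (text, greedyName, lazyName) are made up for illustration, and each pattern is deliberately kept all-greedy or all-lazy so that Tcl's rules about mixing the two do not come into play.

```tcl
# Hypothetical sample text holding two directives on one line.
set text "{@INPUTFILE a.txt} and {@INPUTFILE b.txt}"

# All-greedy pattern: the match prefers the longest string, so the
# captured "filename" runs on to just before the LAST closing brace.
regexp {\{@INPUTFILE\s+(.*)\}} $text -> greedyName

# All-lazy pattern: the match prefers the shortest string, so the
# capture stops just before the FIRST closing brace.
regexp {\{@INPUTFILE\s+?(.*?)\}} $text -> lazyName

puts "greedy capture: $greedyName"
puts "lazy capture:   $lazyName"   ;# a.txt
```

The greedy capture swallows the text between the two directives as well as the second directive's keyword, which is exactly the "bazillion characters" problem described above; the lazy capture is just the first filename.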

From: Jonathan Bromley on 23 Dec 2009 09:25

On Wed, 23 Dec 2009 13:20:51 +0100, Jonathan Bromley wrote:

> If I can make the assumption that a
> filename may not contain a right curly brace }

So then I go and try it, and discover that a Windows filename CAN contain a right curly brace... arrrgh.

So how are you supposed to parse

    {@INPUTFILE stupid}.txt}

???

Is there, for example, a guarantee that the directive stands on a line of its own, and its closing brace is the last non-space character on the line? In the absence of such a rule, I have no idea how one would be supposed to handle arbitrary filenames.

Bah, humbug.

--
Jonathan Bromley
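
If the guarantee asked about above did hold (the directive alone on its line, with its closing brace as the last non-space character), one possible approach is to anchor the pattern to the whole line and match greedily, so that any brace inside the filename is absorbed by the capture. This is only a sketch under that assumption; parseDirective is a hypothetical helper name, not part of the code discussed in this thread.

```tcl
# Hypothetical helper: returns the filename from a line that holds
# nothing but an @INPUTFILE directive, or "" if the line is not one.
# ASSUMPTION: the directive stands alone on its line and its closing
# brace is the last non-space character on that line.
proc parseDirective {line} {
    # Greedy throughout: (.*\S) takes as much as it can, and the
    # trailing anchor pins the final closing brace to the line end.
    set re {^[ \t]*\{[ \t]*@INPUTFILE[ \t]+(.*\S)[ \t]*\}[ \t]*$}
    if {[regexp $re $line -> name]} {
        return $name
    }
    return ""
}

# The awkward example from the post (braces escaped inside quotes).
puts [parseDirective "\{@INPUTFILE stupid\}.txt\}"]

# Extra spaces and a space in the name are handled the same way.
puts [parseDirective "  {@INPUTFILE   my file.txt }  "]   ;# my file.txt
```

The trade-off is that a directive embedded in the middle of a line of other text is no longer recognised at all, so this only helps if the input really follows the one-directive-per-line rule.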

From: Mark Morgan on 24 Dec 2009 04:29

Thank you very much!

I hate to bother you, but I'm having a little bit of trouble with it with the new changes.

The line in my code is:

    regsub -all {\s*?\@INPUTFILE\s*?(\S.*?)\s*?} $s {[readFromFile \1]} s

I think the expression seems to be returning only the first letter of the filename, though. There was a discrepancy between the stepwise description and the first version (just a plus sign), but that really had the same effect (just finding the first letter of the filename).

So if I've got "{@INPUTFILE staffpnp.txt}" it would just look for "s"

Is this a function of the lazy search, my data, or could it be something to do with my tcl version? I'm using tcl 8.3.1.

Thank you for any help!

Merry Christmas,
Mark

From: Donal K. Fellows on 24 Dec 2009 09:17

On 24/12/2009 09:29, Mark Morgan wrote:

> The line in my code is:
>
> regsub -all {\s*?\@INPUTFILE\s*?(\S.*?)\s*?} $s {[readFromFile
> \1]} s
[...]
> So if I've got "{@INPUTFILE staffpnp.txt}" it would just look for "s"
>
> Is this a function of the lazy search, my data, or could it be
> something to do with my tcl version? I'm using tcl 8.3.1.

It's because you've not told it to explicitly match the “{}” around the rest of the substitution term. Changing the regular expression invocation to this:

    regsub -all {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $s \
        {[readFromFile \1]} s

should give satisfaction. Unless you've got one of those odd Windows filenames with brace characters in, but you probably don't want anything to do with those. :-)

(It's indeed caused by the lazy search; “.*?” can lazily match nothing at all.)

Donal.
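
The difference between the two invocations can be seen side by side in a small sketch. The sample strings and the result variable names (broken, fixed, fixedPath) are illustrative assumptions; the two patterns and the replacement spec are the ones quoted in the posts above.

```tcl
set s "{@INPUTFILE staffpnp.txt}"

# Without literal braces in the pattern, the outer {} is only Tcl
# quoting, so the lazy match is free to stop after one character.
regsub -all {\s*?\@INPUTFILE\s*?(\S.*?)\s*?} $s {[readFromFile \1]} broken

# With \{ and \} in the pattern, the lazy match has to extend until it
# reaches the closing brace, so the whole filename is captured.
regsub -all {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $s {[readFromFile \1]} fixed

puts $broken   ;# {[readFromFile s]taffpnp.txt}
puts $fixed    ;# [readFromFile staffpnp.txt]

# The same pattern on a made-up name with extra spaces, a path, a dash
# and a plus sign (braced so the backslashes stay literal).
set s2 {{@INPUTFILE   C:\reports\my file-v2+final.txt }}
regsub -all {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $s2 {[readFromFile \1]} fixedPath
puts $fixedPath
```

In the second example the capture should be the full path without the surrounding padding, since the leading spaces are taken by the \s*? before the group and the trailing space by the \s*? after it.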