From: Mark Morgan on
Thank you again, everyone!

I'm proceeding with the code laid out by Donal Fellows, and it produced
a very nice file. Now I'm running into a few extra stinkers that I
hadn't noticed before.

Apparently, I have to deal with a few other varieties of input with
this program.

It's not catching the "@INPUTFILE" string when...
#1 ... there's more than one space between "@INPUTFILE" and the name
of the textfile.

#2 ... the filename has a space (or spaces) in it.

#3 ... the filename has a dash (or plus sign or other allowed
character) in it.

#4 ... the filename has backslashes in it (like a long path)

#5 ... combinations of #1 to #4

Basically in running this and going through the output on a few files,
I've learned that people are naming their files whatever is allowable
by Windows. I think in my earlier post in another thread I had
indicated that there was some limitation to what we'd be looking for.

Could anyone help me with this? I think the regular expression is
going to become much more interesting!

I'm sorry for being such a pain. Sorry for not looking at the
possible file names beforehand... I really am thrilled to finally
have a glimmer of light at the end of the tunnel.

Thank you everyone!

Mark


From: Jonathan Bromley on
On Wed, 23 Dec 2009 01:18:28 -0800 (PST), Mark Morgan wrote:

>Apparently, I have to deal with a few other varieties of input with
>this program.

That happens....
>
>It's not catching the "@INPUTFILE" string when...
>#1 ... there's more than one space between "@INPUTFILE" and the name
>of the textfile.
>
>#2 ... the filename has a space (or spaces) in it.
>
>#3 ... the filename has a dash (or plus sign or other allowed
>character) in it.
>
>#4 ... the filename has backslashes in it (like a long path)
>
>#5 ... combinations of #1 to #4
>
>Basically in running this and going through the output on a few files,
>I've learned that people are naming their files whatever is allowable
>by Windows. I think in my earlier post in another thread I had
>indicated that there was some limitation to what we'd be looking for.
>
>Could anyone help me with this? I think the regular expression is
>going to become much more interesting!

I suspect that's fairly easy. If I can make the assumption that a
filename may not contain a right curly brace }, this regular expr
should do it:

{\s*?\@INPUTFILE\s*?(\S.*?)\s*?}

One step at a time (view in monospaced font):

{             - opening brace
\s*?          - optional space before @
@INPUTFILE    - the keyword
\s+?          - any whitespace after keyword
(         )   - parens to capture the filename
                (same as before, right?)
  \S          - first char must be non-space
  .*?         - any other garbage in filename
\s*?          - optional trailing space,
                not part of the filename
}             - closing brace
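
The same breakdown can be kept next to the pattern in the code itself. A minimal sketch, assuming the -expanded switch of Tcl's regexp command (whitespace and # comments inside the pattern are ignored) and using the \s+? form from the step list. The names re, line and name are only for illustration:

```tcl
# Same pattern as the step list, written with -expanded so each piece
# carries its own comment. Variable names are illustrative only.
set re {
    \{            # opening brace
    \s*?          # optional space before @
    @INPUTFILE    # the keyword
    \s+?          # whitespace after the keyword
    (\S.*?)       # captured filename, first char must be non-space
    \s*?          # optional trailing space, not part of the filename
    \}            # closing brace
}

set line "{@INPUTFILE   my file-1.txt  }"
if {[regexp -expanded $re $line -> name]} {
    puts "filename: <$name>"
}
```

The sample line has extra spaces after the keyword, plus a space and a dash inside the filename, which are the cases listed in the question.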

You'll note that I've put query characters after every * or +
repetition operator. That's because I don't want the .* filename
match to capture a bazillion characters up to the final } in the
file - it's known as "lazy" matching and forces .* (etc) to match
the shortest possible acceptable string. There are other ways
to get the same effect, but lazy matching conveys the sense
quite nicely here. Tcl regexps have complicated rules about
mixing lazy and greedy in the same expression, but in this case
it's OK to use lazy everywhere, which keeps things simple.
In fact you then don't need to apply the ? lazy modifier
everywhere, but I usually do so as a reminder to myself.
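
To see the difference, here is a small sketch that runs an all-greedy and an all-lazy version of the pattern over the same text. The sample text and the variable names greedy and lazy are invented for illustration:

```tcl
# Two directives on one line; the sample text is invented.
set text "{@INPUTFILE first.txt} some words {@INPUTFILE second file.txt}"

# All quantifiers greedy: the match runs on to the last closing brace.
regexp {\{\s*@INPUTFILE\s*(\S.*)\s*\}} $text -> greedy

# All quantifiers lazy: the match stops at the first closing brace.
regexp {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $text -> lazy

puts "greedy: <$greedy>"
puts "lazy:   <$lazy>"
```

The greedy capture swallows everything up to the final brace, including the second directive, while the lazy one stops at first.txt.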

>I'm sorry for being such a pain. Sorry for not looking at the
>possible file names beforehand...

As you can imagine, I myself have absolutely no recollection
of ever starting to code without a truly complete
understanding of the requirements......... yeah, right :-)
--
Jonathan Bromley


From: Jonathan Bromley on
On Wed, 23 Dec 2009 13:20:51 +0100, Jonathan Bromley wrote:

> If I can make the assumption that a
> filename may not contain a right curly brace }

So then I go and try it, and discover that a Windows filename CAN
contain a right curly brace... arrrgh. So how are you supposed
to parse
{@INPUTFILE stupid}.txt}
???

Is there, for example, a guarantee that the directive stands
on a line of its own, and its closing brace is the last non-space
character on the line? In the absence of such a rule, I have no
idea how one would be supposed to handle arbitrary filenames.
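
If such a rule did hold, the end of the line could stand in for the missing delimiter. A rough sketch, assuming each directive sits alone on its line with the closing brace as the last non-space character. The proc name extractInputFile is hypothetical, not something from this thread:

```tcl
# Hypothetical helper: returns the filename from a directive line,
# or an empty string if the line is not a lone @INPUTFILE directive.
proc extractInputFile {line} {
    # Anchored at both ends, so the final closing brace must be the
    # last non-space character; earlier braces stay in the filename.
    if {[regexp {^\s*?\{\s*?@INPUTFILE\s+?(\S.*?)\s*?\}\s*?$} $line -> name]} {
        return $name
    }
    return ""
}

set odd  [extractInputFile "{@INPUTFILE stupid}.txt}"]
set path [extractInputFile "  {@INPUTFILE  C:\\data\\my file-1.txt }  "]
puts "<$odd>"
puts "<$path>"
```

This only works under the stated assumption. A directive embedded in the middle of a line of other text would still be ambiguous.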

Bah, humbug.
--
Jonathan Bromley


From: Mark Morgan on
Thank you very much!

I hate to bother you, but I'm having a little bit of trouble with it
with the new changes.

The line in my code is:

regsub -all {\s*?\@INPUTFILE\s*?(\S.*?)\s*?} $s {[readFromFile \1]} s

I think the expression seems to be returning only the first letter of
the filename, though.

There was a discrepancy between the stepwise description and the first
version (just a plus sign), but that really had the same effect (just
finding the first letter of the filename)

So if I've got "{@INPUTFILE staffpnp.txt}" it would just look for "s"

Is this a function of the lazy search, my data, or could it be
something to do with my tcl version? I'm using tcl 8.3.1.

Thank you for any help!

Merry Christmas,
Mark


From: Donal K. Fellows on
On 24/12/2009 09:29, Mark Morgan wrote:
> The line in my code is:
>
> regsub -all {\s*?\@INPUTFILE\s*?(\S.*?)\s*?} $s {[readFromFile \1]} s
[...]
> So if I've got "{@INPUTFILE staffpnp.txt}" it would just look for "s"
>
> Is this a function of the lazy search, my data, or could it be
> something to do with my tcl version? I'm using tcl 8.3.1.

It's because you've not told it to explicitly match the “{}” around the
rest of the substitution term. Changing the regular expression
invocation to this:

regsub -all {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $s \
{[readFromFile \1]} s

Should give satisfaction. Unless you've got one of those odd Windows
filenames with brace characters in, but you probably don't want anything
to do with those. :-)

(It's indeed caused by the lazy search; “.*?” can lazily match nothing
at all.)
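
The effect is easy to reproduce by running both patterns on the sample from the question. A small sketch: readFromFile is only text in the replacement string here and is never called, and the result variable names noBraces and withBraces are illustrative:

```tcl
set s "{@INPUTFILE staffpnp.txt}"

# Without the literal braces, nothing forces the lazy ".*?" to extend,
# so the capture ends after one character.
regsub -all {\s*?\@INPUTFILE\s*?(\S.*?)\s*?} $s {[readFromFile \1]} noBraces

# With the escaped braces, the capture must reach the closing brace.
regsub -all {\{\s*?@INPUTFILE\s*?(\S.*?)\s*?\}} $s {[readFromFile \1]} withBraces

puts $noBraces
puts $withBraces
```

The first result still carries the outer braces and leaves "taffpnp.txt" stranded after the substitution, while the second replaces the whole directive.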

Donal.