From: MitchellWmA on
This is an OT message. After more than two hours I can't figure out
why Agent Ransack isn't finding a string of text when I search with a
regex. I even downloaded a couple of freeware regex testers, and
neither finds the string either. Yet other, very similar searches do
work, and I can't see anything different between the working examples
and this one.

I was going to start working on a quotes word list when I decided to
use Agent Ransack to test the quote. The URL I found this text on had
non-canon quotes as well as actual ones from a television series.
Since I have transcripts of the series, it should have been easy to
check whether the quote was truly from this series.

The quote is:
"We have no joy on the burn."

Below is the portion of the transcript where this is clearly found and
which I ultimately had to search for manually:

"SAM:
Preliminary data coming in. (Screen shows the failure) (Sam puts her
head down, Davis stares out into space) Digger 1, this is flight.
We have no joy on the burn. I'm sorry, Colonel, but the missiles just
didn't have enough thrust. (General Hammond looks at Sam.) ..."

I've used many different ways to find words, as per formats found on
the web, but nothing is coming up as it should. I narrowed it down
just to find "joy on the burn" using this:

^j[a-z].*n$

AR returns all sorts of results, like this from various episodes:

39 JOE: Maybe we should get a second opinion
578 Johnson: Wonder how that thing in your gut would like his neck
594 Johnson's arm behind him and shoves the lieutenant face down
13 Jack: So we're talking about a rescue mission
39 Jack: It's been done before. It can be done again
750 Jack: I'm gonna pass out again
24 Jennifer Copping - Mallen

All this leads me to believe that I still don't have the right regexp
down for words and strings. Yes, I could have just searched for "joy
on the burn", and I probably will do that in future, but I'm
extremely disappointed that the very first time I seriously use Agent
Ransack I run into trouble: it finds all sorts of things except what's
actually there.

I'm hoping it's just that I've got the format wrong, so I thought
I'd come here. What regexp would anyone here use to find the string
"joy on the burn", please? Thanks.
From: »Q« on
In <news:jtdd13daieqqa4smr9lo4mvm7h7k01r5hq(a)4ax.com>,
MitchellWmA <nospammail(a)nonsense.com> wrote:

> I narrowed it down just to find "joy on the burn" using this:
>
> ^j[a-z].*n$

That would find "joy on the burn" if and only if the string were on a
line all by itself. ^ matches the start of a line, and $ matches the
end of a line.
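
To illustrate the anchoring, here is a minimal sketch using Python's
re module. That is an assumption for illustration only: Agent Ransack
uses its own engine, which may differ in details, and the IGNORECASE
flag is a guess at how the search was configured. The second sample
line is a shortened copy of the transcript line.

```python
import re

# Pattern from the original post. re.IGNORECASE is an assumption about
# how the search was configured, not something stated in the thread.
anchored = re.compile(r"^j[a-z].*n$", re.IGNORECASE)

lines = [
    "joy on the burn",                                 # phrase alone on a line
    "We have no joy on the burn. I'm sorry, Colonel",  # phrase mid-line
]

# With ^ and $, the whole line must start with j and end with n.
anchored_hits = [bool(anchored.search(line)) for line in lines]

# Without anchors, the phrase is found anywhere in the line.
unanchored_hits = [bool(re.search(r"joy on the burn", line)) for line in lines]

print(anchored_hits)    # [True, False]
print(unanchored_hits)  # [True, True]
```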

> 39 JOE: Maybe we should get a second opinion
> 578 Johnson: Wonder how that thing in your gut would like his neck
> 594 Johnson's arm behind him and shoves the lieutenant face down
> 13 Jack: So we're talking about a rescue mission
> 39 Jack: It's been done before. It can be done again
> 750 Jack: I'm gonna pass out again
> 24 Jennifer Copping - Mallen

These lines all start with j, followed by a letter, followed by a lot
of characters, and end with an n.

Except for 578, which ends with a k. Unless you've rewrapped the
text, I can't explain why that turned up as a match.
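
The same check can be run over the quoted results. This is a sketch
in Python's re module, not Agent Ransack itself, and it assumes the
leading numbers are line numbers added by the search tool rather than
part of the matched text, and that the search ignored case.

```python
import re

# Pattern from the original post; case-insensitivity is an assumption.
pattern = re.compile(r"^j[a-z].*n$", re.IGNORECASE)

# Result lines from the post, with the leading line numbers removed
# (assumed to be added by the search tool).
results = [
    "JOE: Maybe we should get a second opinion",
    "Johnson: Wonder how that thing in your gut would like his neck",
    "Johnson's arm behind him and shoves the lieutenant face down",
    "Jack: So we're talking about a rescue mission",
    "Jack: It's been done before. It can be done again",
    "Jack: I'm gonna pass out again",
    "Jennifer Copping - Mallen",
]

# Collect the lines the pattern does NOT match as written here.
non_matching = [line for line in results if not pattern.search(line)]
print(non_matching)
```

Under those assumptions only the "Johnson: Wonder ..." line fails to
match, which agrees with the observation about 578.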

> What regexp would anyone here use to find this string,
> "joy on the burn" please?

I'd just use "joy on the burn". If you know exactly what the string
you're looking for is, just using that literal string will work unless
it contains any special characters.
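
A short sketch of the special-character caveat, again in Python's re
module (an assumption; Agent Ransack's engine may treat some characters
differently). The sample text is taken from the transcript excerpt;
re.escape is Python's way of turning a literal string into a safe
pattern.

```python
import re

sample = "Preliminary data coming in. (Screen shows the failure)"

# A literal with no special characters can be used as the pattern as-is.
plain_found = re.search("data coming in", sample) is not None

# Parentheses are special: unescaped, they form a group and are not
# matched as characters, so the match excludes them.
needle = "(Screen shows the failure)"
raw_match = re.search(needle, sample).group(0)

# Escaping the literal makes every character match itself.
escaped_match = re.search(re.escape(needle), sample).group(0)

print(plain_found)    # True
print(raw_match)      # Screen shows the failure
print(escaped_match)  # (Screen shows the failure)
```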

--
»Q«
From: MitchellWmA on
On Fri, 6 Apr 2007 18:13:41 -0500, »Q« <boxcars(a)gmx.net> wrote:

>In <news:jtdd13daieqqa4smr9lo4mvm7h7k01r5hq(a)4ax.com>,
>MitchellWmA <nospammail(a)nonsense.com> wrote:
>
>> I narrowed it down just to find "joy on the burn" using this:
>>
>> ^j[a-z].*n$
>
>That would find "joy on the burn" if and only if the string were on a
>line all by itself. ^ matches the start of a line, and $ matches the
>end of a line.

I see. I never even thought of that. Very obvious once it's
pointed out. Nothing I looked at online really made it clear that
^ is the beginning of a line; the prompts and online help very
frequently say beginning of a word. Other regex references said to
use "<" as the beginning of a word, but I'm guessing that must actually
mean beginning of line too, since it didn't work in AR either.

Thanks. Good to know this.

>> 39 JOE: Maybe we should get a second opinion
>> 578 Johnson: Wonder how that thing in your gut would like his neck
>> 594 Johnson's arm behind him and shoves the lieutenant face down
>> 13 Jack: So we're talking about a rescue mission
>> 39 Jack: It's been done before. It can be done again
>> 750 Jack: I'm gonna pass out again
>> 24 Jennifer Copping - Mallen
>
>These lines all start with j, followed by a letter, followed by a lot
>of characters, and end with an n.
>
>Except for 578, which ends with a k. Unless you've rewrapped the
>text, I can't explain why that turned up as a match.

Hmmm, you're right. I never even noticed that. Agent Ransack has a
cool "save to clipboard" feature and the above was a straight dump
from that. I don't think I accidentally deleted something while
pasting, either. A puzzle. Anyway, the good thing is that I'd only
take the result I needed, so that's not so bad. I'll keep an eye on
this in future, though.

>> What regexp would anyone here use to find this string,
>> "joy on the burn" please?
>
>I'd just use "joy on the burn". If you know exactly what the string
>you're looking for is, just using that literal string will work unless
>it contains any special characters.

In this case, yes. I was just working with Agent Ransack and wasn't
thinking of an ordinary search. Then, when the regex kept failing even
after I'd found the text string with a manual search, I was too
caught up in trying to figure out what I was doing wrong *g*.

I now know for sure that I still have to figure out what marks the
beginning of a word, so the exercise was not lost. What I save in the
future will offset the time I've spent so far. And my work today has
gone so quickly because the above parameters work on the word lists
anyway, since the words are thankfully on individual lines.

But now that I know what to look for, I'll google through all the
regex stuff online again to see whether something, somewhere points
out what really marks the beginning of a word. Agent Ransack just
doesn't seem to like "\b", which is the only other "beginning of word"
thing to use, and I must be missing out on a lot of words embedded in
the info files vs the word lists. Thanks :)
From: »Q« on
In <news:klqf13tsc90dk1kd1pj712ude1aphd3grn(a)4ax.com>,
MitchellWmA <nospammail(a)nonsense.com> wrote:

> Agent Ransack just doesn't seem to like "\b", which is the only other
> "beginning of word" thing to use, and I must be missing out on a lot
> of words embedded in the info files vs the word lists. Thanks :)

I'm not running Windows now, but I'll try to remember to have a look at
Agent Ransack when I get a chance. If \b won't work, it's not using
PCRE (Perl-compatible regular expressions, something of a standard). I
guess MythicSoft has their own regex engine, but there must be some way
to match word boundaries with it.
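
For comparison, this is how \b behaves in an engine that supports it.
The sketch uses Python's re module, which is not PCRE itself but gives
\b the same word-boundary meaning; the sample sentences other than the
quote are made up for illustration.

```python
import re

# \b matches the empty position between a word character and a
# non-word character (or the start/end of the string).
word = re.compile(r"\bjoy\b")

samples = [
    "We have no joy on the burn.",  # whole word
    "They were overjoyed.",         # embedded in a longer word
    "joyful noise",                 # prefix of a longer word
]

boundary_hits = [bool(word.search(s)) for s in samples]
print(boundary_hits)  # [True, False, False]
```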

--
»Q«
From: »Q« on
In <news:klqf13tsc90dk1kd1pj712ude1aphd3grn(a)4ax.com>,
MitchellWmA <nospammail(a)nonsense.com> wrote:

> But now that I know what to look for, I'll google through all the
> regex stuff online again to see whether something, somewhere points
> out what really marks the beginning of a word. Agent Ransack just
> doesn't seem to like "\b", which is the only other "beginning of word"
> thing to use, and I must be missing out on a lot of words embedded in
> the info files vs the word lists.

AFAICT, there is no good way to make Agent Ransack's regex recognize
word boundaries. It seems to be very limited, with none of the special
expressions of most regex schemes.

Here's a rather ugly workaround, in case you want to stick with Agent
Ransack; for me it would be too annoying, and I'd look for some other
app with better regex support. (I like and use AR, but mostly just for
searching filenames without much need for regular expressions.) You
can define a character class consisting of all the characters that
might indicate word boundaries, or at least the more common ones, like
whitespaces and punctuation. I don't see any way at all to match tab
characters, though.

(^|[ .;:,'"])foo([ .;:,'"]|$)

will match

bar foo bar
foo bar
bar foo. Bar

but not

bar foomatic bar
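
The workaround pattern can be checked in any regex engine. Below is a
sketch in Python's re module (an assumption; it only confirms the logic
of the pattern, not how Agent Ransack evaluates it). The last sample,
with a tab before foo, is a hypothetical line added to show the tab
limitation mentioned above.

```python
import re

# Character-class workaround from the post: foo must be preceded by the
# start of the line or a listed separator, and followed by a listed
# separator or the end of the line.
workaround = re.compile(r"""(^|[ .;:,'"])foo([ .;:,'"]|$)""")

samples = [
    "bar foo bar",
    "foo bar",
    "bar foo. Bar",
    "bar foomatic bar",
    "bar\tfoo bar",  # hypothetical: tab is not in the character class
]

workaround_hits = [bool(workaround.search(s)) for s in samples]
print(workaround_hits)  # [True, True, True, False, False]
```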

--
»Q«