From: MitchellWmA on 6 Apr 2007 17:37

This is an OT msg. After over 2 hours I can't figure out why Agent Ransack isn't finding a string of text when I use a regex. I even downloaded a couple of freeware regex testers, and neither of them finds the text string either. Yet other, very similar searches do work, and I can't see anything different between the working examples and this one.

I was going to start working on a quotes word list when I decided to use Agent Ransack to test the quote. The URL I found this text on had non-canon quotes as well as actual ones from a television series. Since I have transcripts of the series, it should have been easy to find if it were truly from this series. The quote is:

"We have no joy on the burn."

Below is the portion of the transcript where this is clearly found, and which I ultimately had to search for manually:

"SAM: Preliminary data coming in. (Screen shows the failure) (Sam puts her head down, Davis stares out into space) Digger 1, this is flight. We have no joy on the burn. I'm sorry, Colonel, but the missles just didn't have enough thrust. (General Hammond lookis at Sam.) ..."

I've tried many different ways to find words, following formats found on the web, but nothing comes up as it should. I narrowed it down to just finding "joy on the burn" using this:

^j[a-z].*n$

AR returns all sorts of results, like these from various episodes:

39 JOE: Maybe we should get a second opinion
578 Johnson: Wonder how that thing in your gut would like his neck
594 Johnson's arm behind him and shoves the lieutenant face down
13 Jack: So we're talking about a rescue mission
39 Jack: It's been done before. It can be done again
750 Jack: I'm gonna pass out again
24 Jennifer Copping - Mallen

All this leads me to believe that I still don't have the right regexp down for words and strings.
Yes, I could have just searched for "joy on the burn", and I probably will do that in future, but I'm extremely disappointed that the very first time I go to seriously use Agent Ransack, I run into trouble where it finds all sorts of things except what's actually there. I'm hoping I've just got the format wrong, so I thought I'd come here. What regexp would anyone here use to find this string, "joy on the burn", please? Thanks
From: »Q« on 6 Apr 2007 19:13

In <news:jtdd13daieqqa4smr9lo4mvm7h7k01r5hq(a)4ax.com>,
MitchellWmA <nospammail(a)nonsense.com> wrote:

> I narrowed it down just to find "joy on the burn" using this:
>
> ^j[a-z].*n$

That would find "joy on the burn" if and only if the string were on a line all by itself. ^ matches the start of a line, and $ matches the end of a line.

> 39 JOE: Maybe we should get a second opinion
> 578 Johnson: Wonder how that thing in your gut would like his neck
> 594 Johnson's arm behind him and shoves the lieutenant face down
> 13 Jack: So we're talking about a rescue mission
> 39 Jack: It's been done before. It can be done again
> 750 Jack: I'm gonna pass out again
> 24 Jennifer Copping - Mallen

These lines all start with j, followed by a letter, followed by a lot of characters, and end with an n.

Except for 578, which ends with a k. Unless you've rewrapped the text, I can't explain why that turned up as a match.

> What regexp would anyone here use to find this string,
> "joy on the burn" please?

I'd just use "joy on the burn". If you know exactly what the string you're looking for is, just using that literal string will work unless it contains any special characters.

--
»Q«
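To make the anchoring concrete, here is a small sketch using Python's re module. This is an assumption-laden illustration: Agent Ransack has its own engine, so it only shows standard ^/$ behaviour, not AR's exact rules. The sample lines are adapted from the transcript above, and re.IGNORECASE is included on the assumption that the search was case-insensitive, which would explain why "JOE" and "Jack" matched a lowercase j.

```python
import re

# Sample lines adapted from the transcript and results quoted in this thread.
transcript = "\n".join([
    "SAM: Preliminary data coming in.",
    "Digger 1, this is flight. We have no joy on the burn.",
    "Jack: I'm gonna pass out again",
])

# With re.MULTILINE, ^ and $ match at the start and end of every line,
# so the pattern only accepts whole lines that begin with j and end with n.
anchored = re.compile(r"^j[a-z].*n$", re.MULTILINE | re.IGNORECASE)
anchored_hits = anchored.findall(transcript)

# A literal phrase matches anywhere inside a line; re.escape() guards
# against regex metacharacters in the search text.
literal = re.compile(re.escape("joy on the burn"), re.IGNORECASE)
literal_hits = [line for line in transcript.splitlines() if literal.search(line)]

print(anchored_hits)  # the Jack line only
print(literal_hits)   # the Digger line only
```

The anchored pattern returns a line that merely starts with "J" and ends with "n", while the line that actually contains the quote is missed because it starts with "Digger". The literal search finds it directly.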
From: MitchellWmA on 7 Apr 2007 15:15

On Fri, 6 Apr 2007 18:13:41 -0500, »Q« <boxcars(a)gmx.net> wrote:

>In <news:jtdd13daieqqa4smr9lo4mvm7h7k01r5hq(a)4ax.com>,
>MitchellWmA <nospammail(a)nonsense.com> wrote:
>
>> I narrowed it down just to find "joy on the burn" using this:
>>
>> ^j[a-z].*n$
>
>That would find "joy on the burn" if and only if the string were on a
>line all by itself. ^ matches the start of a line, and $ matches the
>end of a line.

I see. I never even thought of that; it's very obvious once it's pointed out. Nothing I looked at online really covered that ^ is the beginning of a line; the prompts and online help very frequently say it's the beginning of a word. Also, other regex references said to use "<" for the beginning of a word, but I'm guessing that must actually be the beginning of a line too, since it didn't work in AR either. Thanks. Good to know this.

>> 39 JOE: Maybe we should get a second opinion
>> 578 Johnson: Wonder how that thing in your gut would like his neck
>> 594 Johnson's arm behind him and shoves the lieutenant face down
>> 13 Jack: So we're talking about a rescue mission
>> 39 Jack: It's been done before. It can be done again
>> 750 Jack: I'm gonna pass out again
>> 24 Jennifer Copping - Mallen
>
>These lines all start with j, followed by a letter, followed by a lot
>of characters, and end with an n.
>
>Except for 578, which ends with a k. Unless you've rewrapped the
>text, I can't explain why that turned up as a match.

Hmm, you're right. I never even noticed that. Agent Ransack has a cool "save to clipboard" feature, and the above was a straight dump from that. I don't think I accidentally deleted anything while pasting, either. A puzzle. Anyway, the good thing is that I'd only take the result I needed, so that's not so bad. I'll keep an eye on this in future, though.

>> What regexp would anyone here use to find this string,
>> "joy on the burn" please?
>
>I'd just use "joy on the burn". If you know exactly what the string
>you're looking for is, just using that literal string will work unless
>it contains any special characters.

In this case, yes. I was just working with Agent Ransack and wasn't thinking of an ordinary search. Then, when this kept happening even after I'd found the text string with a manual search, I was too caught up in trying to figure out what I was doing wrong *g*. I now know for sure that I still have to figure out what marks the beginning of a word, so the exercise was not lost. What I save in the future will offset the time I've spent so far. And my work today has gone quickly because the above pattern works on the word lists anyway, since the words are thankfully on individual lines.

But now that I know what to look for, I'll google through all the regex stuff online again to see if something, somewhere points out what actually marks the beginning of a word. Agent Ransack just doesn't seem to like "\b", which is the only other "beginning of word" thing to use, and I must be missing out on a lot of words embedded in the info files vs the word lists. thanx :)
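For comparison, in engines that do support it (Python's re, Perl, PCRE), \b is a zero-width assertion that matches between a word character and a non-word character, or at the start or end of the text. Here is a minimal Python sketch; the sample sentence is made up for illustration and says nothing about how Agent Ransack itself behaves.

```python
import re

# Made-up sample text: "joy" appears as a whole word twice and inside
# other words ("Enjoy", "joyride") twice.
text = "We have no joy on the burn. Enjoy the joyride; joy!"

# Without boundaries, the pattern also matches inside longer words.
plain = re.findall(r"joy", text, re.IGNORECASE)

# \b consumes no characters; it only asserts that a word/non-word
# transition happens at that position, so "joy" must stand alone.
bounded = re.findall(r"\bjoy\b", text, re.IGNORECASE)
starts = [m.start() for m in re.finditer(r"\bjoy\b", text, re.IGNORECASE)]

print(len(plain), len(bounded), starts)  # 4 2 [11, 47]
```

GNU grep spells the same idea as \< and \> (start and end of word), which may be where the "<" advice came from; whether Agent Ransack accepts either form is not something this sketch can confirm.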
From: »Q« on 7 Apr 2007 16:06

In <news:klqf13tsc90dk1kd1pj712ude1aphd3grn(a)4ax.com>,
MitchellWmA <nospammail(a)nonsense.com> wrote:

> Agent Ransack just doesn't seem to like "\b" which is only other
> "beginning of word" thing to use and I must be missing out on a lot
> of words embedded in the info files vs the word lists. thanx :)

I'm not running Windows now, but I'll try to remember to have a look at Agent Ransack when I get a chance. If \b won't work, it's not using PCRE (Perl-compatible regular expressions, something of a standard). I guess MythicSoft has their own regex engine, but there must be some way to match word boundaries with it.

--
»Q«
From: »Q« on 8 Apr 2007 16:53
In <news:klqf13tsc90dk1kd1pj712ude1aphd3grn(a)4ax.com>,
MitchellWmA <nospammail(a)nonsense.com> wrote:

> But now that I know what to look for, I'll google through all the
> regex stuff again online to see if something, somehwere doesn't point
> what really shows the beginning of a word. Agent Ransack just doesn't
> seem to like "\b" which is only other "beginning of word" thing to use
> and I must be missing out on a lot of words embedded in the info files
> vs the word lists.

AFAICT, there is no good way to make Agent Ransack's regex recognize word boundaries. It seems to be very limited, with none of the special expressions of most regex schemes.

Here's a rather ugly workaround, in case you want to stick with Agent Ransack; for me it would be too annoying, and I'd look for some other app with better regex support. (I like and use AR, but mostly just for searching filenames without much need for regular expressions.)

You can define a character class consisting of all the characters that might indicate word boundaries, or at least the more common ones, like whitespace and punctuation. I don't see any way at all to match tab characters, though.

(^|[ .;:,'"])foo([ .;:,'"]|$)

will match

bar foo bar
foo bar
bar foo. Bar

but not

bar foomatic bar

--
»Q«
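The character-class workaround can be tried out in a regex engine that is easy to script. This is a sketch in Python's re (not Agent Ransack's engine) using the exact pattern and sample lines from the post. The tab and "foo foo" cases are illustrative additions that show two limitations: a delimiter missing from the class (a tab) defeats it, and because the class consumes the delimiter, two words sharing a single space are not both found. Python's \b, which consumes nothing, handles both.

```python
import re

# The workaround pattern from the post: "foo" must be preceded by the start
# of the line or a listed delimiter, and followed by one or the end of line.
workaround = re.compile(r"""(^|[ .;:,'"])foo([ .;:,'"]|$)""")
word_boundary = re.compile(r"\bfoo\b")

should_match = ["bar foo bar", "foo bar", "bar foo. Bar"]
should_not_match = ["bar foomatic bar"]

for line in should_match + should_not_match:
    print(f"{line!r}: {bool(workaround.search(line))}")

# A tab is not in the class, so the workaround misses this line; \b does not.
tab_line = "bar\tfoo bar"
tab_result = (bool(workaround.search(tab_line)), bool(word_boundary.search(tab_line)))

# The class consumes the shared space, so only one "foo" is found here.
pair = "foo foo"
pair_counts = (len(list(workaround.finditer(pair))), len(word_boundary.findall(pair)))

print(tab_result, pair_counts)  # (False, True) (1, 2)
```

In a tool whose engine lacks \b, extending the class with whatever delimiters the files actually contain is the only lever; in one that has \b, the assertion is both shorter and more robust.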