From: Gabriel Gilini on 10 Jun 2010 14:15

Joe Nine wrote:
> Gabriel Gilini wrote:
>> Joe Nine wrote:
>>> stevewy(a)hotmail.com wrote:
>>>> I'm just trying to work out (if what I want is at all possible), a
>>>> regular expression that will search for and select (in a text editor
>>>> that supports regexps, like Notepad++) the word "onclick", then any
>>>> text at all, up to and including ">".
>>>>
>>>> I thought
>>>>
>>>> onclick*\>
>>>>
>>>> would work, but it doesn't.
>>>>
>>>> Basically it needs to Find the word onclick, then select all the text
>>>> up to >. Sort of like an extended search.
>>>>
>>>> The wildcard "*" symbol select "the previous token", not "all and
>>>> anything" like I am used to.
>>>>
>>>> What am I doing wrong?
>>>>
>>>> Steve
>>>
>>> I don't know the right regexp but I do notice that you're making an
>>> assumption that the onclick is always going to be last, before the >
>>> character. It might not be.
>> No, he isn't. Read his post again, he wants to match everything that
>> goes after the string "onclick" until the first appearance of ">".
>
> Yes technically that's what he said. I was reading between the lines and
> deducing that it's probably not what he wants. I suspect he wants the
> contents of the onclick string. Here's an example where he gets more
> than that.
>
> < ...onclick="something()" onmouseover="somethingelse()">

That's one way of deducing what he wants, but that's nothing but an
exercise in futility. OP didn't give enough information for us to know
exactly what he wants.

Either way, this probably doesn't belong in c.l.js
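
A minimal JavaScript sketch of why the OP's pattern does not do what he expects. It assumes Notepad++ treats "*" the way JavaScript does (the thread does not confirm this) and that "\>" was meant as a literal ">". The sample strings are made up for illustration.

```javascript
// Sketch only: JavaScript regex semantics, which may differ from Notepad++.
// In "onclick*>", the "*" repeats only the preceding "k" (zero or more
// times); it is not a shell-style "match anything" wildcard.
const opPattern = /onclick*>/;

// Hypothetical sample strings, not taken from the thread.
const samples = ['<a onclick="go()">', "onclic>", "onclickkk>"];

// The first sample fails because "=" follows "onclick", not ">".
const results = samples.map((s) => opPattern.test(s));
console.log(results);
```

Under these assumptions the pattern only matches when ">" directly follows "onclic" plus any number of "k" characters, which is why a real onclick attribute is never selected.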
From: Stefan Weiss on 10 Jun 2010 14:21

On 10/06/10 19:41, Thomas 'PointedEars' Lahn wrote:
> Stefan Weiss wrote:
>> stevewy(a)hotmail.com wrote:
>>> I'm just trying to work out (if what I want is at all possible), a
>>> regular expression that will search for and select (in a text editor
>>> that supports regexps, like Notepad++) the word "onclick", then any
>>> text at all, up to and including ">".
>>
>> If the regular expressions used by Notepad++ are similar to those in
>> JavaScript, you could try
>>
>> onclick.*?> or onclick[^>]*>
>>
>> The .*? in the first variant matches anything, in a non-greedy way (as
>> little as possible).
>>
>> The [^>]* in the second variant matches any number of characters other
>> than ">".
>
> One thing that every programmer should know is that SGML-based markup
> languages like HTML, and programming languages, are usually not regular
> languages (they are of the Correct Bracket Language type: context-free but
> not regular), so they cannot be parsed with one regular expression alone

And I never said they could.

Besides, it would depend on the type of regular expression used. For
example, take Perl's (?{...}) and (??{...}) constructs, which can be used
to embed Perl code in regexes. Same thing goes for the /e modifier in Perl
substitutions. Voila, Turing complete regular expressions.
(yeah, I know that's cheating ;-)

The first (highest rated) comment on this page is a good indication of
what happens when you think too hard about parsing HTML with regular
expressions:

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

> (a false positive for your suggestion has already been mentioned).

A false positive for what? The OP wanted to match...

| the word "onclick", then any text at all, up to and including ">".

...which is just what the proposed expressions do. Nobody said anything
about where the ">" is allowed to be.

Your assumption about what exactly the OP wanted to do may be correct,
but that has nothing to do with the regular expressions themselves.

--
stefan
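
The two proposed expressions can be tried out in JavaScript as sketched below. This only shows how JavaScript's engine behaves; Notepad++ may differ. The tag name and the multi-line sample are made up, while the attribute pair comes from Joe Nine's example.

```javascript
// Sketch only: the two patterns proposed above, run with JavaScript's engine.
const lazy = /onclick.*?>/;      // anything, as little as possible, then ">"
const negated = /onclick[^>]*>/; // any run of non-">" characters, then ">"

// Joe Nine's attributes from earlier in the thread, with a made-up tag name.
const tag = '<a onclick="something()" onmouseover="somethingelse()">';
const lazyMatch = tag.match(lazy)[0];
const negatedMatch = tag.match(negated)[0];

// Made-up multi-line tag: by default "." does not match a line break in
// JavaScript, while a negated character class does.
const multiline = '<a onclick="something()"\n   title="x">';
const lazyMultiline = multiline.match(lazy);       // null: no ">" on that line
const negatedMultiline = multiline.match(negated); // spans the line break

console.log(lazyMatch);
console.log(negatedMatch);
console.log(lazyMultiline, negatedMultiline && negatedMultiline[0]);
```

On a single line both patterns select everything from "onclick" up to the first ">", including the onmouseover attribute. Whether that counts as a false positive depends on what the OP actually wanted, which is the point under dispute.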
From: Thomas 'PointedEars' Lahn on 10 Jun 2010 14:27

Gabriel Gilini wrote:
> I think you misunderstood me. What I tried to say is that trying to
> match everything after an onclick attribute up to the end of the opening
> tag with Regular Expressions in HTML isn't something that could be
> relied upon, as you so technically put in your reply to OP.

You need to read my explanation more carefully. It is quite possible to do
what was intended with regular expressions reliably, just not with every
flavor of regular expressions.

>>> The short answer would be: Don't.
>> Nonsense.
>
> Now you're confusing me. Do you think that what OP is trying to
> accomplish with Regular Expressions should be done or not?

I do not see why it should not be done if it is done properly. For example,
I have frequently used Java regular expressions in Eclipse, and sometimes
GNU-BREs, EREs and PCREs in shell scripts, for efficient search-and-replace,
including in HTML documents. With regard to JS/ES and the DOM, using regular
expressions is also the first step in writing an efficient `innerHTML'
replacement. So there certainly is value in knowing how to use flavors of
regular expressions to solve the parsing problem of context-free languages.

HTH

PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
  -- Richard Cornford, cljs, <cife6q$253$1$8300dec7(a)news.demon.co.uk> (2004)
From: Thomas 'PointedEars' Lahn on 10 Jun 2010 14:37

Stefan Weiss wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Stefan Weiss wrote:
>>> stevewy(a)hotmail.com wrote:
>>>> I'm just trying to work out (if what I want is at all possible), a
>>>> regular expression that will search for and select (in a text editor
>>>> that supports regexps, like Notepad++) the word "onclick", then any
>>>> text at all, up to and including ">".
>>>
>>> If the regular expressions used by Notepad++ are similar to those in
>>> JavaScript, you could try
>>>
>>> onclick.*?> or onclick[^>]*>
>>>
>>> The .*? in the first variant matches anything, in a non-greedy way (as
>>> little as possible).
>>>
>>> The [^>]* in the second variant matches any number of characters other
>>> than ">".
>>
>> One thing that every programmer should know is that SGML-based markup
>> languages like HTML, and programming languages, are usually not regular
>> languages (they are of the Correct Bracket Language type: context-free
>> but not regular), so they cannot be parsed with one regular expression
>> alone
>
> And I never said they could.

You are misunderstanding my followup as an attempt at complete rebuttal of
your arguments.

> Besides, it would depend on the type of regular expression used. For
> example, take Perl's (?{...}) and (??{...}) constructs, which can be used
> to embed Perl code in regexes. Same thing goes for the /e modifier in Perl
> substitutions. Voila, Turing complete regular expressions. (yeah, I know
> that's cheating ;-)

(?R…) suffices with PCRE, BTW: <news:2746085.NajkDbe18p(a)PointedEars.de>

> The first (highest rated) comment on this page is a good indication of
> what happens when you think too hard about parsing HTML with regular
> expressions:
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Yes, cluelessness is a widespread disease, and especially common at
stackoverflow. You can parse HTML with regular expressions, just not with a
(non-PCRE) regular expression alone.

>> (a false positive for your suggestion has already been mentioned).
>
> A false positive for what? The OP wanted to match...
>
> | the word "onclick", then any text at all, up to and including ">".
>
> ...which is just what the proposed expressions do. [...]

No, think again.

PointedEars
--
realism:    HTML 4.01 Strict
evangelism: XHTML 1.0 Strict
madness:    XHTML 1.1 as application/xhtml+xml
  -- Bjoern Hoehrmann
From: Stefan Weiss on 10 Jun 2010 14:57

On 10/06/10 20:37, Thomas 'PointedEars' Lahn wrote:
> Stefan Weiss wrote:
>> Thomas 'PointedEars' Lahn wrote:
>>> (a false positive for your suggestion has already been mentioned).
>>
>> A false positive for what? The OP wanted to match...
>>
>> | the word "onclick", then any text at all, up to and including ">".
>>
>> ...which is just what the proposed expressions do. [...]
>
> No, think again.

I'm curious. Do you mean that "any text at all" should exclude the empty
string as an edge case? If so, that's easily remedied. I used * in my
examples, because that's what the OP wanted to learn about (off-topic or
not).

Other than that, you'll have to be more specific than "no, think again".

--
stefan
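
The empty-string edge case can be sketched in JavaScript as follows. Swapping "*" for "+" is one possible reading of "easily remedied"; the post does not spell out the fix, and the sample strings are made up.

```javascript
// Sketch only: "*" allows zero characters between "onclick" and ">",
// while "+" requires at least one.
const allowEmpty = /onclick[^>]*>/;
const requireText = /onclick[^>]+>/;

// Hypothetical samples: a bare attribute and one with a value.
const bare = "<a onclick>";
const withValue = '<a onclick="go()">';

const bareAllowed = allowEmpty.test(bare);        // empty run is accepted
const bareRequired = requireText.test(bare);      // nothing before ">", rejected
const valueRequired = requireText.test(withValue);

console.log(bareAllowed, bareRequired, valueRequired);
```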