Regular expression for this? [JavaScript]

Prev: FAQ Topic - Why doesn't the global variable "divId" always refer to the element with id="divId"? (2010-06-10)
Next: Dynamically Creating Checkboxes

From: Gabriel Gilini on 10 Jun 2010 14:15

Joe Nine wrote:
> Gabriel Gilini wrote:
>> Joe Nine wrote:
>>> stevewy(a)hotmail.com wrote:
>>>> I'm just trying to work out (if what I want is at all possible), a
>>>> regular expression that will search for and select (in a text editor
>>>> that supports regexps, like Notepad++) the word "onclick", then any
>>>> text at all, up to and including ">".
>>>>
>>>> I thought
>>>>
>>>> onclick*\>
>>>>
>>>> would work, but it doesn't.
>>>>
>>>> Basically it needs to Find the word onclick, then select all the text
>>>> up to >. Sort of like an extended search.
>>>>
>>>> The wildcard "*" symbol selects "the previous token", not "all and
>>>> anything" like I am used to.
>>>>
>>>> What am I doing wrong?
>>>>
>>>> Steve
>>>
>>> I don't know the right regexp but I do notice that you're making an
>>> assumption that the onclick is always going to be last, before the >
>>> character. It might not be.
>> No, he isn't. Read his post again, he wants to match everything that
>> goes after the string "onclick" until the first appearance of ">".
>
> Yes, technically that's what he said. I was reading between the lines and
> deducing that it's probably not what he wants. I suspect he wants the
> contents of the onclick string. Here's an example where he gets more
> than that.
>
> < ...onclick="something()" onmouseover="somethingelse()">
That's one way of deducing what he wants, but that's nothing but an
exercise in futility. OP didn't give enough information for us to know
exactly what he wants.

Either way, this probably doesn't belong in c.l.js.
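A quick way to see why the OP's attempt fails, sketched in JavaScript syntax (the thread assumes Notepad++ behaves similarly; that is an assumption, not something checked here): the * quantifier repeats only the token right before it, so onclick*> means "onclic", then zero or more "k", then ">". The sample strings are made up for illustration. The OP's \> is written as a plain > here, because in some other flavors (GNU grep, for example) \> is an end-of-word anchor rather than a literal ">".

```javascript
// "*" repeats only the token immediately before it, here the letter "k".
const opAttempt = /onclick*>/;

console.log(opAttempt.test('onclic>'));            // true: zero "k"s
console.log(opAttempt.test('onclickkk>'));         // true: three "k"s
console.log(opAttempt.test('<a onclick="go()">')); // false: '=' is neither "k" nor ">"

// "Any text at all" is "." (any character except a line terminator)
// repeated by "*", so the quantifier now applies to "." instead of "k".
const anyText = /onclick.*>/;
console.log(anyText.exec('<a onclick="go()">')[0]); // onclick="go()">
```

Note that .* is greedy: on a line holding several tags it runs on to the last ">", which is why the replies suggest a non-greedy or negated-class variant instead.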

From: Stefan Weiss on 10 Jun 2010 14:21

On 10/06/10 19:41, Thomas 'PointedEars' Lahn wrote:
> Stefan Weiss wrote:
>> stevewy(a)hotmail.com wrote:
>>> I'm just trying to work out (if what I want is at all possible), a
>>> regular expression that will search for and select (in a text editor
>>> that supports regexps, like Notepad++) the word "onclick", then any
>>> text at all, up to and including ">".
>>
>> If the regular expressions used by Notepad++ are similar to those in
>> JavaScript, you could try
>>
>> onclick.*?> or onclick[^>]*>
>>
>> The .*? in the first variant matches anything, in a non-greedy way (as
>> little as possible).
>>
>> The [^>]* in the second variant matches any number of characters other
>> than ">".
>
> One thing that every programmer should know is that SGML-based markup
> languages like HTML, and programming languages, are usually not regular
> languages (they are of the Correct Bracket Language type: context-free but
> not regular), so they cannot be parsed with one regular expression alone

And I never said they could. Besides, it would depend on the type of
regular expression used. For example, take Perl's (?{...}) and (??{...})
constructs, which can be used to embed Perl code in regexes. Same thing
goes for the /e modifier in Perl substitutions. Voila, Turing complete
regular expressions. (yeah, I know that's cheating ;-)
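JavaScript has no counterpart to Perl's (?{...}) or (??{...}), but the closest everyday analog of the /e modifier is passing a function as the replacement argument of String.prototype.replace(): ordinary code runs once per match and computes what gets substituted. A minimal sketch; the markup and the upper-casing rule are invented purely for illustration, and the pattern assumes double-quoted attribute values.

```javascript
// Illustrative input only.
const html = '<a onclick="go()">A</a> <b onclick="stop()">B</b>';

// The replacement function is called once per match with the whole match
// followed by the capture groups; its return value is substituted.
const result = html.replace(/onclick="([^"]*)"/g, (_whole, handler) =>
  `onclick="${handler.toUpperCase()}"`
);

console.log(result);
// <a onclick="GO()">A</a> <b onclick="STOP()">B</b>
```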

The first (highest rated) comment on this page is a good indication of
what happens when you think too hard about parsing HTML with regular
expressions:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

> (a false positive for your suggestion has already been mentioned).

A false positive for what? The OP wanted to match...

| the word "onclick", then any text at all, up to and including ">".

...which is just what the proposed expressions do. Nobody said anything
about where the ">" is allowed to be. Your assumption about what exactly
the OP wanted to do may be correct, but that has nothing to do with the
regular expressions themselves.

--
stefan
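To make the difference between the two proposals (and a plain greedy .*) concrete, here is a small JavaScript sketch run against Joe Nine's example and two made-up lines. It only shows what each pattern matches; which result counts as "correct" depends on what the OP actually wanted.

```javascript
const patterns = {
  greedy: /onclick.*>/,      // runs to the LAST ">" on the line
  lazy: /onclick.*?>/,       // stops at the FIRST ">" (non-greedy)
  negated: /onclick[^>]*>/,  // a run of non-">" characters, then ">"
};

const samples = [
  // Joe Nine's example: the match continues past the onclick attribute.
  '<a onclick="something()" onmouseover="somethingelse()">',
  // Two tags on one line (made up): only the greedy variant takes both.
  '<a onclick="f()">x</a><b>',
  // A ">" inside the attribute value (made up) ends lazy/negated early.
  '<a onclick="if (a > b) f()">',
];

for (const text of samples) {
  for (const [name, re] of Object.entries(patterns)) {
    const m = re.exec(text);
    console.log(name.padEnd(8), m ? m[0] : null);
  }
}
```

One practical difference between the two proposals: in JavaScript, "." does not match line terminators (unless the s flag is used), while [^>] does, so only onclick[^>]*> finds an onclick whose tag is split across several lines.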

From: Thomas 'PointedEars' Lahn on 10 Jun 2010 14:27

Gabriel Gilini wrote:

> I think you misunderstood me. What I tried to say is that trying to
> match everything after an onclick attribute up to the end of the opening
> tag with Regular Expressions in HTML isn't something that could be
> relied upon, as you so technically put in your reply to OP.

You need to read my explanation more carefully. It is quite possible to do
what was intended reliably with regular expressions, just not with every
flavor of regular expressions.

>>> The short answer would be: Don't.
>> Nonsense.
>
> Now you're confusing me. Do you think that what OP is trying to
> accomplish with Regular Expressions should be done or not?

I do not see why it should not be done if it is done properly. For example,
I have frequently used Java regular expressions in Eclipse, and sometimes
GNU-BREs, EREs and PCREs in shell scripts, for efficient search-and-replace,
including in HTML documents. With regard to JS/ES and the DOM, using
regular expressions is also the first step in writing an efficient
`innerHTML' replacement.

So there certainly is value in knowing how to use flavors of regular
expressions to solve the parsing problem of context-free languages.
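As one illustration of using a regular expression more carefully for this task, a pattern can step over quoted attribute values as whole units, so a ">" inside quotes no longer ends the match. This is a sketch under stated assumptions (attribute values are double-quoted, single-quoted, or unquoted without spaces, quotes or ">"; no comments or script content in the way), not necessarily the approach PointedEars has in mind.

```javascript
// Assumed shapes of a quoted attribute value: "double" or 'single'.
const QUOTED = String.raw`"[^"]*"|'[^']*'`;

// onclick=, then its value (captured), then the rest of the tag up to the
// closing ">", where quoted strings are consumed whole so that a ">"
// inside quotes cannot end the match early.
const onclickToTagEnd = new RegExp(
  String.raw`\bonclick\s*=\s*(${QUOTED}|[^\s"'>]+)(?:${QUOTED}|[^"'>])*>`,
  'i'
);

// Illustrative input only.
const tag = `<a onclick="if (a > b) f()" title='x > y'>link</a>`;
const m = onclickToTagEnd.exec(tag);

console.log(m[0]); // onclick="if (a > b) f()" title='x > y'>
console.log(m[1]); // "if (a > b) f()"
```

The alternatives inside the repeated group start with different characters, so the pattern does not backtrack explosively. It still knows nothing about comments, script content or text outside tags, so it remains a search aid rather than a parser.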

HTH

PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300dec7(a)news.demon.co.uk> (2004)

From: Thomas 'PointedEars' Lahn on 10 Jun 2010 14:37

Stefan Weiss wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Stefan Weiss wrote:
>>> stevewy(a)hotmail.com wrote:
>>>> I'm just trying to work out (if what I want is at all possible), a
>>>> regular expression that will search for and select (in a text editor
>>>> that supports regexps, like Notepad++) the word "onclick", then any
>>>> text at all, up to and including ">".
>>>
>>> If the regular expressions used by Notepad++ are similar to those in
>>> JavaScript, you could try
>>>
>>> onclick.*?> or onclick[^>]*>
>>>
>>> The .*? in the first variant matches anything, in a non-greedy way (as
>>> little as possible).
>>>
>>> The [^>]* in the second variant matches any number of characters other
>>> than ">".
>>
>> One thing that every programmer should know is that SGML-based markup
>> languages like HTML, and programming languages, are usually not regular
>> languages (they are of the Correct Bracket Language type: context-free
>> but not regular), so they cannot be parsed with one regular expression
>> alone
>
> And I never said they could.

You are misunderstanding my followup as an attempt at a complete rebuttal
of your arguments.

> Besides, it would depend on the type of regular expression used. For
> example, take Perl's (?{...}) and (??{...}) constructs, which can be used
> to embed Perl code in regexes. Same thing goes for the /e modifier in Perl
> substitutions. Voila, Turing complete regular expressions. (yeah, I know
> that's cheating ;-)

(?R…) suffices with PCRE, BTW: <news:2746085.NajkDbe18p(a)PointedEars.de>

> The first (highest rated) comment on this page is a good indication of
> what happens when you think too hard about parsing HTML with regular
> expressions:
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Yes, cluelessness is a widespread disease, and especially common at
stackoverflow. You can parse HTML with regular expressions, just not
with a (non-PCRE) regular expression alone.
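The difference between "a regular expression alone" and "regular expressions plus a little code" shows up with nesting, the part that makes HTML a Correct Bracket Language: a regex can find each tag, but checking that the tags nest properly needs a stack, which a plain (non-recursive) regular expression does not have. A minimal JavaScript sketch; the tag pattern is deliberately simplistic (it assumes no ">" inside attribute values and no comments, ignores HTML's optional end tags, and treats only "/>" and a small hypothetical list of void elements as self-closing).

```javascript
// Deliberately simple tag pattern: "<" or "</", a name, anything up to ">".
const TAG = /<(\/?)([a-z][a-z0-9]*)\b[^>]*?(\/?)>/gi;
// Hypothetical, incomplete list of elements that never take an end tag.
const VOID = new Set(['br', 'hr', 'img', 'input', 'meta', 'link']);

// Returns true when every start tag is closed by a matching end tag in
// the right order: the regex finds the tags, the stack tracks nesting.
function tagsBalanced(html) {
  const stack = [];
  for (const [, slash, rawName, selfClose] of html.matchAll(TAG)) {
    const name = rawName.toLowerCase();
    if (selfClose || VOID.has(name)) continue;
    if (!slash) {
      stack.push(name);
    } else if (stack.pop() !== name) {
      return false; // end tag does not close the innermost open element
    }
  }
  return stack.length === 0; // anything left open is unbalanced
}

console.log(tagsBalanced('<div><p>a<br>b</p></div>')); // true
console.log(tagsBalanced('<div><p>a</div></p>'));       // false
```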

>> (a false positive for your suggestion has already been mentioned).
>
> A false positive for what? The OP wanted to match...
>
> | the word "onclick", then any text at all, up to and including ">".
>
> ...which is just what the proposed expressions do. [...]

No, think again.

PointedEars
--
realism: HTML 4.01 Strict
evangelism: XHTML 1.0 Strict
madness: XHTML 1.1 as application/xhtml+xml
-- Bjoern Hoehrmann

From: Stefan Weiss on 10 Jun 2010 14:57

On 10/06/10 20:37, Thomas 'PointedEars' Lahn wrote:
> Stefan Weiss wrote:
>> Thomas 'PointedEars' Lahn wrote:
>>> (a false positive for your suggestion has already been mentioned).
>>
>> A false positive for what? The OP wanted to match...
>>
>> | the word "onclick", then any text at all, up to and including ">".
>>
>> ...which is just what the proposed expressions do. [...]
>
> No, think again.

I'm curious. Do you mean that "any text at all" should exclude the empty
string as an edge case? If so, that's easily remedied. I used * in my
examples, because that's what the OP wanted to learn about (off-topic or
not). Other than that, you'll have to be more specific than "no, think
again".
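If "any text at all" is taken to require at least one character between "onclick" and ">", the remedy is presumably to replace * (zero or more) with + (one or more). A small JavaScript check of that edge case, using made-up inputs:

```javascript
// "*" allows zero characters between "onclick" and ">"; "+" requires one.
const variants = [/onclick.*?>/, /onclick.+?>/, /onclick[^>]*>/, /onclick[^>]+>/];

for (const re of variants) {
  // The empty-text edge case first, then an ordinary attribute.
  console.log(re.source, re.test('onclick>'), re.test('onclick="f()">'));
}
// onclick.*?> true true
// onclick.+?> false true
// onclick[^>]*> true true
// onclick[^>]+> false true
```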

--
stefan
