From: Evertjan. on
mayayana wrote on 08 Mar 2010 in microsoft.public.scripting.vbscript:

>
>> > That wasn't James. I said that. See his post
>> > above for the question.
>>
>> There is no post above in usenet.
>>
>
> You don't see a post above? That doesn't mean
> it's not there. :)

It is not there by definition.

See below [and I do not mean in a follow up posting!!]

> You seem to be using XNews but also
> have a reference to Google Groups in your header.

Impossible, not in MY header.
Possibly your reader adds it?

> If you're actually reading via Google you might want
> to get a real newsreader.

Oh come off it! I have been using Luu Tran's Xnews for years and would
not be seen using G-groups, except sporadically as a bad Dejavu substitute.

> If that's not the problem
> then it may be the MS server. It goes wacky periodically.

Earlier postings in usenet are not "above", quoting is "above".

> I sometimes see an Re post that appears to be the
> original myself.

Eh?

So you are the original post yourself? ;-)
And such a repost is being a mirror of yourself?

> I don't know why that happens. In
> any case, there are about 8 posts going back in this
> sub-thread and about 20 in the entire thread.

However, calling them "above" indicates something else,
and at least stipulates that all those postings are [still] available on the
news server of all recipients, a stipulation that is false by usenet
standards.

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my email address)
From: James on
On Mar 9, 3:57 am, "Evertjan." <exjxw.hannivo...(a)interxnl.net> wrote:
> [snip]

Hi Evertjan,

In my post referred to by mayayana, I was asking if it is possible to
perform the same operation as the replace function to remove special
characters from a string. I have written the function below which
uses the RegExp object to replace characters from the input string.
The function works, but the reason for implementing using regular
expressions is to incorporate conditions into the expression so that
some special characters remain after calling the replace method of the
Regexp object.

I am trying to remove all special characters detailed in the pattern,
if they are not a component of a url. For example, I wish to remove
all colons ( : ), but not when used in a url.

I have tried a few things, including replacing ":" that are part of a
url, but when I include the not operator ( ! ), the expression doesn't
remove any characters at all (no part of the expression equates to
true??)

The following pattern matches and removes any instances of the special
characters:
"(\?|\*|\""|,|\\|<|>|&|#|~|%|{|}|\+|_|\.|@|\||:|/)"

I have tried the following and similar without success:
"(\?|\*|\""|,|\\|<|>|&|#|~|%|{|}|\+|_|\.|@|\|)|((http)!:)"

"(\?|\*|\""|,|\\|<|>|&|#|~|%|{|}|\+|_|\.|@|\||(http)!:)"

Function CleanRepReg(strtoclean)
    Dim objRegExp, strOutput

    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True

    ' Replace every special character with a hyphen.
    objRegExp.Pattern = "(\?|\*|\""|,|\\|<|>|&|#|~|%|{|}|\+|_|\.|@|\||:|/)"
    strOutput = objRegExp.Replace(strtoclean, "-")

    ' Collapse runs of hyphens into a single hyphen.
    objRegExp.Pattern = "-+"
    strOutput = objRegExp.Replace(strOutput, "-")

    CleanRepReg = strOutput
End Function

Thanks

James
From: Evertjan. on
James wrote on 09 Mar 2010 in microsoft.public.scripting.vbscript:

> In my post referred to by mayayana, I was asking if it is possible to
> perform the same operation as the replace function to remove special
> characters from a string. I have written the function below which
> uses the RegExp object to replace characters from the input string.
> The function works, but the reason for implementing using regular
> expressions is to incorporate conditions into the expression so that
> some special characters remain after calling the replace method of the
> Regexp object.
>
> I am trying to remove all special characters detailed in the pattern,
> if they are not a component of a url. For example, I wish to remove
> all colons ( : ), but not when used in a url.
>
> I have tried a few things, including replacing ":" that are part of a
> url, but when I include the not operator ( ! ), the expression doesn't
> remove any characters at all (no part of the expression equates to
> true??)
>
> The following pattern matches and removes any instances of the special
> characters:
> "(\?|\*|\""|,|\\|<|>|&|#|~|%|{|}|\+|_|\.|@|\||:|/)"

> objRegExp.Pattern =
> "(\?|\*|\""|,|\\|<|>|&|#|~|%|{|}|\+|_|\.|@|\||:|/)"

Why the outer ()?
No need, unless you refer to the match in the replace string.

> objRegExp.IgnoreCase = True

Why this? There are no literal a-z characters in your search regex.

Let's parse your string, I see 4 errors:



\?|
\*|
\""| --> do you mean \"\"
,|
\\|
<|
>|
&|
#|
~|
%|
{| --> \{|
}| --> \}|
\+|
_|
\.|
@|
\||
:|
/ --> \/


This probably would be just as effective

objRegExp.Pattern = "[(?*"",\\<>&#~%{}+_.@|:\/]+"

> objRegExp.Pattern = "-+"
> strOutput = objRegExp.Replace(strOutput, "-")

Do you mean any number of - should be replaced by only one?

Then probably try:
objRegExp.Pattern = "\-+"
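
A quick way to check the two suggestions above (the character class and the hyphen collapse) is a small JavaScript sketch. The sample strings are made up for illustration, and the leading ( is left out of the class here so that it holds exactly the characters of the original alternation:

```javascript
// The original alternation, written as a JavaScript regex literal.
const alternation = /\?|\*|"|,|\\|<|>|&|#|~|%|\{|\}|\+|_|\.|@|\||:|\//g;
// The same set of characters as a single character class.
const charClass = /[?*",\\<>&#~%{}+_.@|:\/]/g;

// Hypothetical sample: one special character between each pair of letters.
const sample = 'a?b*c"d,e\\f<g>h&i#j~k%l{m}n+o_p.q@r|s:t/u';
const viaAlternation = sample.replace(alternation, '-');
const viaClass = sample.replace(charClass, '-');
console.log(viaAlternation === viaClass); // true

// With the + quantifier, a run of special characters becomes one hyphen.
const collapsed = 'a<&#>b'.replace(/[?*",\\<>&#~%{}+_.@|:\/]+/g, '-');
console.log(collapsed); // a-b

// Outside a character class the hyphen needs no escaping.
const singleHyphen = 'a---b'.replace(/-+/g, '-');
console.log(singleHyphen); // a-b
```

Inside a character class most metacharacters lose their special meaning, which is why only the backslash and the slash are escaped in the class above.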

> i have tried the following and similar without success:
> "(\?|\*|\""|,|\\|<|>|&|#|~|%|{|}|\+|_|\.|@|\|)|((http)!:)"

What does the !: do in "((http)!:)" ?

Do you want to remove "http:" ?
Or do you mean some noncapturing match gone wrong?
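
For reference, a minimal JavaScript sketch of the difference between a literal ! and a negative lookahead, which may be what was intended. The sample strings are made up, and as far as I know the VBScript RegExp object accepts the same (?!...) syntax, though that is worth verifying on your own system:

```javascript
// "!" is an ordinary character in a regex: this only matches the literal text "http!:".
const literalBang = /(http)!:/;
console.log(literalBang.test('http://example.com')); // false
console.log(literalBang.test('http!:'));             // true

// Negative lookahead: match a colon that is NOT followed by "//".
const colonNotInScheme = /:(?!\/\/)/g;
const out = 'note: see http://example.com'.replace(colonNotInScheme, '');
console.log(out); // note see http://example.com
```

This only protects the colon of the scheme; the slashes and dots of a url would need their own handling.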

try (now you probably need IgnoreCase = True):

"[(?*"",\\<>&#~%{}+_.@|:\/]|http"

================================

I prefer testing regex in javascript on Google Chrome,
sorry VBS-fans,
and did:

<script type='text/javascript'>
var x = 'http://qwerty.com/asdfg.asp_M<&#>M';
x = x.replace(/[(?*",\\<>&#~%{}+_.@|:\/]|http/gi,'-');
x = x.replace(/\-+/g,'-');
alert(x);
</script>





--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my email address)
From: James on
On Mar 9, 10:25 pm, "Evertjan." <exjxw.hannivo...(a)interxnl.net> wrote:
> [snip]

Thanks Evertjan,

What I am working towards is to parse text, which may contain html
tags and explicit urls, in order to remove special characters and
reformat hyperlinks. Any web addresses found in the
text should remain, even though the url contains some of the
characters being removed. This is what I was trying to achieve using
the regular expression (remove all : and / chars except from urls).

A tags are stripped, leaving the href value in brackets with padded
spaces after the anchor text of the original link. At this point, I
am trying to have all hyperlinks padded with spaces so they can be
easily identified later in the script.

I should also note that I have adjusted the expression to remove the
characters instead of replacing with a "-". It is currently removing
all characters specified in the expression even when a component of a
url. If possible, the expression needs to match any of the characters
specified, but not when part of a url.

The following demonstrates what I am trying to achieve:

"text. text, tex*t text: http://www.google.com text text" should
become "text text text text http://www.google.com text text"

Thanks

James
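
One possible approach to the example above, sketched in JavaScript under stated assumptions: a url is taken to be http:// or https:// followed by non-whitespace characters, and cleanKeepUrls is a hypothetical name. The pattern matches a whole url first and captures it, so the replacement string $1 puts the url back unchanged and yields an empty string for any single special character:

```javascript
// Hypothetical helper: remove special characters but leave urls intact.
function cleanKeepUrls(text) {
  // First alternative: a whole url (captured as group 1).
  // Second alternative: one special character (group 1 stays empty).
  const pattern = /(https?:\/\/\S+)|[?*",\\<>&#~%{}+_.@|:\/]/gi;
  // "$1" restores a matched url and removes a matched special character.
  return text.replace(pattern, '$1');
}

const input = 'text. text, tex*t text: http://www.google.com text text';
console.log(cleanKeepUrls(input));
// text text text text http://www.google.com text text
```

Note that \S+ also swallows punctuation directly attached to a url, such as a trailing comma, so the url part of the pattern would need tightening for real input. The VBScript Replace method also accepts $1 in the replacement string, so the same idea should carry over, though that is untested here.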


From: Evertjan. on
James wrote on 10 Mar 2010 in microsoft.public.scripting.vbscript:

> What I am working towards is to be able to parse text which may
> contain html tags and urls explicitly in the text to remove special
> characters and reformat hyperlinks.

I was just showing how to use regex, not how to make your specification work,
as it is much more fun and educational to try it yourself.

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my email address)