From: Al Dunbar on 5 Mar 2010 00:31

"dsoutter" <webmasterhub.net(a)gmail.com> wrote in message
news:2456c1b9-460d-46dd-af7f-62620e277e83(a)l12g2000prg.googlegroups.com...
> On Mar 3, 3:16 pm, "Al Dunbar" <aland...(a)hotmail.com> wrote:
>> "dsoutter" <webmasterhub....(a)gmail.com> wrote in message

<snip>

>> Here is how I would code this function if I ever needed such a thing:

<snip>

>> IMHO, this has the same result but the logic is somewhat simpler. What
>> benefit would I get from switching from my version to yours?
>>
>> /Al
>
> Hi Al, the logic is simpler as you are using the replace() function to
> perform the string replace, where the function provided takes the left
> and right parts of a string, either side of an illegal character.

A nice analysis, and exactly my point. Thanks for making it for me.

> In many cases, your method would be more suitable mainly due to the
> simpler logic, especially when all instances of each character are to
> be processed in the same way.

True. But, as written, your function will also only process all instances of
each character in the same way. My method might therefore appear to be better
in all cases in which the functions, as written, could be used. If you want to
compare our methods when applied to a different problem space, such as you
describe here:

> As the method provided parses the string character by character, you
> should have greater control over the output when more complex
> operations need to be performed, such as removing or replacing a
> character only if it is within a specific context:

You cannot compare my function as written with your function as modified to
solve some new problem. A better comparison would be to compare your modified
function with a different function I might write to solve that problem.

> Eg. replace "&" with " and " if padded with spaces or other specific
> character, or with a "+" if not
> "something & something else" would become "something and something
> else"
> "somethin&something else" would become "somethin+something else".
>
> Eg. replace ":" only if NOT part of a url:
>
> "the website is http://code-tips.com " would remain "the website is
> http://code-tips.com "
> "See Here: http://code-tips.com " would become "See Here
> http://code-tips.com "
>
> This would be achieved by either checking the previous 3-5 characters
> when a ":" is found to see if it is in the context of a url or not
> (http, https, ftp), or by checking whether the characters following the
> current ":" are "//", which would indicate that the colon is part of
> a url.

There might even be other ways to perform this kind of parsing...

> This functionality has not been included in the function provided, but
> would be easy to implement, as the string is incrementally parsed and
> manipulated using a numeric string position value relative to the
> current position/character in the string.

You seem to be proposing that simple functions be written in such a way that
they are more directly adaptable into more complex ones capable of more
complex operations. I disagree with this approach, UNLESS the function is
coded so that it can perform the more complex work simply by being called in a
different manner, without first having to be modified. I'm not saying that you
are wrong to do it your way, just that it may not be the best approach for
others to emulate.

> There may also be differences in performance between the two methods,
> as the function provided includes the code required to remove or
> replace each of the specified characters without calling the replace()
> function.

Yes, you avoid calling replace. But you do that by calling instr for each
possible bad character, plus left, mid, len, and two string concatenations for
each bad character actually present. If you are concerned with the overhead of
calling a built-in function, my method does that fewer times.

> I suspect

suspect, but do not know...

> that the replace function uses a similar approach
> to replace the specified characters so any difference in performance
> would be minimal, unless parsing a large string value. I haven't yet
> tested this for performance differences.

I haven't tested either. However, the logic used by a built-in function, while
possibly identical to that of a function written in vbscript, is likely to run
faster and more efficiently, mainly because the built-in functions are coded
in a lower level language.

Regardless, no argument over ultimate relative efficiency can really be
resolved without rigorous testing. Since neither of us feels it important
enough to do that, we probably both are willing to accept some inefficiencies,
given that our functions each perform their intended tasks perfectly! ;-)

Or do they? I haven't tested your code, but my reading of it suggests to me
that it makes unstated assumptions about the nature of the string it is
processing (does it, for example, presume that the string represents a valid
NTFS, UNC or URL path of some sort?). If you wouldn't mind, try running your
function against a string such as "C::\". I suspect the result might be
"C :\", a string containing an illegal character. If so, you would have to
either include an internal recursive call, or call your function in a loop
until the result no longer changed. Or you would have to qualify your
documentation to explain that it is intended only to process valid path
strings (or whatever the case actually is).

Regardless, another knock against your function as posted, if you are
interested in objective criticism, is that it does not fully document itself.
The nature of an "illegal character" is somewhat inferred, but not fully
explained. If the goal is to convert a valid path to a string that could be
used as a filename, here are a few quirks you appear not to have addressed:

- non-uniqueness: Run your function (or mine, for that matter) on these two
  different paths: "C:\documents and settings" and
  "C:\documents\and\settings", and you get the same result: "C documents and
  settings".

- other filename invalidities: run it on one of those huge URL strings and
  you might wind up with a filename that is too long for the file system to
  handle.

- the concept of adapting the function to do more comprehensive processing:
  if that actually was the reason for your less simple approach, your
  audience is not getting the benefit if you do not explain that.

- the vagueness of the name of the function itself: clean? There's nothing
  dirty here. Calling it Path2Filename might be a more accurate
  representation of its purpose (or it might not - I could not tell the
  purpose from the code itself without your additional explanation).

/Al
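
The Replace-based function referred to above was snipped from this thread, so the following is only a minimal sketch of that style of function, not the code that was actually posted. The name CleanByReplace, the set of "bad" characters, and the choice of a space as the replacement are all assumptions made for illustration. It can be used to check the non-uniqueness point and the "C::\" case by hand.

```vbscript
Option Explicit

' Hypothetical helper (not the function snipped from the thread).
' Replaces every occurrence of each assumed "illegal" character with a space.
Function CleanByReplace(sIn)
    ' Assumed set: characters Windows does not allow in a file name.
    Const BAD_CHARS = "\/:*?""<>|"
    Dim sOut, i
    sOut = sIn
    For i = 1 To Len(BAD_CHARS)
        ' One built-in call per candidate character, regardless of how
        ' many times that character occurs in the string.
        sOut = Replace(sOut, Mid(BAD_CHARS, i, 1), " ")
    Next
    CleanByReplace = sOut
End Function

' Two different paths that collapse to the same cleaned string.
WScript.Echo "[" & CleanByReplace("C:\documents and settings") & "]"
WScript.Echo "[" & CleanByReplace("C:\documents\and\settings") & "]"

' Repeated illegal characters: Replace handles every occurrence in one pass.
WScript.Echo "[" & CleanByReplace("C::\") & "]"
```

Run with cscript, the first two lines of output should be identical, which is the collision described above. Because Replace processes every occurrence of a character, no loop or recursion is needed for input such as "C::\".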
From: Al Dunbar on 5 Mar 2010 00:33

"WebmasterHub.net" <webmasterhub.net(a)gmail.com> wrote in message
news:923bab94-9163-4786-b9f3-c3f283a97ff2(a)l24g2000prh.googlegroups.com...
> On Mar 3, 3:16 pm, "Al Dunbar" <aland...(a)hotmail.com> wrote:
>> "dsoutter" <webmasterhub....(a)gmail.com> wrote in message

<snip>

I already replied to your identical post from your alter ego ;-)

/Al
From: Al Dunbar on 5 Mar 2010 00:45

"mayayana" <mayayana(a)nospam.invalid> wrote in message
news:eut$Rs6uKHA.732(a)TK2MSFTNGP06.phx.gbl...
> This looks like some kind of advertisement
> for a blog,

or of a web site purporting to demonstrate some level of expertise and
authority that some of us have yet to recognize as such...

> but it's an interesting question.
> In compiled VB both of the foregoing methods
> would be extremely slow on large strings.

Granted. But if limited to URLs, for example, the strings might not be all
that large.

> The webpage sample is allocating a vast
> number of strings to do its job. As the strings
> get bigger it would slow to a crawl. The Replace
> function looks much better to me, but it's also
> fairly slow. (Replace itself is slow.)

I do not dispute that, although I do not know the actual metrics. But for a
site dedicated to providing example vbscripts and a newsgroup dedicated to the
same language, a completely different approach (a re-write in C, for example)
would generally be of no interest to those looking for vbscript solutions.

> Probably none of that matters if the function
> is only being used for filename strings of 20+-
> characters. And it's not easy to optimize for
> speed in VBS anyway.

Exactly.

> But personally I'd still much
> prefer your Replace loop. I don't see the sense of
> writing a highly inefficient Replace method in
> VBS when the scripting runtime can do it internally.

Agreed. But the other issue with less simple code that cannot be discounted is
the greater effort required to develop it, debug it, and test it to ensure it
works in all cases.

> But in general, why not tokenize? In compiled
> code that should be by far the fastest, with much
> greater speed achieved if the characters can be
> treated as numbers in an array so that the operation
> is not allocating new strings or deciphering the Chr
> value of each stored numeric value of the string.
> In VBS, I don't know whether treating characters as
> numbers will help, since it's still a variant that has
> to be "parsed". I haven't tested the possibilities.

I strongly suspect that the overhead of variants makes most vbscript code less
efficient than the equivalent in a compiled language, and that it might make
the tokenized approach less efficient than one would expect.

<snip>

/Al
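
Nobody in this thread timed the alternatives, so the following is only a rough harness that could be used to do so, not a benchmark result. The function names ViaReplace and ViaArray, the test string, and the iteration count are all hypothetical, and both functions handle just two characters to keep the comparison simple.

```vbscript
Option Explicit

' Built-in approach: one Replace call per character to be removed.
Function ViaReplace(sIn)
    ViaReplace = Replace(Replace(sIn, ":", " "), "\", " ")
End Function

' "Tokenized" approach: copy characters into an array, then Join once.
Function ViaArray(sIn)
    Dim i, ch, a()
    If Len(sIn) = 0 Then
        ViaArray = ""
        Exit Function
    End If
    ReDim a(Len(sIn) - 1)
    For i = 1 To Len(sIn)
        ch = Mid(sIn, i, 1)
        If ch = ":" Or ch = "\" Then ch = " "
        a(i - 1) = ch
    Next
    ViaArray = Join(a, "")
End Function

Const ITERATIONS = 20000   ' arbitrary; raise it if the timings are too small
Dim sTest, n, t0, sResult
sTest = "C:\documents and settings\some user\some file.txt"

' Timer returns seconds since midnight, so avoid running this across midnight.
t0 = Timer
For n = 1 To ITERATIONS
    sResult = ViaReplace(sTest)
Next
WScript.Echo "Replace: " & FormatNumber(Timer - t0, 3) & " s"

t0 = Timer
For n = 1 To ITERATIONS
    sResult = ViaArray(sTest)
Next
WScript.Echo "Array:   " & FormatNumber(Timer - t0, 3) & " s"
```

Timer has limited resolution, so a single short run says little; repeating the measurement with short and long test strings would give a fairer picture of how each method scales.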
From: Al Dunbar on 5 Mar 2010 00:54

"James" <webmasterhub.net(a)gmail.com> wrote in message
news:f4d5de01-3c8f-430a-8c8c-1fcbd78aa5df(a)t9g2000prh.googlegroups.com...
> On Mar 5, 1:59 am, "mayayana" <mayay...(a)nospam.invalid> wrote:
>> This looks like some kind of advertisement
>> for a blog, but it's an interesting question.

<snip>

> Hi Mayayana,
>
> As the "air code" sample of your method parses the string character by
> character, I suspect that a combination of your method and the
> function provided should allow characters to be replaced, taking into
> account the context of each illegal character.
>
> I am using the method to clean a plain text string that may or may not
> contain URLs. If there are URLs present in the string, they are later
> replaced with an internal url with parameters pointing to a logging
> script that logs and forwards the request to the original url. The
> cleaned string is also used to generate a set of keywords and
> keyphrases from the text supplied.

You see, that whole description is not inherent in the listing you have posted
of your clean function.

> I have based the code below on the "air code" demo, which has also
> not been tested. I have incorporated the contextual tests to only
> remove/replace some characters if they are not in a specific context
> (using a URL as an example).
>
> The method below must certainly be a better approach than the function
> linked from this thread, or suggested by Al.

It might indeed be better, but I don't see where this must certainly be so.
Your original function and my "simpler" version never even tried to do the
contextual bit, so saying code that was designed to do so is better is a bit
like saying a hammer is a better tool than a nailfile for nailing things
together.

> What do you think? Also,
> is there a better way to incorporate the contextual tests for each
> illegal character in the string?

My guess: yes, probably there is. I just find your code below even harder to
follow than the original clean function. But as implied previously, it seems
odd to have two functions doing two different things but having the same name.

/Al

> Thanks
>
> James
>
> -------------------------
>
> Function Clean(sIn)
>   Dim i2, iChar, A1()
>
>   ReDim A1(len(sIn) - 1)
>   For i2 = 1 to Len(sIn)
>     iChar = Asc(Mid(sIn, i2, 1))
>     Select Case iChar
>       Case 58
>         rChars = Mid(sIn, i2+1, 2)
>         If rChars = "//" Then
>           A1(i2 - 1) = Chr(iChar)
>         End If
>
>       Case 47
>         rChar = Asc(Mid(sIn, i2+1, 1))
>         lChar = Asc(Mid(sIn, i2-1, 1))
>
>         If rChar = 47 OR lChar = 47 Then
>           A1(i2 - 1) = Chr(iChar)
>         Else
>           A1(i2 - 1) = "-"
>         End If
>
>       Case 63, 92, 42, 60, 62
>         A1(i2 - 1) = "-"
>
>       Case 44, 46, 43, 126
>         A1(i2 - 1) = ""
>
>       Case Else
>         A1(i2 - 1) = Chr(iChar)
>     End Select
>   Next
>   Clean = Join(A1, "")
> End Function
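
The quoted function is described by its author as untested. As written, it appears to fail when a "/" is the first or last character of the input, because Mid is then called with a start position of 0, or Asc is called on an empty string. The following is a guarded variant, offered only as a sketch: the name CleanContextual is hypothetical (chosen so the two functions no longer share a name), the replacement rules are copied from the quoted code, and character literals are used in place of Asc codes for readability.

```vbscript
Option Explicit

' Hypothetical guarded variant of the quoted Clean function.
' Same rules as the quoted code; only the edge-case handling differs.
Function CleanContextual(sIn)
    Dim i, n, ch, sPrev, sNext, a()
    n = Len(sIn)
    If n = 0 Then               ' ReDim a(-1) would fail on empty input
        CleanContextual = ""
        Exit Function
    End If
    ReDim a(n - 1)
    For i = 1 To n
        ch = Mid(sIn, i, 1)
        sPrev = ""
        If i > 1 Then sPrev = Mid(sIn, i - 1, 1)   ' never call Mid with start 0
        sNext = Mid(sIn, i + 1, 1)                 ' Mid returns "" past the end
        Select Case ch
            Case ":"            ' keep only when followed by "//"
                If Mid(sIn, i + 1, 2) = "//" Then
                    a(i - 1) = ch
                Else
                    a(i - 1) = ""
                End If
            Case "/"            ' keep only when adjacent to another "/"
                If sNext = "/" Or sPrev = "/" Then
                    a(i - 1) = ch
                Else
                    a(i - 1) = "-"
                End If
            Case "?", "\", "*", "<", ">"
                a(i - 1) = "-"
            Case ",", ".", "+", "~"
                a(i - 1) = ""
            Case Else
                a(i - 1) = ch
        End Select
    Next
    CleanContextual = Join(a, "")
End Function

WScript.Echo CleanContextual("See Here: http://code-tips.com")
WScript.Echo CleanContextual("/leading and trailing/")
```

Note that, under the rules copied from the quoted code, "." is removed in every context, so the host name of a URL is still altered even though "://" is preserved. Whether that is intended is not stated in the thread.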
From: Al Dunbar on 5 Mar 2010 00:56
"mayayana" <mayayana(a)nospam.invalid> wrote in message
news:OCgsS8AvKHA.4220(a)TK2MSFTNGP05.phx.gbl...
>> The method below must certainly be a better approach than the function
>> linked from this thread, or suggested by Al. What do you think? Also,
>> is there a better way to incorporate the contextual tests for each
>> illegal character in the string?
>
> I think that's pretty much what I meant in saying
> it's flexible. There's no limit, really. One could even
> call separate functions from within the Select Case.
>
> Parsing URLs
> sounds tricky, but it can be done. For instance, you
> could check each ":" to see if it's part of "http://",
> then get the whole URL and write your edited
> URL to the array. You'd just have to find the end
> of the URL, calculate the offset of the start and end
> characters, and keep track of how many characters
> you've actually written to the array. With edits involved
> you might need to use a bigger array and then Redim
> Preserve it at the end before the Join call.

In my opinion, regular expressions seem likely to be more efficient than
coding all the ifs, ands and buts in vbscript. But sorry, I'm not a regular
expression kind of guy.

/Al

> -------------------------
>
> Function Clean(sIn)
>   Dim i2, iChar, A1()
>
>   ReDim A1(len(sIn) - 1)
>   For i2 = 1 to Len(sIn)
>     iChar = Asc(Mid(sIn, i2, 1))
>     Select Case iChar
>       Case 58
>         rChars = Mid(sIn, i2+1, 2)
>         If rChars = "//" Then
>           A1(i2 - 1) = Chr(iChar)
>         End If
>
>       Case 47
>         rChar = Asc(Mid(sIn, i2+1, 1))
>         lChar = Asc(Mid(sIn, i2-1, 1))
>
>         If rChar = 47 OR lChar = 47 Then
>           A1(i2 - 1) = Chr(iChar)
>         Else
>           A1(i2 - 1) = "-"
>         End If
>
>       Case 63, 92, 42, 60, 62
>         A1(i2 - 1) = "-"
>
>       Case 44, 46, 43, 126
>         A1(i2 - 1) = ""
>
>       Case Else
>         A1(i2 - 1) = Chr(iChar)
>     End Select
>   Next
>   Clean = Join(A1, "")
> End Function
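
No regular-expression version was posted in this thread, so the following is only a sketch of how the two contextual examples discussed earlier (":" outside a URL, and "&" with or without surrounding spaces) might be expressed with the VBScript RegExp object. The function names are hypothetical, and the pattern for ":" relies on the negative lookahead syntax (?!...), which the VBScript regular expression engine supports.

```vbscript
Option Explicit

' Hypothetical helper: remove ":" unless it is immediately followed by "//".
Function StripColonsOutsideUrls(sIn)
    Dim re
    Set re = New RegExp
    re.Global = True            ' replace every match, not just the first
    re.Pattern = ":(?!//)"      ' a colon NOT followed by "//"
    StripColonsOutsideUrls = re.Replace(sIn, "")
End Function

' Hypothetical helper: " & " becomes " and ", any other "&" becomes "+".
Function ReplaceAmpersands(sIn)
    Dim re, sOut
    Set re = New RegExp
    re.Global = True
    re.Pattern = " & "          ' ampersand padded with single spaces
    sOut = re.Replace(sIn, " and ")
    re.Pattern = "&"            ' whatever ampersands remain
    ReplaceAmpersands = re.Replace(sOut, "+")
End Function

WScript.Echo StripColonsOutsideUrls("See Here: http://code-tips.com")
WScript.Echo StripColonsOutsideUrls("the website is http://code-tips.com")
WScript.Echo ReplaceAmpersands("something & something else")
WScript.Echo ReplaceAmpersands("somethin&something else")
```

Each contextual rule becomes one pattern rather than a branch in a character loop, which may be easier to read and extend. Whether it is actually faster than the Select Case version has not been measured here.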