Prev: Finding File Extension from Full Filename (VBA) (2007)
Next: Help with this error "PasteSpecial method of Range class failed !!
From: ker_01 on 2 Jun 2010 15:35 There has to be an easier way that the one I'm currently working on... I'm writing a UDF to help a colleague parse some key data out of some inconsistently formatted raw data strings. One seach will be for machine part numbers in the raw data string. All part numbers are in the format Alpha-Alpha-Number-Number-Number, as shown in the following string: "I pulled the engine apart and found that part MX452 was ok, but MV119 was worn and needed to be replaced" I've made the UDF with an input string parameter that must only contain the characters "#" and "A", for example, "##AAA" for the above part numbers. After performing it's VBA magic, the UDF should then return "MX452, MV119" The actual pattern may differ for different searches; for example, searches might be for an employee ID (#AAA), a tool ID (#######), a test number (AAA#), or other (sub)strings. There may be a few where there will be a non-alphanumeric character in a specific position (###-#A#). One option is Regex, which I've used for fixed search strings, but my brain hurts thinking about how to code Regex to dynamically deal with a user-input string. The other option (and what I've started) is to look at every substring in the source string, and then compare each substring to the pattern to see if it matches. I don't think this is very efficient, and given that there will be thousands of records, it will also probably be painfully slow. Example, based on the above string: Comparison string is 5 characters, so check each 5-character substring ("I pul", " pull", "pulle", "ulled", lled ", etc...) I've looked at the "like" statement, and that seems to be a good lead. I can use the user-input string to create something like: [!0-9][!0-9][!A-Z][!A-Z][!A-Z] And maybe use that to match each substring ("I pul", " pull", "pulle", "ulled", lled ", etc...) but that still seems inefficient. However, that seems to be the only way to know where the match is (to actually return it), and to see if there is more than one match in the raw data string. Are there any easier approaches to finding (and pulling) every substring that matches a user-designated pattern? Thanks, Keith
From: Ron Rosenfeld on 2 Jun 2010 19:39
On Wed, 2 Jun 2010 12:35:07 -0700, ker_01 <ker01(a)discussions.microsoft.com> wrote: >There has to be an easier way that the one I'm currently working on... > >I'm writing a UDF to help a colleague parse some key data out of some >inconsistently formatted raw data strings. One seach will be for machine part >numbers in the raw data string. All part numbers are in the format >Alpha-Alpha-Number-Number-Number, as shown in the following string: >"I pulled the engine apart and found that part MX452 was ok, but MV119 was >worn and needed to be replaced" > >I've made the UDF with an input string parameter that must only contain the >characters "#" and "A", for example, "##AAA" for the above part numbers. >After performing it's VBA magic, the UDF should then return "MX452, MV119" > >The actual pattern may differ for different searches; for example, searches >might be for an employee ID (#AAA), a tool ID (#######), a test number >(AAA#), or other (sub)strings. There may be a few where there will be a >non-alphanumeric character in a specific position (###-#A#). > >One option is Regex, which I've used for fixed search strings, but my brain >hurts thinking about how to code Regex to dynamically deal with a user-input >string. > >The other option (and what I've started) is to look at every substring in >the source string, and then compare each substring to the pattern to see if >it matches. I don't think this is very efficient, and given that there will >be thousands of records, it will also probably be painfully slow. > >Example, based on the above string: Comparison string is 5 characters, so >check each 5-character substring ("I pul", " pull", "pulle", "ulled", lled ", >etc...) > >I've looked at the "like" statement, and that seems to be a good lead. I can >use the user-input string to create something like: >[!0-9][!0-9][!A-Z][!A-Z][!A-Z] > >And maybe use that to match each substring ("I pul", " pull", "pulle", >"ulled", lled ", etc...) but that still seems inefficient. However, that >seems to be the only way to know where the match is (to actually return it), >and to see if there is more than one match in the raw data string. > >Are there any easier approaches to finding (and pulling) every substring >that matches a user-designated pattern? > >Thanks, >Keith Just use a regex to match your pattern. Use an argument in your UDF to determine which regex to use. For example: ================================= Option Explicit Function ExtrStr(s As String, ssType As Long) As Variant Dim re As Object, mc As Object, m As Object Dim sPat As String Dim sTemp() As String Dim i As Long Set re = CreateObject("vbscript.regexp") Select Case ssType Case Is = 1 sPat = "\b[A-Z]{2}\d{3}\b" Case Is = 2 sPat = "\b[A-Z]\d{3}]\bb" Case Is = 3 sPat = "\b\d{3}-[A-Z]\d[A-Z]\b" Case Else ExtrStr = CVErr(xlErrNum) Exit Function End Select re.Pattern = sPat re.Global = True re.ignorecase = True If re.test(s) = False Then ExtrStr = s Exit Function End If Set mc = re.Execute(s) ReDim sTemp(0 To mc.Count - 1) For i = 0 To mc.Count - 1 sTemp(i) = mc(i) Next i ExtrStr = Join(sTemp, ", ") End Function ========================================== |