From: Jim Berglund on 24 May 2010 18:58 I want to parse a lot of data with rows like the following: A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999 I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER I can do it in Excel with formulas, but is there a nice piece of generic code that does this sort of thing? (or should I parse it based on using spaces as delimiters and then join the fields that need joining?) Jim Berglund
From: JLGWhiz on 24 May 2010 19:35 The phone number, postal code, province and city could probably be picked our pretty easily. Unless there is some consistency in the address and names grouping, it would be very difficult to pick those out in one fell swoop. Consistency would be that the address is always street number and a two part street name separated by spaces, and the person's name would always be first middle last separated by spaces. I doubt that those are entered consitently in that fashion. If the street address always begins with a number, it could be used a key to separate the name and street address, which would then make it possible to break all the elements into their respective groups. But I wouldn't want to tackle it. "Jim Berglund" <jazzzbo(a)shaw.ca> wrote in message news:%23hurWS5%23KHA.148(a)TK2MSFTNGP06.phx.gbl... >I want to parse a lot of data with rows like the following: > > A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999 > > I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER > > I can do it in Excel with formulas, but is there a nice piece of generic > code that does this sort of thing? > (or should I parse it based on using spaces as delimiters and then join > the fields that need joining?) > > Jim Berglund
From: Ron Rosenfeld on 24 May 2010 21:29 On Mon, 24 May 2010 16:58:10 -0600, "Jim Berglund" <jazzzbo(a)shaw.ca> wrote: >I want to parse a lot of data with rows like the following: > >A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999 > >I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER > >I can do it in Excel with formulas, but is there a nice piece of generic >code that does this sort of thing? >(or should I parse it based on using spaces as delimiters and then join the >fields that need joining?) > >Jim Berglund How easy or hard this might be to do depends on format variations. If you cannot set out firm rules for the data, you won't be able to do it. Probably the PROV, POSTALCODE & PHONENUMBER can be easily parsed from the end of the string. The CITY would be one or several words that precede the PROV and you could use a list of valid cities. The ADDRESS precedes the city and starts with a number. The NAME ends with the number. If all of your rows can be unambiguously described, either as I have done or some other method, then you should be able to parse either with formulas, or with a macro. Here's a sample macro with a very short list of possibly valid cities that parses rows that fit the above rules. It should give you some ideas. ================================ Option Explicit Sub ParseAddr() Dim myRegExp As Object, myMatches As Object Dim rg As Range, c As Range Dim i As Long Set rg = Selection Set myRegExp = CreateObject("vbscript.regexp") myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _ & "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$" For Each c In rg If myRegExp.test(c.Text) = True Then Set myMatches = myRegExp.Execute(c.Text) For i = 0 To 5 c.Offset(0, i + 1) = myMatches(0).submatches(i) Next i End If Next c End Sub ==================================== The list of cities needs to be enclosed in parentheses and be pipe delimited. They are tested in the order listed, so if you had several cities with similar names, the order you list them in can be critical. --ron
From: Jim Berglund on 25 May 2010 00:27 Ron. Thanks for the effort you put in on this. I would really like to understand this more. Is it checking each row to see if it meets the pattern, and if so, doing the parsing and then building a new, 5-column list? What is the following line for? Set myRegExp = CreateObject("vbscript.regexp") Also, could you please help me understand the following? I've never seen any code like it and I just don't have a clue how to interpret it... myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" & "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$" I tried it and it ran without any errors. But it didn't do anything. BTW, For anyone else looking at this, I found the following entry... Formulas for parsing names. 1. Using an index: =T(INDEX($M$1:$M$6,MAX(INDEX(COUNTIF(C1,$M$2:$M$6&{". *"," *"})*{1;2;3;4;5},0))+1)) (Set up an index in M1:M5) 2. Using OR =IF(LEFT(A30,3)="Mrs","Mrs",IF(LEFT(A30,4)="Miss","Miss",IF(OR(LEFT(A30,2)="Mr",LEFT(A30,2)="Ms",LEFT(A30,2)="Dr"),LEFT(A30,2),"X"))) 3. Parsing, using VB Option Explicit Function ParseName(str As String, Index As Long) As String Dim re As Object Dim mc As Object Dim sPat As String Dim sTitle As String 'Index code ' 1 = Salutation ' 3 = First Name 'Pipe-delimited list of possible Titles sTitle = "Mr|Ms|Miss|Mrs|Dr" sPat = "^((" & sTitle & ")\.?(\s+))?(\w+)" Set re = CreateObject("vbscript.regexp") re.ignorecase = True re.Pattern = sPat If re.test(str) = True Then Set mc = re.Execute(str) ParseName = mc(0).submatches(Index) End If End Function 4. Using TRIM Parsing names in Excel Having just had to go through a process to clean up a list of names in a spreadsheet, I thought I'd share the 'magic' formula I came up with. As you can imagine, there was no pretty way to get the list into Excel, there were several columns of names and other information, but with no obvious delimiters to behold. I was amazed when I got the file, how bad it looked. I can only assume this was a redirected print file from some ancient mainframe system. It can't be that hard to get a nice comma delimited file surely!? The format of the name was either: First Last First MI Last First Middle Last Obtaining the first name was simple: =LEFT(A11,(FIND(" ",A11,1)-1)) Getting the last name was a different matter: =TRIM( IF(ISERR(FIND(" ",A11,(FIND(" ",A11,1)+1))), MID(A11,FIND(" ",A11,1)+1,(LEN(A11)-FIND(" ",A11,1))), MID(A11,FIND(" ",A11,(FIND(" ",A11,1)+1))+1,(LEN(A11)-FIND(" ",A11,FIND(" ",A11,1)))))) Basically, remove any spaces and if the FIND returns an error, then the name is just first and last names, otherwise there is a middle name or initial to deal with. If no middle name/initial then get the last name, which starts 1 char after the space otherwise get the last name after locating the second space in the string and start from there Another Method for Parsing Names Assume you have names of the following types in column C Mrs Dorothy Hannity Dr P R Rogers Dana Delany Mr Bradley K Pitts Type the following formulas into the specified cells: O1=FIND(" ",C1) Determines the location/existence of the blank following the Salutaton or First Name P1=FIND(" ",C1,FIND(" ",C1)+1) Determines the location/existence of the blank following the First Name or Middle Initial(MI) Q1=FIND(" ",R1) Determines the location/existence of the blank following the Middle Initial in the next, adjacent cell R1=IF(ISERROR(P1),RIGHT(C1,LEN(C1)-O1),RIGHT(C1,LEN(C1)-P1)) Defines the Last Name or MI/LN if there is a MI S1=LEFT(C1,O1-1) Creates the Salutation column T1=IF(ISERROR(S1)," ",MID(C1,O1+1,P1-O1)) Creates the First Name column U1=IF(FIND(" ",R1)=2,LEFT(R1,1),"") Creates the MI column V1=IF(U1<>"",MID(R1,FIND(" ",R1)+1,99),R1) Creates the Last Name column Fill the entries down and then copy the resulting values into another set of columns. Jim "Ron Rosenfeld" <ronrosenfeld(a)nospam.org> wrote in message news:1r8mv51ickslteko3es1jnkh7142228q55(a)4ax.com... > On Mon, 24 May 2010 16:58:10 -0600, "Jim Berglund" <jazzzbo(a)shaw.ca> > wrote: > >>I want to parse a lot of data with rows like the following: >> >>A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999 >> >>I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER >> >>I can do it in Excel with formulas, but is there a nice piece of generic >>code that does this sort of thing? >>(or should I parse it based on using spaces as delimiters and then join >>the >>fields that need joining?) >> >>Jim Berglund > > How easy or hard this might be to do depends on format variations. > > If you cannot set out firm rules for the data, you won't be able to do it. > > Probably the PROV, POSTALCODE & PHONENUMBER can be easily parsed from the > end > of the string. > > The CITY would be one or several words that precede the PROV and you could > use > a list of valid cities. > > The ADDRESS precedes the city and starts with a number. > > The NAME ends with the number. > > If all of your rows can be unambiguously described, either as I have done > or > some other method, then you should be able to parse either with formulas, > or > with a macro. > > Here's a sample macro with a very short list of possibly valid cities that > parses rows that fit the above rules. It should give you some ideas. > > ================================ > Option Explicit > Sub ParseAddr() > Dim myRegExp As Object, myMatches As Object > Dim rg As Range, c As Range > Dim i As Long > > Set rg = Selection > Set myRegExp = CreateObject("vbscript.regexp") > myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _ > & "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$" > > For Each c In rg > If myRegExp.test(c.Text) = True Then > Set myMatches = myRegExp.Execute(c.Text) > For i = 0 To 5 > c.Offset(0, i + 1) = myMatches(0).submatches(i) > Next i > End If > Next c > End Sub > ==================================== > > The list of cities needs to be enclosed in parentheses and be pipe > delimited. > They are tested in the order listed, so if you had several cities with > similar > names, the order you list them in can be critical. > --ron
From: Ron Rosenfeld on 25 May 2010 06:11 On Mon, 24 May 2010 22:27:18 -0600, "Jim Berglund" <jazzzbo(a)shaw.ca> wrote: >Ron. Thanks for the effort you put in on this. I would really like to >understand this more. > >Is it checking each row to see if it meets the pattern, and if so, doing the >parsing and then building a new, 5-column list? Yes. And if your row does not meet the format, it will do nothing. >What is the following line for? > Set myRegExp = CreateObject("vbscript.regexp") That's called late binding, and it invokes another program which allows the use of Regular Expressions in VBA. Regular Expressions are a powerful method of manipulating strings. > >Also, could you please help me understand the following? I've never seen any >code like it and I just don't have a clue how to interpret it... > >myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" & >"\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$" That is the pattern derived from the example you gave. Everything was on one line so: ^ Start at the beginning of the line (\D+) Capture everything that is not a digit into submatch 0. (In the definition, the first digit was the beginning of the address). This will be the name. \s+ matches the space(s) between the end of the non-digits and the next group. (.*) capture everything, into submatch 1, up to the next match. This will be the address. \s matches the space between the end of the address and the next group ((CALGARY|MELBOURNE|SYDNEY) Short list of valid city names. This will need to be extended to match your data. Capture into submatch 2. \s+ matches the space(s) between the end of the city name and the province. ([A-Z]{2}) capture the next two capital letter abbreviation as the Province into submatch 3 the rest matches the postcode, and phone number, again, according to the pattern that was in the example you gave. > >I tried it and it ran without any errors. But it didn't do anything. It ran perfectly here using your line of sample data. You have to enter and run it properly. To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code into the window that opens. To use this Macro (Sub), FIRST *SELECT* the data you wish to parse. Then <alt-F8> opens the macro dialog box. Select the macro by name, and <RUN>. If you did all that, and the macro did not do anything, then the data you ran it against did not match, in format, what I had laid out. Or, possibly, your newsreader wrapped a line and you did not copy it correctly into the VBEditor. ***Both I and JLGWhiz emphasized to you the importance of properly defining your format.*** --ron
|
Next
|
Last
Pages: 1 2 3 Prev: Using Color as Criteria in Countif Function Next: Create a list with multiple criteria |