Prev: Using Trim in a Userform
Next: Try this
From: yakir on 29 May 2010 03:15 Thanks man.. "Ron Rosenfeld" wrote: > On Mon, 24 May 2010 07:21:17 -0500, Chip Pearson <chip(a)cpearson.com> wrote: > > >In VBA insert a new module. Then go to the Tools menu, choose > >References, and scroll down to and check "Microsoft VBScript Regular > >Expressions 5.5". Then, use code like the following > > > >Sub AAA() > >Dim LastRow As Long > >Dim RowNdx As Long > >Dim TopRow As Long > >Dim WS As Worksheet > >Dim Pattern As String > >Dim RegEx As RegExp > >Const ADDR_COL = "A" '<<<< CHANGE > >TopRow = 1 '<<<< CHANGE > >Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _ > > "\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$" > >Set RegEx = New RegExp > >RegEx.Pattern = Pattern > >Set WS = ActiveSheet > >With WS > > LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row > > For RowNdx = LastRow To TopRow Step -1 > > If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then > > .Rows(RowNdx).Delete > > End If > > Next RowNdx > >End With > >End Sub > > > > > >Change the lines marked with <<<< to the correct values. ADDR_COL > >should be the column letter of the column containing the email > >addresses to test. TopRow should be the row number of the top of the > >range of values to test. The code begins testing the addresses on the > >row calculated as LastRow and moves upwards until it reaches TopRow, > >at which point it quits. For each row between LastRow and TopRow > >(inclusive), if an invalid email address is found in ADDR_COL, that > >row is deleted. > > > >Cordially, > >Chip Pearson > >Microsoft Most Valuable Professional, > > Excel, 1998 - 2010 > >Pearson Software Consulting, LLC > >www.cpearson.com > > > > > > > > > > It's tough to devise a regex that can comply with the relevant internet > standards. I think the current one is RFC5322 > > Your regex excludes some email addresses which I believe are valid, and > includes some which are invalid. > > For example, your regex excludes any TLD's that have more than four characters. > But the current generic domain list includes .museum & .travel, and there will > likely be more, longer lettered TLD's in the future. > > It also excludes some characters that are valid in email addresses, at least in > the local part: > > ! # $ % & ' * + - / = ? ^ _ ` { | } ~ > > It allows consecutive dots in the local part of the address, as well as > allowing them at the start or end of the local part of the address -- and that > is not valid. > > Since the OP is looking for malformed email addresses, I'd be concerned that > your regex would eliminate some valid one's, and include some invalid one's. > > It can be pretty tough to devise something that does not have false positives > or negatives. If IP addresses don't need to be included, I'd suggest something > (admittedly incomplete) like (with the case insensitive option (ignorecase) > equal to true): > > > "^[-\w.%+']+@[A-Z0-9.-]+\.(?:aero|asia|biz|cat|com|coop|edu|gov|info|int|" _ > & "jobs|mil|mobi|museum|name|net|org|pro|tel|travel|[A-Z]{2})$" > > which will allow any two character domain name (to allow for the country codes) > as well as the currently used specific list of generic top-level domains. > > > > --ron > . >
From: Ron Rosenfeld on 29 May 2010 05:25
On Sat, 29 May 2010 00:15:01 -0700, yakir <yakir(a)discussions.microsoft.com> wrote: >Thanks man.. You're welcome. Glad to help. If you are going to be using my modification, be sure to add a line after these two in Chip's routine: Set RegEx = New RegExp RegEx.Pattern = Pattern 'Add this next line: RegEx.IgnoreCase = True --ron |