From: Highlander on 21 Sep 2009 17:04 Hello all. I've used this script before to eliminate duplicate lines in a text file, or in an array: http://blogs.technet.com/heyscriptingguy/archive/2005/04/13/how-can-i-eliminate-duplicate-names-in-a-text-file.aspx Now I need to eliminate duplicate lines in a comma separated file. The contents of the CSV file are the following: Software Program One,3.0,9/21/2009 Software Program Two,1.0,9/21/2009 Software Program Two, Software Program Three,2.0,9/21/2009 Software Program Four,1.0,9/21/2009 Software Program Five,1.0,9/21/2009 Software Program Five,,9/21/2009 Software Program Six,1.0,9/21/2009 I'm not comparing the entire line. Of the 3 fields in the CSV, I want to find lines where the FIRST FIELD is a duplicate. So these lines would qualify as duplicates: Software Program Two,1.0,9/21/2009 Software Program Two, Software Program Five,1.0,9/21/2009 Software Program Five,,9/21/2009 And then comparing the duplicates, I want to eliminate the line that has the 2nd and/or the 3rd field empty; and retain the line which has all 3 fields populated. So my end result would be the CSV looking like the following: Software Program One,3.0,9/21/2009 Software Program Two,1.0,9/21/2009 Software Program Three,2.0,9/21/2009 Software Program Four,1.0,9/21/2009 Software Program Five,1.0,9/21/2009 Software Program Six,1.0,9/21/2009 I've come up with a script that will parse a CSV and display the values in the 3 fields, but I can't figure out the logic to add that will compare and eliminate the duplicates in the manner I need: Const ForReading = 1, ForWriting = 2, ForAppending = 8 Set objFSO = CreateObject("Scripting.FileSystemObject") Set objDictionary = CreateObject("Scripting.Dictionary") Dim sFile, sQueryResults sFile = "File.txt" Set objFile = objFSO.OpenTextFile (sFile,1) sQueryResults = objFile.ReadAll objFile.Close GetArrayValues(sQueryResults) Function GetArrayValues(sList) Arr = Split(sList, vbCrlf) For i = 0 To UBound(Arr) sLine = Arr(i) ArrFields = Split(sLine, ",") count = -1 sFieldInfo = "" For j = 0 To UBound(ArrFields) count = count + 1 sField = ArrFields(j) IF sField ="" Then sField = "EMPTY FIELD" sFieldNum = "Array Field " & count sFieldInfo = sFieldInfo & sFieldNum & ":" & _ vbTab & sField & vbCrlf Next Wscript.Echo vbCrlf & "Array:" & vbTab & vbTab & _ sLine & vbCrlf & "Array UBound:" & vbTab & _ UBound(ArrFields) & vbCrlf & sFieldInfo count = -1 Next End Function ' GetArrayValues Set objFSO = nothing Set objDictionary = nothing How can accomplish what I need to? Any help would be greatly appreciated. Thanks! - Dave
From: Richard Mueller [MVP] on 21 Sep 2009 18:48 "Highlander" <tron9901(a)msn.com> wrote in message news:09c67f4d-efdd-482c-88fc-9b26c49df255(a)g6g2000vbr.googlegroups.com... > Hello all. > > I've used this script before to eliminate duplicate lines in a text > file, or in an array: > > http://blogs.technet.com/heyscriptingguy/archive/2005/04/13/how-can-i-eliminate-duplicate-names-in-a-text-file.aspx > > Now I need to eliminate duplicate lines in a comma separated file. The > contents of the CSV file are the following: > > Software Program One,3.0,9/21/2009 > Software Program Two,1.0,9/21/2009 > Software Program Two, > Software Program Three,2.0,9/21/2009 > Software Program Four,1.0,9/21/2009 > Software Program Five,1.0,9/21/2009 > Software Program Five,,9/21/2009 > Software Program Six,1.0,9/21/2009 > > I'm not comparing the entire line. Of the 3 fields in the CSV, I want > to find lines where the FIRST FIELD is a duplicate. So these lines > would qualify as duplicates: > > Software Program Two,1.0,9/21/2009 > Software Program Two, > > Software Program Five,1.0,9/21/2009 > Software Program Five,,9/21/2009 > > And then comparing the duplicates, I want to eliminate the line that > has the 2nd and/or the 3rd field empty; and retain the line which has > all 3 fields populated. So my end result would be the CSV looking like > the following: > > Software Program One,3.0,9/21/2009 > Software Program Two,1.0,9/21/2009 > Software Program Three,2.0,9/21/2009 > Software Program Four,1.0,9/21/2009 > Software Program Five,1.0,9/21/2009 > Software Program Six,1.0,9/21/2009 > > I've come up with a script that will parse a CSV and display the > values in the 3 fields, but I can't figure out the logic to add that > will compare and eliminate the duplicates in the manner I need: > > Const ForReading = 1, ForWriting = 2, ForAppending = 8 > Set objFSO = CreateObject("Scripting.FileSystemObject") > Set objDictionary = CreateObject("Scripting.Dictionary") > Dim sFile, sQueryResults > sFile = "File.txt" > Set objFile = objFSO.OpenTextFile (sFile,1) > sQueryResults = objFile.ReadAll > objFile.Close > GetArrayValues(sQueryResults) > Function GetArrayValues(sList) > Arr = Split(sList, vbCrlf) > For i = 0 To UBound(Arr) > sLine = Arr(i) > ArrFields = Split(sLine, ",") > count = -1 > sFieldInfo = "" > For j = 0 To UBound(ArrFields) > count = count + 1 > sField = ArrFields(j) > IF sField ="" Then sField = "EMPTY FIELD" > sFieldNum = "Array Field " & count > sFieldInfo = sFieldInfo & sFieldNum & ":" & _ > vbTab & sField & vbCrlf > Next > Wscript.Echo vbCrlf & "Array:" & vbTab & vbTab & _ > sLine & vbCrlf & "Array UBound:" & vbTab & _ > UBound(ArrFields) & vbCrlf & sFieldInfo > count = -1 > Next > End Function ' GetArrayValues > Set objFSO = nothing > Set objDictionary = nothing > > How can accomplish what I need to? > > Any help would be greatly appreciated. Thanks! > > - Dave It might help to use the Jet OLEDB driver to read the csv file. I would add a header line so you can sort the fields by name. This will arrange all lines with the same value for the first field together. Because blanks are sorted before other values, lines with blanks are always first. After the lines are sorted, you want to output the last line for each unique value in the first field. There may be a better way to do this using SQL, but I could only think of the way below. This works for your example lines. I added the following header line to the start of the csv file: Field1,Field2,Field3 Then I used the code below: ========== Option Explicit Dim adoCSVConnection, adoCSVRecordSet, strPathToTextfile Dim strCSVFile, k, strLine, strPrevious, strFlag ' Specify path to CSV file. strPathToTextFile = "c:\Scripts\" ' Specify CSV file name. strCSVFile = "Example.csv" ' Open connection to the CSV file. Set adoCSVConnection = CreateObject("ADODB.Connection") Set adoCSVRecordSet = CreateObject("ADODB.Recordset") ' Open CSV file with header line. adoCSVConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _ "Data Source=" & strPathtoTextFile & ";" & _ "Extended Properties=""text;HDR=YES;FMT=Delimited""" ' Sort by all fields, so lines with blanks are first. ' If there is any line with no blanks, it will be the last ' for each unique value of the first field. adoCSVRecordset.Open "SELECT * FROM " & strCSVFile _ & " ORDER BY Field1, Field2, Field3", adoCSVConnection ' Read the CSV file. strPrevious = "" Do Until adoCSVRecordset.EOF ' Display the last line for each unique value of the ' first field. strFlag = adoCSVRecordset.Fields("Field1").Value If (strPrevious <> "") And (strFlag <> strPrevious) Then Wscript.Echo strLine End If strLine = adoCSVRecordset.Fields("Field1").Value _ & "," & adoCSVRecordset.Fields("Field2").Value _ & "," & adoCSVRecordset.Fields("Field3").Value strPrevious = adoCSVRecordset.Fields("Field1").Value adoCSVRecordset.MoveNext Loop Wscript.Echo strLine ' Clean up. adoCSVRecordset.Close adoCSVConnection.Close -- Richard Mueller MVP Directory Services Hilltop Lab - http://www.rlmueller.net --
From: HL0105 on 22 Sep 2009 09:54 On Sep 21, 5:48 pm, "Richard Mueller [MVP]" <rlmueller- nos...(a)ameritech.nospam.net> wrote: > "Highlander" <tron9...(a)msn.com> wrote in message > > news:09c67f4d-efdd-482c-88fc-9b26c49df255(a)g6g2000vbr.googlegroups.com... > > > > > > > Hello all. > > > I've used this script before to eliminate duplicate lines in a text > > file, or in an array: > > >http://blogs.technet.com/heyscriptingguy/archive/2005/04/13/how-can-i... > > > Now I need to eliminate duplicate lines in a comma separated file. The > > contents of the CSV file are the following: > > > Software Program One,3.0,9/21/2009 > > Software Program Two,1.0,9/21/2009 > > Software Program Two, > > Software Program Three,2.0,9/21/2009 > > Software Program Four,1.0,9/21/2009 > > Software Program Five,1.0,9/21/2009 > > Software Program Five,,9/21/2009 > > Software Program Six,1.0,9/21/2009 > > > I'm not comparing the entire line. Of the 3 fields in the CSV, I want > > to find lines where the FIRST FIELD is a duplicate. So these lines > > would qualify as duplicates: > > > Software Program Two,1.0,9/21/2009 > > Software Program Two, > > > Software Program Five,1.0,9/21/2009 > > Software Program Five,,9/21/2009 > > > And then comparing the duplicates, I want to eliminate the line that > > has the 2nd and/or the 3rd field empty; and retain the line which has > > all 3 fields populated. So my end result would be the CSV looking like > > the following: > > > Software Program One,3.0,9/21/2009 > > Software Program Two,1.0,9/21/2009 > > Software Program Three,2.0,9/21/2009 > > Software Program Four,1.0,9/21/2009 > > Software Program Five,1.0,9/21/2009 > > Software Program Six,1.0,9/21/2009 > > > I've come up with a script that will parse a CSV and display the > > values in the 3 fields, but I can't figure out the logic to add that > > will compare and eliminate the duplicates in the manner I need: > > > Const ForReading = 1, ForWriting = 2, ForAppending = 8 > > Set objFSO = CreateObject("Scripting.FileSystemObject") > > Set objDictionary = CreateObject("Scripting.Dictionary") > > Dim sFile, sQueryResults > > sFile = "File.txt" > > Set objFile = objFSO.OpenTextFile (sFile,1) > > sQueryResults = objFile.ReadAll > > objFile.Close > > GetArrayValues(sQueryResults) > > Function GetArrayValues(sList) > > Arr = Split(sList, vbCrlf) > > For i = 0 To UBound(Arr) > > sLine = Arr(i) > > ArrFields = Split(sLine, ",") > > count = -1 > > sFieldInfo = "" > > For j = 0 To UBound(ArrFields) > > count = count + 1 > > sField = ArrFields(j) > > IF sField ="" Then sField = "EMPTY FIELD" > > sFieldNum = "Array Field " & count > > sFieldInfo = sFieldInfo & sFieldNum & ":" & _ > > vbTab & sField & vbCrlf > > Next > > Wscript.Echo vbCrlf & "Array:" & vbTab & vbTab & _ > > sLine & vbCrlf & "Array UBound:" & vbTab & _ > > UBound(ArrFields) & vbCrlf & sFieldInfo > > count = -1 > > Next > > End Function ' GetArrayValues > > Set objFSO = nothing > > Set objDictionary = nothing > > > How can accomplish what I need to? > > > Any help would be greatly appreciated. Thanks! > > > - Dave > > It might help to use the Jet OLEDB driver to read the csv file. I would add > a header line so you can sort the fields by name. This will arrange all > lines with the same value for the first field together. Because blanks are > sorted before other values, lines with blanks are always first. After the > lines are sorted, you want to output the last line for each unique value in > the first field. There may be a better way to do this using SQL, but I could > only think of the way below. This works for your example lines. I added the > following header line to the start of the csv file: > > Field1,Field2,Field3 > > Then I used the code below: > ========== > Option Explicit > > Dim adoCSVConnection, adoCSVRecordSet, strPathToTextfile > Dim strCSVFile, k, strLine, strPrevious, strFlag > > ' Specify path to CSV file. > strPathToTextFile = "c:\Scripts\" > > ' Specify CSV file name. > strCSVFile = "Example.csv" > > ' Open connection to the CSV file. > Set adoCSVConnection = CreateObject("ADODB.Connection") > Set adoCSVRecordSet = CreateObject("ADODB.Recordset") > > ' Open CSV file with header line. > adoCSVConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _ > "Data Source=" & strPathtoTextFile & ";" & _ > "Extended Properties=""text;HDR=YES;FMT=Delimited""" > > ' Sort by all fields, so lines with blanks are first. > ' If there is any line with no blanks, it will be the last > ' for each unique value of the first field. > adoCSVRecordset.Open "SELECT * FROM " & strCSVFile _ > & " ORDER BY Field1, Field2, Field3", adoCSVConnection > > ' Read the CSV file. > strPrevious = "" > Do Until adoCSVRecordset.EOF > ' Display the last line for each unique value of the > ' first field. > strFlag = adoCSVRecordset.Fields("Field1").Value > If (strPrevious <> "") And (strFlag <> strPrevious) Then > Wscript.Echo strLine > End If > strLine = adoCSVRecordset.Fields("Field1").Value _ > & "," & adoCSVRecordset.Fields("Field2").Value _ > & "," & adoCSVRecordset.Fields("Field3").Value > strPrevious = adoCSVRecordset.Fields("Field1").Value > adoCSVRecordset.MoveNext > Loop > Wscript.Echo strLine > > ' Clean up. > adoCSVRecordset.Close > adoCSVConnection.Close > > -- > Richard Mueller > MVP Directory Services > Hilltop Lab -http://www.rlmueller.net > --- Hide quoted text - > > - Show quoted text - Thanks Richard your script does the job! Except for one thing - it's changing the value of the second field. Looks to be chopping it off after a certain number of decimal points. This is more clearly illustrated if you change the contents of the CSV file to the following, and then run your script: Field1,Field2,Field3 Software Program One,1.2.3.456789,12/31/2009 Software Program Two,2.1.234.56789,12/31/2009 Software Program Two, Software Program Three,3.123.456.789,12/31/2009 Software Program Four,4.12.34.56.789,12/31/2009 Software Program Five,5.1234567.89,9,12/31/2009 Software Program Five,,12/31/2009 Software Program Six,6.123456.78.9,12/31/2009 - Dave
From: Richard Mueller [MVP] on 22 Sep 2009 10:47 "HL0105" <tron9901(a)msn.com> wrote in message news:97f6384c-e7ac-44c2-9b43-64fd17f4aff4(a)m11g2000vbl.googlegroups.com... On Sep 21, 5:48 pm, "Richard Mueller [MVP]" <rlmueller- nos...(a)ameritech.nospam.net> wrote: > "Highlander" <tron9...(a)msn.com> wrote in message >snip... Thanks Richard your script does the job! Except for one thing - it's changing the value of the second field. Looks to be chopping it off after a certain number of decimal points. This is more clearly illustrated if you change the contents of the CSV file to the following, and then run your script: Field1,Field2,Field3 Software Program One,1.2.3.456789,12/31/2009 Software Program Two,2.1.234.56789,12/31/2009 Software Program Two, Software Program Three,3.123.456.789,12/31/2009 Software Program Four,4.12.34.56.789,12/31/2009 Software Program Five,5.1234567.89,9,12/31/2009 Software Program Five,,12/31/2009 Software Program Six,6.123456.78.9,12/31/2009 - Dave ----------- The driver assumes the value is numeric. The solution is to create a schema.ini file in the folder with the csv file. I used the following schema.ini file: =========== [Example.csv] Format=CSVDelimited Col1="Field1" Text Col2="Field2" Text Col3="Field3" Text ======== The first line is the name of the file in square brackets. This file told the driver that all fields are text. There are many other things you can do in such a file. I found this link: http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx The driver knows to magically look for the schema.ini file and use it if the specified file name matches. -- Richard Mueller MVP Directory Services Hilltop Lab - http://www.rlmueller.net --
From: HL0105 on 22 Sep 2009 11:55 On Sep 22, 9:47 am, "Richard Mueller [MVP]" <rlmueller- nos...(a)ameritech.nospam.net> wrote: > "HL0105" <tron9...(a)msn.com> wrote in message > > news:97f6384c-e7ac-44c2-9b43-64fd17f4aff4(a)m11g2000vbl.googlegroups.com... > On Sep 21, 5:48 pm, "Richard Mueller [MVP]" <rlmueller- > > nos...(a)ameritech.nospam.net> wrote: > > "Highlander" <tron9...(a)msn.com> wrote in message > >snip... > > Thanks Richard your script does the job! Except for one thing - it's > changing the value of the second field. Looks to be chopping it off > after a certain number of decimal points. > > This is more clearly illustrated if you change the contents of the CSV > file to the following, and then run your script: > > Field1,Field2,Field3 > Software Program One,1.2.3.456789,12/31/2009 > Software Program Two,2.1.234.56789,12/31/2009 > Software Program Two, > Software Program Three,3.123.456.789,12/31/2009 > Software Program Four,4.12.34.56.789,12/31/2009 > Software Program Five,5.1234567.89,9,12/31/2009 > Software Program Five,,12/31/2009 > Software Program Six,6.123456.78.9,12/31/2009 > > - Dave > ----------- > > The driver assumes the value is numeric. The solution is to create a > schema.ini file in the folder with the csv file. I used the following > schema.ini file: > =========== > [Example.csv] > Format=CSVDelimited > > Col1="Field1" Text > Col2="Field2" Text > Col3="Field3" Text > ======== > The first line is the name of the file in square brackets. This file told > the driver that all fields are text. There are many other things you can do > in such a file. I found this link: > > http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx > > The driver knows to magically look for the schema.ini file and use it if the > specified file name matches. > > -- > Richard Mueller > MVP Directory Services > Hilltop Lab -http://www.rlmueller.net > -- The schema.ini file solved the problem. Thanks Richard! Two more questions. 1. I'm using this code within an HTA, running it from a mapped network drive. So I keep getting the message "This website uses a data provider that may be unsafe. If you trust the website, click OK, otherwise click Cancel." Is there a way to prevent this annoying message? 2. I actually have the data in an array; I'm writing the array to a CSV file in order to use your code. Is it possible to use this ADO code to parse the array directly, instead of parsing the CSV file? Thanks again! - Dave
|
Next
|
Last
Pages: 1 2 Prev: ADSystemInfo and ComputerName Next: shell.application throw error on asp file |