From: Pegasus [MVP] on 8 Sep 2009 17:25 "Odd" <audrey.nsh(a)gmail.com> wrote in message news:385cd965-be36-415e-9d0e-124d1b260c92(a)r9g2000yqa.googlegroups.com... Ok, Ijust sent you to your e-mail address......let me know..... =========== Your file is a comma-delimited text file. Some of your fields contain an excessive number of trailing spaces that you should get rid of. Some contain tabs ($09) and extended characters $95 that you may want to suppress. Some contain quoted strings with embedded CRLFs. Here is a way to replace them with spaces. It's not a particularly elegant way - perhaps someone can think of a better method. Const ForReading = 1 Const ForWriting = 2 Const sQuote = """" Set objFSO = CreateObject("Scripting.FileSystemObject") Set objInput = objFSO.OpenTextFile("d:\csv.csv", ForReading) objFSO.CreateTextFile "d:\Project_" & DatePart("yyyy", Date) _ & Right("0" & DatePart("m",Date),2) & Right("0" & DatePart("d", Date),2) & ".csv" Set objOutput = objFSO.OpenTextFile("d:\Project_" _ & DatePart("yyyy", Date) & Right("0" & DatePart("m", Date), 2) _ & Right("0" & DatePart("d", Date),2) & ".csv", ForWriting) sData = Replace(objInput.ReadAll, VbCrLf, Chr(0)) sOut = "" bQuoted = False For i = 1 To Len(sData) If Mid(sData, i, 1) = sQuote Then bQuoted = Not bQuoted If Mid(sData, i, 1) = Chr(0) And bQuoted Then sOut = sOut & " " Else sOut = sOut & Mid(sData, i, 1) End If Next objOutput.Write Replace(sOut, Chr(0), VbCrLf) objOutput.Close
From: Pegasus [MVP] on 8 Sep 2009 17:27 "ekkehard.horner" <ekkehard.horner(a)arcor.de> wrote in message news:4aa6b9de$0$32674$9b4e6d93(a)newsspool2.arcor-online.net... > Pegasus [MVP] schrieb: > [...] >> The dump you posted does not really seem to be a "CSV" - it uses >> semicolons as delimiters, not commas. In an attempt to put an end to the >> guessing game, > > That is the German locale form of a .CSV; cross language processing may > be cumbersome some times, but as long as you stay with your locale, > > PROCESSING .CSVs WITH EMBEDDED LINE-ENDINGS CAUSE NO PROBLEMS > AT ALL AS LONG AS YOU USE THE PROPER (MICROSOFT) TOOLS Why all this shouting? Achtung, listen now!
From: Pegasus [MVP] on 8 Sep 2009 17:33 "Todd Vargo" <tlvargo(a)sbcglobal.netz> wrote in message news:%232o6zoMMKHA.4064(a)TK2MSFTNGP06.phx.gbl... > > "Odd" <audrey.nsh(a)gmail.com> wrote in message > news:8a2b8b75-480a-4315-88d4-849b5ceb3c96(a)r39g2000yqm.googlegroups.com... >> Hi everyone, >> >> I have a .csv file where certain fields have carriage returns/new line >> feeds. Because of such, I cannot properly import my file. Hence, I >> wrote the following script and while it removes the carriage returns/ >> new line feeds through the file, it ALSO removes the END OF LINE >> carriage return/line feed. So now, I have one file with one record. >> >> Basically,I need to know how to remove the carriage returns/line feeds >> in the file EXCEPT for the ones that is END OF LINE. Can somebody help >> me please? >> >> Dim strSearchString, objFSO, objFile >> Const ForReading = 1 >> Const ForWriting = 2 >> >> >> Set objFSO = CreateObject("Scripting.FileSystemObject") >> Set objFile = objFSO.OpenTextFile("d:\Test\Project.csv", ForReading) >> strSearchString = objFile.ReadAll >> objFile.Close >> >> >> objFSO.CreateTextFile "d:\Test\Project_" & DatePart("yyyy",Date) & >> Right("0" & DatePart("m",Date),2) & Right("0" & DatePart("d", Date),2) >> & ".csv" >> Set objFile = objFSO.OpenTextFile("d:\Test\Project_" & DatePart >> ("yyyy",Date) & Right("0" & DatePart("m",Date),2) & Right("0" & >> DatePart("d", Date),2) & ".csv", ForWriting) >> strSearchString = Replace(strSearchString, VbCrLf, "") >> strSearchString = Replace(strSearchString, Vbtab, "") >> >> objFile.Write strSearchString >> objFile.Close > > From your other posts it sounds like the .csv file was created by Excel > and > you see blocks (in Notepad or other editor) for the embedded line feeds. > Removing the embedded line feeds is simple but then you have the lines > running into each other. I suggest replacing the the line feeds with > spaces > so you wont have lines crashing together. To do this, 1) Replace all > Chr(10) > with a space. 2) Replace all Chr(13) + Chr(32) with Chr(13) + Chr(10). > > Dim strSearchString, objFSO, objFile > Const ForReading = 1 > Const ForWriting = 2 > > Set objFSO = CreateObject("Scripting.FileSystemObject") > Set objFile = objFSO.OpenTextFile("d:\Test\Project.csv", ForReading) > strSearchString = objFile.ReadAll > objFile.Close > > Set objFile = objFSO.OpenTextFile("d:\Test\Project_" & DatePart > ("yyyy",Date) & Right("0" & DatePart("m",Date),2) & Right("0" & > DatePart("d", Date),2) & ".csv", ForWriting) > strSearchString = Replace(strSearchString, VbLf, " ") > strSearchString = Replace(strSearchString, VbCr & " ", VbCrLf) > objFile.Write strSearchString > objFile.Close > > MsgBox "done." > > -- > Todd Vargo > (Post questions to group only. Remove "z" to email personal messages) This is what I thought too but after examining the OP's sample file I realised that it would not work. The multi-line fields contain embedded CRLFs, not just LFs. In my other reply I used the double-quote initiator to recognise a quoted field so that I could replace the subsequent CRLFs with a space, right up to the closing double quote.
From: Richard Mueller [MVP] on 8 Sep 2009 17:41 "ekkehard.horner" <ekkehard.horner(a)arcor.de> wrote in message news:4aa6bdaf$0$32670$9b4e6d93(a)newsspool2.arcor-online.net... > Odd schrieb: > [...] >> On Sep 8, 12:38 pm, "Richard Mueller [MVP]" <rlmueller- >> nos...(a)ameritech.nospam.net> wrote: > [...] >>> Do Until objInput.AtEndofStream >>> strLine = Trim(objInput.ReadLine) >>> If (strLine <> "") Then >>> strLine = Replace(strLine, vbCrLf, "") >>> strLine = Replaced(strLine, vbCr, "") >>> strLine = Replaced(strLine, vbLf, "") >>> strLine = Replaced(strLine, vbFormFeed, "") >>> strLine = Replace(strLine, vbtab,"") >>> objOutput.WriteLine strLine >>> End If >>> Loop > [...] >> >> What Richard is saying is correct.....the carriage returns and line >> feeds is embedded in the field values. Richard: I tried your code..it >> removed the tab,but did not remove the carriage return nor the line >> feed. > > That is to be expected, because ReadLine will read data from the file > upto the bytes it considers line endings (vbLf, vbCrLf; I'm fairly > sure that vbCr isn't a line ending, but don't believe, do your own tests), > discard the line ending, and return the 'clean' string/line. > > [Richard, you use ADO frequently - please help me to convince Odd > and Pegasus that giving the text driver a try is NOT stupid] The one hex dump I've seen in this thread had a normal 0D 0A sequence, equivalent to vbCrLf, but I didn't see enough to know what is going on. I have used code similar to below to read csv files: ============= Option Explicit Dim adoCSVConnection, adoCSVRecordSet, strPathToTextfile Dim strCSVFile, k ' Specify path to CSV file. strPathToTextFile = "c:\Scripts\" ' Specify CSV file name. strCSVFile = "Example.csv" ' Open connection to the CSV file. Set adoCSVConnection = CreateObject("ADODB.Connection") Set adoCSVRecordSet = CreateObject("ADODB.Recordset") ' Open CSV file with no header line. adoCSVConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _ "Data Source=" & strPathtoTextFile & ";" & _ "Extended Properties=""text;HDR=NO;FMT=Delimited""" adoCSVRecordset.Open "SELECT * FROM " & strCSVFile, adoCSVConnection ' Read the CSV file. Do Until adoCSVRecordset.EOF ' Display Values in all fields. For k = 0 To adoCSVRecordset.Fields.Count -1 Wscript.Echo adoCSVRecordset.Fields.Item(k).Value Next adoCSVRecordset.MoveNext Loop ' Clean up. adoCSVRecordset.Close adoCSVConnection.Close ========= Of course, if the file has a header line, you can reference the fields by name. For example (in part): =========== ' Open CSV file with header line. adoCSVConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _ "Data Source=" & strPathtoTextFile & ";" & _ "Extended Properties=""text;HDR=YES;FMT=Delimited""" adoCSVRecordset.Open "SELECT * FROM " & strCSVFile, adoCSVConnection ' Read the CSV file. Do Until adoCSVRecordset.EOF ' Display Values in fields by field name (from header line). Wscript.Echo "First Name: " & adoCSVRecordset.Fields("First").Value Wscript.Echo "Middle Name: " & adoCSVRecordset.Fields("Middle").Value Wscript.Echo "Last Name: " & adoCSVRecordset.Fields("Last").Value adoCSVRecordset.MoveNext Loop ======== It would be intersting to see how code similar to above handles this file. Still, if any of the fields has embedded characters that the OP wants to remove, you will need to add steps to replace the characters with blanks (or something else). -- Richard Mueller MVP Directory Services Hilltop Lab - http://www.rlmueller.net --
From: Todd Vargo on 8 Sep 2009 17:51
Pegasus [MVP] wrote: > Todd Vargo wrote: .... > > strSearchString = Replace(strSearchString, VbLf, " ") > > strSearchString = Replace(strSearchString, VbCr & " ", VbCrLf) > > objFile.Write strSearchString > > objFile.Close > > > > MsgBox "done." > > This is what I thought too but after examining the OP's sample file I > realised that it would not work. The multi-line fields contain embedded > CRLFs, not just LFs. In my other reply I used the double-quote initiator to > recognise a quoted field so that I could replace the subsequent CRLFs with a > space, right up to the closing double quote. Unfortunately, I did not have a sample file from OP to work with. I created my own .csv using Excel which embeds the LF without the CR. One of OP's posts described the "end of line" character as a block, which is what we see when LF is missing a preceding CR. I guess OP will just have to try it and report back. -- Todd Vargo (Post questions to group only. Remove "z" to email personal messages) |