From: ekkehard.horner on
Todd Vargo schrieb:
> "Odd" <audrey.nsh(a)gmail.com> wrote in message
> news:8a2b8b75-480a-4315-88d4-849b5ceb3c96(a)r39g2000yqm.googlegroups.com...
>> Hi everyone,
>>
>> I have a .csv file where certain fields have carriage returns/new line
>> feeds. Because of such, I cannot properly import my file. Hence, I
>> wrote the following script and while it removes the carriage returns/
>> new line feeds through the file, it ALSO removes the END OF LINE
>> carriage return/line feed. So now, I have one file with one record.
>>
>> Basically,I need to know how to remove the carriage returns/line feeds
>> in the file EXCEPT for the ones that is END OF LINE. Can somebody help
>> me please?
>>
>> Dim strSearchString, objFSO, objFile
>> Const ForReading = 1
>> Const ForWriting = 2
>>
>>
>> Set objFSO = CreateObject("Scripting.FileSystemObject")
>> Set objFile = objFSO.OpenTextFile("d:\Test\Project.csv", ForReading)
>> strSearchString = objFile.ReadAll
>> objFile.Close
>>
>>
>> objFSO.CreateTextFile "d:\Test\Project_" & DatePart("yyyy",Date) &
>> Right("0" & DatePart("m",Date),2) & Right("0" & DatePart("d", Date),2)
>> & ".csv"
>> Set objFile = objFSO.OpenTextFile("d:\Test\Project_" & DatePart
>> ("yyyy",Date) & Right("0" & DatePart("m",Date),2) & Right("0" &
>> DatePart("d", Date),2) & ".csv", ForWriting)
>> strSearchString = Replace(strSearchString, VbCrLf, "")
>> strSearchString = Replace(strSearchString, Vbtab, "")
>>
>> objFile.Write strSearchString
>> objFile.Close
>
> From your other posts it sounds like the .csv file was created by Excel and
> you see blocks (in Notepad or other editor) for the embedded line feeds.
> Removing the embedded line feeds is simple but then you have the lines
> running into each other. I suggest replacing the the line feeds with spaces
> so you wont have lines crashing together. To do this, 1) Replace all Chr(10)
> with a space. 2) Replace all Chr(13) + Chr(32) with Chr(13) + Chr(10).
>
> Dim strSearchString, objFSO, objFile
> Const ForReading = 1
> Const ForWriting = 2
>
> Set objFSO = CreateObject("Scripting.FileSystemObject")
> Set objFile = objFSO.OpenTextFile("d:\Test\Project.csv", ForReading)
> strSearchString = objFile.ReadAll
> objFile.Close
>
> Set objFile = objFSO.OpenTextFile("d:\Test\Project_" & DatePart
> ("yyyy",Date) & Right("0" & DatePart("m",Date),2) & Right("0" &
> DatePart("d", Date),2) & ".csv", ForWriting)
> strSearchString = Replace(strSearchString, VbLf, " ")
> strSearchString = Replace(strSearchString, VbCr & " ", VbCrLf)
> objFile.Write strSearchString
> objFile.Close
>
> MsgBox "done."
>
This won't work, if Pegasus'

> I created my own Excel file and made one of the cells a two-liner, by
> pressing Alt+Enter after the first line. When I saved the file as a CSV,
> this generated a $0a in this particular field, which is not really
> surprising. Each record was terminated by a $0d$0a, which is also as
> expected.

is correct, because the embedded line endings are *not* vbLf (Chr(10)),
but vbCr (Chr(13))

To prove:

Dim oFS : Set oFS = CreateObject( "Scripting.FileSystemObject" )
Dim sFSpec : sFSpec = ".\adotext\EmbeddedLineEndings.txt"
Dim sFAll : sFAll = oFS.OpenTextFile( sFSpec ).ReadAll()
Dim sT : sT = sFAll
Dim sE : sE = sFAll

WScript.Echo dumpHex( sFAll )
WScript.Echo "------------"

sT = Replace( sT, VbLf, " ")
sT = Replace( sT, VbCr & " ", VbCrLf)
WScript.Echo dumpHex( sT )
WScript.Echo "------------"
WScript.Echo "No change whatsoever:", CStr( sFAll = sT )
WScript.Echo "------------"

sE = Replace( sE, vbCrLf, vbLf )
sE = Replace( sE, vbCr, "<InnerLineBreak>" )
sE = Replace( sE, vbLf, vbCrLf )
WScript.Echo dumpHex( sE )
WScript.Echo "------------"
WScript.Echo sE

Function dumpHex( sT )
Dim nLen : nLen = Len( sT )
ReDim aTmp( nLen - 1 )
Dim nPos
For nPos = 1 To nLen
aTmp( nPos - 1 ) = Right( "0" & Hex( AscB( Mid( sT, nPos, 1 ) ) ), 2 )
Next
dumpHex = Join( aTmp, " " )
End Function

output:

=== replaceLE: replace line endings ===========================================
22 49 64 22 3B 22 4E 61 6D 65 22 3B 22 52 65 6D 61 72 6B 22 3B 22 54 61 69 6C 22 0D 0A 31
3B 22 41 6C 70 68 61 22 3B 22 72 65 6D 31 0D 72 65 6D 32 22
3B 34 37 31 31 0D 0A 32 3B 22 42 65 74 61 22 3B 22 72 65 6D 33 0D 72 65 6D 34 22 3B 34 37
31 32 0D 0A
------------
22 49 64 22 3B 22 4E 61 6D 65 22 3B 22 52 65 6D 61 72 6B 22 3B 22 54 61 69 6C 22 0D 0A 31
3B 22 41 6C 70 68 61 22 3B 22 72 65 6D 31 0D 72 65 6D 32 22
3B 34 37 31 31 0D 0A 32 3B 22 42 65 74 61 22 3B 22 72 65 6D 33 0D 72 65 6D 34 22 3B 34 37
31 32 0D 0A
------------
No change whatsoever: Wahr
------------
22 49 64 22 3B 22 4E 61 6D 65 22 3B 22 52 65 6D 61 72 6B 22 3B 22 54 61 69 6C 22 0D 0A 31
3B 22 41 6C 70 68 61 22 3B 22 72 65 6D 31 3C 49 6E 6E 65 72
4C 69 6E 65 42 72 65 61 6B 3E 72 65 6D 32 22 3B 34 37 31 31 0D 0A 32 3B 22 42 65 74 61 22
3B 22 72 65 6D 33 3C 49 6E 6E 65 72 4C 69 6E 65 42 72 65 61
6B 3E 72 65 6D 34 22 3B 34 37 31 32 0D 0A
------------
"Id";"Name";"Remark";"Tail"
1;"Alpha";"rem1<InnerLineBreak>rem2";4711
2;"Beta";"rem3<InnerLineBreak>rem4";4712

=== replaceLE: 0 done (00:00:00) ==============================================

From: Pegasus [MVP] on

"Todd Vargo" <tlvargo(a)sbcglobal.netz> wrote in message
news:eKfQJ6MMKHA.4028(a)TK2MSFTNGP05.phx.gbl...
> Pegasus [MVP] wrote:
>> Todd Vargo wrote:
> ...
>> > strSearchString = Replace(strSearchString, VbLf, " ")
>> > strSearchString = Replace(strSearchString, VbCr & " ", VbCrLf)
>> > objFile.Write strSearchString
>> > objFile.Close
>> >
>> > MsgBox "done."
>>
>> This is what I thought too but after examining the OP's sample file I
>> realised that it would not work. The multi-line fields contain embedded
>> CRLFs, not just LFs. In my other reply I used the double-quote initiator
> to
>> recognise a quoted field so that I could replace the subsequent CRLFs
>> with
> a
>> space, right up to the closing double quote.
>
> Unfortunately, I did not have a sample file from OP to work with. I
> created
> my own .csv using Excel which embeds the LF without the CR. One of OP's
> posts described the "end of line" character as a block, which is what we
> see
> when LF is missing a preceding CR. I guess OP will just have to try it and
> report back.
>
> --
> Todd Vargo
> (Post questions to group only. Remove "z" to email personal messages)
>

The "blocks" mentioned by the OP and seen in Excel are either $95 or CRLFs.
If you're interested then I'll send you his sample file.


From: Pegasus [MVP] on

"ekkehard.horner" <ekkehard.horner(a)arcor.de> wrote in message
news:4aa6d318$0$32670$9b4e6d93(a)newsspool2.arcor-online.net...
> Todd Vargo schrieb:
>> "Odd" <audrey.nsh(a)gmail.com> wrote in message
>> news:8a2b8b75-480a-4315-88d4-849b5ceb3c96(a)r39g2000yqm.googlegroups.com...
>>> Hi everyone,
>>>
>>> I have a .csv file where certain fields have carriage returns/new line
>>> feeds. Because of such, I cannot properly import my file. Hence, I
>>> wrote the following script and while it removes the carriage returns/
>>> new line feeds through the file, it ALSO removes the END OF LINE
>>> carriage return/line feed. So now, I have one file with one record.
>>>
>>> Basically,I need to know how to remove the carriage returns/line feeds
>>> in the file EXCEPT for the ones that is END OF LINE. Can somebody help
>>> me please?
>>>
>>> Dim strSearchString, objFSO, objFile
>>> Const ForReading = 1
>>> Const ForWriting = 2
>>>
>>>
>>> Set objFSO = CreateObject("Scripting.FileSystemObject")
>>> Set objFile = objFSO.OpenTextFile("d:\Test\Project.csv", ForReading)
>>> strSearchString = objFile.ReadAll
>>> objFile.Close
>>>
>>>
>>> objFSO.CreateTextFile "d:\Test\Project_" & DatePart("yyyy",Date) &
>>> Right("0" & DatePart("m",Date),2) & Right("0" & DatePart("d", Date),2)
>>> & ".csv"
>>> Set objFile = objFSO.OpenTextFile("d:\Test\Project_" & DatePart
>>> ("yyyy",Date) & Right("0" & DatePart("m",Date),2) & Right("0" &
>>> DatePart("d", Date),2) & ".csv", ForWriting)
>>> strSearchString = Replace(strSearchString, VbCrLf, "")
>>> strSearchString = Replace(strSearchString, Vbtab, "")
>>>
>>> objFile.Write strSearchString
>>> objFile.Close
>>
>> From your other posts it sounds like the .csv file was created by Excel
>> and
>> you see blocks (in Notepad or other editor) for the embedded line feeds.
>> Removing the embedded line feeds is simple but then you have the lines
>> running into each other. I suggest replacing the the line feeds with
>> spaces
>> so you wont have lines crashing together. To do this, 1) Replace all
>> Chr(10)
>> with a space. 2) Replace all Chr(13) + Chr(32) with Chr(13) + Chr(10).
>>
>> Dim strSearchString, objFSO, objFile
>> Const ForReading = 1
>> Const ForWriting = 2
>>
>> Set objFSO = CreateObject("Scripting.FileSystemObject")
>> Set objFile = objFSO.OpenTextFile("d:\Test\Project.csv", ForReading)
>> strSearchString = objFile.ReadAll
>> objFile.Close
>>
>> Set objFile = objFSO.OpenTextFile("d:\Test\Project_" & DatePart
>> ("yyyy",Date) & Right("0" & DatePart("m",Date),2) & Right("0" &
>> DatePart("d", Date),2) & ".csv", ForWriting)
>> strSearchString = Replace(strSearchString, VbLf, " ")
>> strSearchString = Replace(strSearchString, VbCr & " ", VbCrLf)
>> objFile.Write strSearchString
>> objFile.Close
>>
>> MsgBox "done."
>>
> This won't work, if Pegasus'
>
> > I created my own Excel file and made one of the cells a two-liner, by
> > pressing Alt+Enter after the first line. When I saved the file as a CSV,
> > this generated a $0a in this particular field, which is not really
> > surprising. Each record was terminated by a $0d$0a, which is also as
> > expected.
>
> is correct, because the embedded line endings are *not* vbLf (Chr(10)),
> but vbCr (Chr(13))

Unfortunately they are CRLFs. If you're interested then I'll send you the
OP's sample file.


From: ekkehard.horner on
Pegasus [MVP] schrieb:
> "ekkehard.horner" <ekkehard.horner(a)arcor.de> wrote in message
> news:4aa6b9de$0$32674$9b4e6d93(a)newsspool2.arcor-online.net...
>> Pegasus [MVP] schrieb:
>> [...]
>>> The dump you posted does not really seem to be a "CSV" - it uses
>>> semicolons as delimiters, not commas. In an attempt to put an end to the
>>> guessing game,
>> That is the German locale form of a .CSV; cross language processing may
>> be cumbersome some times, but as long as you stay with your locale,
>>
>> PROCESSING .CSVs WITH EMBEDDED LINE-ENDINGS CAUSE NO PROBLEMS
>> AT ALL AS LONG AS YOU USE THE PROPER (MICROSOFT) TOOLS
>
> Why all this shouting? Achtung, listen now!
>
>
Because all these "use FSO & string replacements" trials & errors did
cost the OP half a working day
From: ekkehard.horner on
Richard Mueller [MVP] schrieb:
> "ekkehard.horner" <ekkehard.horner(a)arcor.de> wrote in message
[...]
>> [Richard, you use ADO frequently - please help me to convince Odd
>> and Pegasus that giving the text driver a try is NOT stupid]
>
> The one hex dump I've seen in this thread had a normal 0D 0A sequence,
> equivalent to vbCrLf, but I didn't see enough to know what is going on. I
> have used code similar to below to read csv files:
[...]
Thank you - so it seems there are at least two of us.
> It would be intersting to see how code similar to above handles this file.
> Still, if any of the fields has embedded characters that the OP wants to
> remove, you will need to add steps to replace the characters with blanks (or
> something else).
[...]
If Excel wrote the file, then the text driver(s) will parse it. I agree with
you, that extra specs regarding 'unwanted characters' will cause extra work.