From: Keith G Hicks on 5 Nov 2009 19:04

Yeah, I found that out. I'm kind of stabbing in the dark here. I'm asking for help and trying to figure things out while waiting. I'm figuring a few things out but not enough.

I have no way of getting these files in a better format than they already are. I'm kind of stuck. I need to know how to take a file and change the encoding to <?xml version="1.0" encoding="ISO-8859-1"?>

If I open the file manually in a tool I have called EditPad Pro I can paste the above into the header. Then when I save it EditPad asks if I want to change to the new encoding or not. Works quite well. I also discovered that if I change the header in Notepad, the characters I'm having trouble with actually come out fine after I save it and reopen it in an XML editor. So that's why I thought that changing it in VB code would do the same thing. Guess not. Not sure why it works in Notepad.

So anyone that can help me write code to encode these files properly would get my sincerest thanks.

Thanks,

Keith

"Scott M." <s-mar(a)nospam.nospam> wrote in message news:%23aMS$InXKHA.4816(a)TK2MSFTNGP06.phx.gbl...
>
> "Keith G Hicks" <krh(a)comcast.net> wrote in message
> news:%23$$5j2mXKHA.4808(a)TK2MSFTNGP06.phx.gbl...
>> Never mind. I figured it out:
>>
>> Dim TheFileLines As New List(Of String)
>> TheFileLines.AddRange(System.IO.File.ReadAllLines(xmlFilesLocation & "\" & sArticleToPost))
>> TheFileLines.RemoveAt(0)
>> TheFileLines.Insert(0, "<?xml version=""1.0"" encoding=""ISO-8859-1""?>")
>> System.IO.File.WriteAllLines(xmlFilesLocation & "\" & sArticleToPost, TheFileLines.ToArray)
>>
>> "Keith G Hicks" <krh(a)comcast.net> wrote in message
>> news:uMeJV7lXKHA.4360(a)TK2MSFTNGP04.phx.gbl...
>>> Okay, I need to clean up these files. They are coming out of this goofy
>>> system with this header:
>>>
>>> <?xml version=?1.0? encoding=?UTF-8??>
>>>
>>> The quotes around things are not coming in as quotes. And it's not the
>>> correct encoding anyway. It needs to be this:
>>>
>>> <?xml version="1.0" encoding="ISO-8859-1"?>
>>>
>>> So I guess I need to change the encoding of each file before I can open
>>> it up as an XML doc and read it there. I have no idea what is the best
>>> way to do this programmatically in vb.net. Do I need to open with
>>> StreamWriter or is there an easier way? I can't find anything out there
>>> that explains this clearly. If I need to do this with StreamWriter could
>>> someone point me somewhere that shows how to do this?
>>>
>>> Thanks,
>>>
>>> Keith
>
> You realize that just because you've said what you want the encoding to be
> doesn't mean that the characters are actually encoded that way, right?
From: Scott M. on 5 Nov 2009 19:15

"Keith G Hicks" <krh(a)comcast.net> wrote in message news:e2N4BTnXKHA.1236(a)TK2MSFTNGP05.phx.gbl...
> Yeah, I found that out. I'm kind of stabbing in the dark here. I'm asking
> for help and trying to figure things out while waiting. I'm figuring a few
> things out but not enough.
>
> I have no way of getting these files in a better format than they already
> are. I'm kind of stuck. I need to know how to take a file and change the
> encoding to <?xml version="1.0" encoding="ISO-8859-1"?>
>
> If I open the file manually in a tool I have called EditPad Pro I can
> paste the above into the header. Then when I save it EditPad asks if I
> want to change to the new encoding or not. Works quite well. I also
> discovered that if I change the header in Notepad the characters I'm
> having trouble with actually come out fine after I save it and reopen in
> XML editor. So that's why I thought that changing it in vb code would do
> the same thing. Guess not. Not sure why it works in Notepad.
>
> So anyone that can help me write code to encode these files properly would
> get my sincerest thanks.
>
> Thanks,
>
> Keith

Keith,

Take a look at http://www.15seconds.com/Issue/050616.htm and look at the XmlWriterSettings section. This is what you want.

-Scott
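For readers who cannot reach the linked article, here is a rough sketch of what setting an output encoding through XmlWriterSettings might look like. It is not taken from that article. The file name "article.xml" and the element names are placeholders, and the approach only applies once the content is already in memory as correctly decoded strings.

```vbnet
Imports System.Text
Imports System.Xml

Module WriterSettingsSketch
    Sub Main()
        ' Ask the writer to encode its output as ISO-8859-1.
        Dim settings As New XmlWriterSettings()
        settings.Encoding = Encoding.GetEncoding("ISO-8859-1")
        settings.Indent = True

        ' "article.xml", "article" and "title" are placeholders for illustration.
        Using writer As XmlWriter = XmlWriter.Create("article.xml", settings)
            writer.WriteStartDocument()
            writer.WriteStartElement("article")
            ' ChrW(&HF1) is n with a tilde; it avoids depending on the source file's encoding.
            writer.WriteElementString("title", "Se" & ChrW(&HF1) & "or")
            writer.WriteEndElement()
            writer.WriteEndDocument()
        End Using
    End Sub
End Module
```

The writer emits the XML declaration itself, so the declared encoding and the bytes on disk come from the same setting and cannot disagree.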
From: Keith G Hicks on 5 Nov 2009 23:37

The first line of the files I'm getting is fouled up, so I cannot open or read them at all using any XML features in VB. The first line is not recognizable. It comes to me saying it's UTF-8 but it's not, and the double quotes in the header are not coming to me as double quotes.

When I use StreamReader, alter the first line and then save it as a new file, that almost works, but the characters that need to have the correct encoding actually get changed to something else in the save process. I'm guessing the StreamReader is interpreting them funny, so it doesn't really matter what I change the header to; the characters themselves change (I checked in a hex editor to be sure).

Since it works to manually open these files in Notepad and simply change the header to the correct encoding, the characters themselves MUST have the correct binary values. All that needs to be done is to change that header to the right encoding without fouling up the characters in the body.

So how can I open the file in the most raw form of text, replace that first line and save it without changing the characters in question in the process?

I made some progress with this:

    Dim sr As New StreamReader(xmlFilesLocation & "\" & sArticleToPost, Encoding.UTF7)
    Dim text As String = sr.ReadToEnd
    Dim text2() As String
    ReDim text2(1)
    text2(0) = text.Replace("<?xml version=1.0 encoding=UTF-8?>", "<?xml version=""1.0"" encoding=""ISO-8859-1""?>")
    System.IO.File.WriteAllLines(xmlFilesLocation & "\x" & sArticleToPost, text2)

The text2 variable shows the correct characters, and when I copy its value into Notepad it's fine. But it doesn't save right. I still get weirder characters than I want. It's supposed to have characters like N with a tilde, O with a tilde, O with an accent mark, etc. There are about 6 or 7 I expect to see in this file. But when I open the newly saved files, those characters are converted into very strange characters that I'd have to show you.

I have a question regarding all of this. The encoding header merely tells the program that's opening the file how to read the characters that are in it. The characters are of course ultimately stored in binary, so the encoding knows how to interpret the binary into readable characters. If I open a file using one encoding and the characters look a certain way, and then save it using another, the characters change in binary. Is this all true? Am I understanding this or not?

I mean, the 0's and 1's that are stored on disk don't change just because of the way you open the file. If you open it using one interpreter (encoding) and the characters look one way, then open it using another encoding, you'll see different characters. That makes sense to me. So the only way I could see the binary changing is if the encoding used when saving reinterprets the characters into a different string of 1's and 0's. Yes?

Okay, so when I choose the "encoding" parameter of StreamReader, there are only about 5 options (UTF-7, UTF-8, UTF-32, ASCII, Default, ...). How do I tell it I want it to read AND SAVE as ISO-8859-1???? Opening as UTF-7 seems to help, but OMG when I save using UTF-7 things are a big mess.

Thanks,

Keith
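
The reasoning in the post above can be checked with a small experiment. The sketch below is illustrative only: the module name is made up, and it assumes the byte in question is N with a tilde stored as the single ISO-8859-1 byte 0xD1. Reading the byte with the wrong decoder does not touch it, but saving the wrongly decoded string does produce different bytes.

```vbnet
Imports System
Imports System.Text

Module EncodingDemo
    Sub Main()
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")

        ' N with a tilde (U+00D1) as it would sit on disk in an ISO-8859-1 file.
        Dim onDisk As Byte() = latin1.GetBytes(ChrW(&HD1))
        Console.WriteLine(BitConverter.ToString(onDisk))    ' prints D1

        ' Decoding only chooses how to read the bytes; onDisk itself is unchanged.
        Dim asLatin1 As String = latin1.GetString(onDisk)        ' the intended character
        Dim asUtf8 As String = Encoding.UTF8.GetString(onDisk)   ' U+FFFD, because a lone 0xD1 is not valid UTF-8

        ' Re-encoding the wrongly decoded string is the step that changes the bytes.
        Dim resaved As Byte() = Encoding.UTF8.GetBytes(asUtf8)
        Console.WriteLine(BitConverter.ToString(resaved))   ' prints EF-BF-BD
    End Sub
End Module
```

Once the original byte has been turned into the replacement character there is no way back to 0xD1, which is consistent with the changed bytes seen in the hex editor.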
From: Martin Honnen on 6 Nov 2009 07:08

Keith G Hicks wrote:
> Okay, so when I choose the "encoding" parameter of StreamReader, there are
> only about 5 options (UTF-7, UTF-8, UTF-32, ASCII, Default, ...) How do I
> tell it I want it to read AND SAVE as ISO-8859-1????

Encoding.GetEncoding("ISO-8859-1") should give an Encoding instance allowing you to decode and encode with ISO-8859-1.
And both StreamReader and StreamWriter allow you to specify an encoding, for instance StreamWriter has
http://msdn.microsoft.com/en-us/library/f5f5x7kt.aspx

--

Martin Honnen --- MVP XML
http://msmvps.com/blogs/martin_honnen/
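
Putting that suggestion together with the earlier goal of replacing only the first line, a minimal sketch might look like the following. The helper name RewriteHeader and its parameters are hypothetical. It assumes the body of each file really is ISO-8859-1 and that the damaged declaration occupies the whole first line.

```vbnet
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module FixHeader
    Private Const NewHeader As String = "<?xml version=""1.0"" encoding=""ISO-8859-1""?>"

    ' Hypothetical helper: the name and parameters are for illustration only.
    Sub RewriteHeader(sourcePath As String, targetPath As String)
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")

        ' Decode with ISO-8859-1, where every byte maps to exactly one character.
        Dim content As String
        Using reader As New StreamReader(sourcePath, latin1)
            content = reader.ReadToEnd()
        End Using

        ' Drop everything up to and including the first line feed (the damaged declaration).
        Dim firstBreak As Integer = content.IndexOf(ControlChars.Lf)
        Dim body As String = If(firstBreak >= 0, content.Substring(firstBreak + 1), String.Empty)

        ' Encode with the same encoding so the body bytes are written back unchanged.
        Using writer As New StreamWriter(targetPath, False, latin1)
            writer.Write(NewHeader & ControlChars.CrLf)
            writer.Write(body)
        End Using
    End Sub
End Module
```

A call such as RewriteHeader("in.xml", "out.xml"), with placeholder paths, writes a copy rather than overwriting the source, so the original bytes stay available for comparison in a hex editor.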
From: Keith G Hicks on 6 Nov 2009 10:00

Yep. I found that out late last night. Thanks. It took quite a bit of hunting around to figure this out. It's not intuitive. "GetEncoding" sounds like a read-only property. The word "get" is misleading. I finally landed upon something that explained that it was more like "Set" than "Get". Now I'm sure they meant that GetEncoding("ISO-8859-1") means to "get" the encoding named "ISO-8859-1", but that's a bit ambiguous. With Encoding.GetEncoding as the 3rd param of StreamReader, it could also be interpreted as "get the current encoding of the stream".

Thanks for the info.

"Martin Honnen" <mahotrash(a)yahoo.de> wrote in message news:Ok74qntXKHA.3600(a)TK2MSFTNGP04.phx.gbl...
> Keith G Hicks wrote:
>
>> Okay, so when I choose the "encoding" parameter of StreamReader, there
>> are only about 5 options (UTF-7, UTF-8, UTF-32, ASCII, Default, ...) How
>> do I tell it I want it to read AND SAVE as ISO-8859-1????
>
> Encoding.GetEncoding("ISO-8859-1") should give an Encoding instance
> allowing you to decode and encode with ISO-8859-1.
> And both StreamReader and StreamWriter allow you to specify an encoding,
> for instance StreamWriter has
> http://msdn.microsoft.com/en-us/library/f5f5x7kt.aspx
>
> --
>
> Martin Honnen --- MVP XML
> http://msmvps.com/blogs/martin_honnen/