From: Jean-Michel Pichavant on 13 Aug 2010 05:45

Hello python world,

I'm trying to update the content of $Microsoft$ VC2005 project files using a Python application. Since those files are XML data, I assumed I could easily do that.

My problem is that VC somehow thinks that the file is corrupted and updates the file like the following:

-<?xml version='1.0' encoding='UTF-8'?>
+?<feff><?xml version="1.0" encoding="UTF-8"?>

Actually, <feff> is displayed in a different color by vim, telling me that this is some kind of special character code (I'm not familiar with such things). After googling that, I have a clue: it could be some Unicode character used to indicate something ... well, I don't know in fact ("UTF-8 files sometimes start with a byte-order marker (BOM) to indicate that they are encoded in UTF-8.").

My problem is however simpler: how do I add such a character at the beginning of the file? I tried

    f = open('paf', 'w')
    f.write(u'\ufeff')

    UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

The error may be explicit but I have no idea how to proceed further. Any clue?

JM
From: Tim Golden on 13 Aug 2010 05:58

On 13/08/2010 10:45, Jean-Michel Pichavant wrote:
> My problem is however simpler: how do I add such a character at the
> beginning of the file?
> I tried
>
> f = open('paf', 'w')

    f = open("pag", "wb")
    # UTF-8 encoding of U+FEFF ("\xfe\xff" would be the UTF-16 big-endian BOM)
    f.write("\xef\xbb\xbf")
    f.close()

TJG
From: Ulrich Eckhardt on 13 Aug 2010 06:43

Jean-Michel Pichavant wrote:
> My problem is however simpler: how do I add such a character [a BOM]
> at the beginning of the file?
> I tried
>
> f = open('paf', 'w')
> f.write(u'\ufeff')
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in
> position 0: ordinal not in range(128)

Try the codecs module to open the file, which will then do all the transcoding between internal texts and external UTF-8 for you.

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
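
A minimal sketch of what opening the file through the codecs module could look like. The file name and the temporary directory are placeholders chosen for illustration, and the XML declaration is just sample content; the point is only that the file object returned by codecs.open() does the encoding, so the implicit ASCII conversion never happens.

```python
import codecs
import os
import tempfile

# Hypothetical file name; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'paf')

# codecs.open() returns a file object that encodes unicode text on write,
# so u'\ufeff' no longer goes through the implicit ASCII codec.
f = codecs.open(path, 'w', encoding='utf-8')
try:
    f.write(u'\ufeff')  # encoded as the three bytes EF BB BF
    f.write(u'<?xml version="1.0" encoding="UTF-8"?>\n')
finally:
    f.close()

# Read the raw bytes back to confirm what was actually written.
with open(path, 'rb') as raw:
    data = raw.read()
```

Checking data.startswith(codecs.BOM_UTF8) afterwards is a quick way to confirm that the BOM really is at the start of the file.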
From: MRAB on 13 Aug 2010 13:45

Jean-Michel Pichavant wrote:
> Hello python world,
>
> I'm trying to update the content of $Microsoft$ VC2005 project files
> using a Python application.
> Since those files are XML data, I assumed I could easily do that.
>
> My problem is that VC somehow thinks that the file is corrupted and
> updates the file like the following:
>
> -<?xml version='1.0' encoding='UTF-8'?>
> +?<feff><?xml version="1.0" encoding="UTF-8"?>
>
>
> Actually, <feff> is displayed in a different color by vim, telling me
> that this is some kind of special character code (I'm not familiar with
> such things).
> After googling that, I have a clue: it could be some Unicode character
> used to indicate something ... well, I don't know in fact ("UTF-8 files
> sometimes start with a byte-order marker (BOM) to indicate that they are
> encoded in UTF-8.").
>
> My problem is however simpler: how do I add such a character at the
> beginning of the file?
> I tried
>
> f = open('paf', 'w')
> f.write(u'\ufeff')
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in
> position 0: ordinal not in range(128)
>
> The error may be explicit but I have no idea how to proceed further. Any
> clue?
>
In Python 2 the default encoding is 'ascii'. What you want is 'utf-8'.

Use codecs.open() instead, with the 'utf-8-sig' encoding, which will include the BOM.
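
To make the difference between the three encodings concrete, here is a small sketch that needs no file at all. It reproduces the failing conversion explicitly and then shows what 'utf-8' and 'utf-8-sig' produce for the same text. The variable names and the sample XML declaration are made up for the illustration.

```python
bom = u'\ufeff'

# The conversion Python 2 attempts implicitly when a unicode string is
# written to a plain file object: ASCII cannot represent U+FEFF.
try:
    bom.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoded explicitly as UTF-8, the same character becomes three bytes.
utf8_bom = bom.encode('utf-8')

# The 'utf-8-sig' codec prepends those bytes itself when encoding...
with_sig = u'<?xml version="1.0"?>'.encode('utf-8-sig')

# ...and strips them again when decoding.
round_trip = with_sig.decode('utf-8-sig')
```

The same codec name can be passed to codecs.open(), in which case the BOM is written once at the start of the file and the rest of the program never has to mention it.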
From: Nobody on 13 Aug 2010 14:04

On Fri, 13 Aug 2010 11:45:28 +0200, Jean-Michel Pichavant wrote:
> I'm trying to update the content of $Microsoft$ VC2005 project files
> using a Python application.
> Since those files are XML data, I assumed I could easily do that.
>
> My problem is that VC somehow thinks that the file is corrupted and
> updates the file like the following:
>
> -<?xml version='1.0' encoding='UTF-8'?>
> +?<feff><?xml version="1.0" encoding="UTF-8"?>
>
>
> Actually, <feff> is displayed in a different color by vim, telling me
> that this is some kind of special character code (I'm not familiar with
> such things).

U+FEFF is a "byte order mark" or BOM. Each Unicode-based encoding (UTF-8, UTF-16, UTF-16-LE, etc) will encode it differently, so it enables a program reading the file to determine the encoding before reading any actual data.

> My problem is however simpler: how do I add such a character at the
> beginning of the file?
> I tried

Either:

1. Open the file as binary and write '\xef\xbb\xbf' to the file:

    f = open('foo.txt', 'wb')
    f.write('\xef\xbb\xbf')

   [You can also use the constant BOM_UTF8 from the codecs module.]

2. Open the file as utf-8 and write u'\ufeff' to the file:

    import codecs
    f = codecs.open('foo.txt', 'w', 'utf-8')
    f.write(u'\ufeff')

3. Open the file as utf-8-sig and don't write anything (or write an empty string):

    import codecs
    f = codecs.open('foo.txt', 'w', 'utf-8-sig')
    f.write('')

The utf-8-sig codec automatically writes a BOM at the beginning of the file. It is present in Python 2.5 and later.
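
For a file that already exists, option 1 can be wrapped in a small helper that adds the BOM only when it is missing, so running it twice is harmless. This is a sketch under assumptions: ensure_utf8_bom is a hypothetical name, the file is assumed to be UTF-8 already, and it is read into memory whole, which is fine for project files but not for very large ones.

```python
import codecs
import os
import tempfile


def ensure_utf8_bom(path):
    """Prepend the UTF-8 BOM to the file at `path` unless it already has one.

    Hypothetical helper; returns True if the file was modified.
    """
    with open(path, 'rb') as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, 'wb') as f:
        f.write(codecs.BOM_UTF8 + data)
    return True


# Usage on a throwaway file standing in for a project file.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
with open(path, 'wb') as f:
    f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')

first = ensure_utf8_bom(path)   # BOM added
second = ensure_utf8_bom(path)  # already present, file left alone

with open(path, 'rb') as f:
    result = f.read()
```

Working on raw bytes avoids decoding and re-encoding the XML, so the rest of the file is left byte-for-byte unchanged.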