From: Terry Reedy on 13 Aug 2010 18:25 A short background to MRAB's answer which I will try to get right. The byte-order-mark was invented for UTF-16 encodings so the reader could determine whether the pairs of bytes are in little or big endiean order, depending on whether the first two bute are fe and ff or ff and fe (or maybe vice versa, does not matter here). The concept is meaningless for utf-8 which consists only of bytes in a defined order. This is part of the Unicode standard. However, Microsoft (or whoever) re-purposed (hijacked) that pair of bytes to serve as a non-standard indicator of utf-8 versus any non-unicode encoding. The result is a corrupted utf-8 stream that python accommodates with the utf-8-sig(nature) codec (versus the standard utf-8 codec). -- Terry Jan Reedy
First
|
Prev
|
Pages: 1 2 Prev: How do I get number of files in a particular directory. Next: Deditor -- pythonic text-editor |