From: John Machin on 27 Oct 2009 23:46

On Oct 28, 2:51 am, Ethan Furman <et...(a)stoneleaf.us> wrote:
> John Machin wrote:
> > On Oct 27, 7:15 am, Ethan Furman <et...(a)stoneleaf.us> wrote:
> >
> >>Let me rephrase -- say I get a dbf file with an LDID of \x0f that maps
> >>to a cp437, and the file came from a german oem machine... could that
> >>file have upper-ascii codes that will not map to anything reasonable on
> >>my \x01 cp437 machine? If so, is there anything I can do about it?
> >
> > ASCII is defined over the first 128 codepoints; "upper-ascii codes" is
> > meaningless. As for the rest of your question, if the file's encoded
> > in cpXXX, it's encoded in cpXXX. If either the creator or the reader
> > or both are lying, then all bets are off.
>
> My confusion is this -- is there a difference between any of the various
> cp437s?

What various cp437s???

> Going down the list at ESRI: 0x01, 0x09, 0x0b, 0x0d, 0x0f,
> 0x11, 0x15, 0x18, 0x19, and 0x1b all map to cp437,

Yes, this is called a "many-to-*one*" relationship.

> and they have names

"they" being the Language Drivers, not the codepages.

> such as US, Dutch, Finnish, French, German, Italian, Swedish, Spanish,
> English (Britain & US)... are these all the same?

When you read the Wikipedia page on cp437, did you see any reference to different versions for French, German, Finnish, etc? I saw only one mapping table; how many did you see? If there are multiple language versions of a codepage, how do you expect to handle this given Python has only one codec per codepage?

Trying again: *ONE* attribute of a Language Driver ID (LDID) is the character set (codepage) that it uses. Other attributes may be things like the collating (sorting) sequence, whether they use a dot or a comma as the decimal point, etc. Many different languages in Western Europe can use the same codepage. Initially the common one was cp 437, then 850, then 1252.

There may possibly be different interpretations of a codepage out there somewhere, but they are all *intended* to be the same, and I advise you to cross the different-cp437s bridge *if* it exists and you ever come to it.

Have you got access to files with LDID not in (0, 1) that you can try out?

Cheers,
John
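
A minimal sketch of the many-to-one point made in this post, using only the LDID values quoted from the ESRI list in the thread. The names `CP437_LDIDS`, `LDID_TO_CODEC` and `decode_text` are hypothetical, invented for this illustration; they are not part of any dbf library. It assumes the file's LDID is honest about the encoding, which is the "all bets are off" caveat from earlier in the thread.

```python
# LDID values that the thread lists as mapping to cp437 (from the ESRI table
# Ethan cites). The names below are hypothetical, chosen for this sketch.
CP437_LDIDS = (0x01, 0x09, 0x0B, 0x0D, 0x0F, 0x11, 0x15, 0x18, 0x19, 0x1B)

# Many LDIDs, one codec name: Python ships a single "cp437" codec.
LDID_TO_CODEC = {ldid: "cp437" for ldid in CP437_LDIDS}


def decode_text(raw: bytes, ldid: int) -> str:
    """Decode raw field bytes with the codec associated with the given LDID."""
    return raw.decode(LDID_TO_CODEC[ldid])


# Bytes 0x81, 0x84 and 0xE1 are u-umlaut, a-umlaut and sharp s in cp437.
raw = bytes([0x81, 0x84, 0xE1])
from_german_ldid = decode_text(raw, 0x0F)  # the LDID in Ethan's question
from_us_ldid = decode_text(raw, 0x01)

# Same codepage, so the same bytes give the same text under either LDID.
assert from_german_ldid == from_us_ldid == "\u00fc\u00e4\u00df"
```

Under this sketch the LDID only selects the codec; other Language Driver attributes such as collating sequence are not modelled at all.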

From: Ethan Furman on 28 Oct 2009 00:59

John Machin wrote:
> There may possibly be different interpretations of a codepage out there
> somewhere, but they are all *intended* to be the same, and I advise
> you to cross the different-cp437s bridge *if* it exists and you ever
> come to it.
>
> Have you got access to files with LDID not in (0, 1) that you can try
> out?

Alas, I do not. And I probably never will, making the whole thing academic. Speaking of tables I do not have access to, and documentation for that matter, I would love to get information on db4, 5, 7, etc.

Many thanks for your time and knowledge, and my apologies for seeming so dense. :)

Cheers!
~Ethan~