From: Lew on 30 Jan 2010 15:08

Lew wrote:
>> The term "UTF-8 data" has no meaning.

Arved Sandstrom wrote:
> That's a bit nitpicky for me. If you're going to get that precise then
> there's no such thing as character data either, since characters are
> also an interpretation of binary bytes and words. In this view there's
> no difference between a Unicode file and a PNG file and a PDF file and
> an ASCII file.
>
> Since we do routinely describe files by the only useful interpretation
> of them, why not UTF-8 data files?

You are right, generally, but the OP evinced an understanding of the term that was interfering with his ability to accomplish his goal. I suggest that thinking of the data as just "characters" and segregating the concept of the encoding will help him.

Once he's got the hang of it, then, yeah, go ahead and call it "UTF-8 data".

--
Lew

From: markspace on 30 Jan 2010 14:06

moonhkt wrote:
> In Progress, viewed the inputted data by UTF-8 terminal as a 凌晨. So,
> we felt it is not awful to ISO8859-1 database. Actually, Database seem
> to be handle 0x00 to 0xFF characters. The number of byte for 凌晨 to be
> six byte.

Correct. You can't fit six bytes into one. You can't store all UTF-8 characters into an ISO8859-1 file. Some (most) will get truncated.

For a 10 year old database, it's time to upgrade. Go with UTF-8 (or UTF-16).

From: bugbear on 1 Feb 2010 04:45

moonhkt wrote:
> Thank for documents for UTF-8. Actually, My company want using
> ISO8859-1 database to store UTF-8 data.

You have my sympathy.

BugBear

From: bugbear on 1 Feb 2010 04:47

markspace wrote:
> moonhkt wrote:
>
>> In Progress, viewed the inputted data by UTF-8 terminal as a 凌晨. So,
>> we felt it is not awful to ISO8859-1 database. Actually, Database seem
>> to be handle 0x00 to 0xFF characters. The number of byte for 凌晨 to be
>> six byte.
>
> Correct. You can't fit six bytes into one. You can't store all UTF-8
> characters into an ISO8859-1 file. Some (most) will get truncated.

But you can store 6 bytes as 6 Latin-1 chars (as long as the DB doesn't suppress the "invalid" values; most don't).

It just won't have the right semantics.

BugBear
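
To illustrate the point about storing six bytes as six Latin-1 chars, here is a minimal Java sketch. The class and method names are hypothetical, and it assumes the database passes every byte value 0x00 to 0xFF through unchanged, which the thread suggests but does not prove. It also contrasts this with encoding the characters directly as ISO-8859-1, where Java substitutes '?' for each unmappable character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative, not from the thread.
class Latin1RoundTrip {
    // Treat each UTF-8 byte as one Latin-1 char, as a Latin-1 database would.
    static String storeAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Recover the raw bytes from the Latin-1 chars and decode them as UTF-8.
    static String readBack(String stored) {
        byte[] raw = stored.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Encoding the characters themselves as ISO-8859-1 is lossy:
    // each unmappable character becomes a single '?' byte.
    static byte[] encodeDirectly(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u51CC\u6668"; // the two characters from the thread
        String stored = storeAsLatin1(text);
        System.out.println("stored chars: " + stored.length());
        System.out.println("round trip ok: " + readBack(stored).equals(text));
        System.out.println("direct encode bytes: " + encodeDirectly(text).length);
    }
}
```

The stored string has six chars with no textual meaning of their own, so sorting, length checks, and searching inside the database operate on bytes rather than characters. That is the "wrong semantics" caveat above.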

From: moonhkt on 3 Feb 2010 09:43

On Feb 1, 5:47 pm, bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
> markspace wrote:
> > moonhkt wrote:
>
> >> In Progress, viewed the inputted data by UTF-8 terminal as a 凌晨. So,
> >> we felt it is not awful to ISO8859-1 database. Actually, Database seem
> >> to be handle 0x00 to 0xFF characters. The number of byte for 凌晨 to be
> >> six byte.
>
> > Correct. You can't fit six bytes into one. You can't store all UTF-8
> > characters into an ISO8859-1 file. Some (most) will get truncated.
>
> But you can store 6 bytes as 6 Latin-1 chars (as long as
> the DB doesn't suppress the "invalid" values; most don't)
>
> It just won't have the right semantics.
>
> BugBear

What is the problem? There are six bytes: three for the first character and three for the second. We actually tried an import and an export, compared the two files, and they are the same.

The next task is the extended ASCII codes, 0x80 to 0xFF, whose values are not part of UTF-8. Does that mean the output file cannot include byte values 0x80 to 0xFF?

We also need to handle converting 0xBC (fraction one quarter), 0xBD (fraction one half), and some other extended ASCII values to UTF-8.

The following extended ASCII codes were found in our ISO8859-1 database:
0x85
0xA9
0xAE
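
On the question about 0x80 to 0xFF: UTF-8 output does contain bytes in that range, but only as parts of multi-byte sequences, never as standalone characters. A rough Java sketch of the conversion, assuming the bytes really are ISO-8859-1 text (the class and method names are hypothetical), decodes them as Latin-1 and re-encodes as UTF-8:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes the input bytes are genuine ISO-8859-1 text.
class Latin1ToUtf8 {
    // Decode as Latin-1 (one byte per character), then encode as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated upper-case hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The byte values mentioned in the post above.
        byte[] found = {(byte) 0x85, (byte) 0xA9, (byte) 0xAE, (byte) 0xBC, (byte) 0xBD};
        for (byte b : found) {
            byte[] one = {b};
            System.out.println(hex(one) + " -> " + hex(latin1ToUtf8(one)));
        }
    }
}
```

Each of these five values becomes a two-byte sequence starting with 0xC2, for example 0xBC becomes C2 BC. Two caveats apply. In ISO-8859-1, 0x85 is a control character, so it may actually have been entered as the Windows-1252 ellipsis; that is a guess, and it would need a different source charset. Also, this conversion must not be applied to fields that already hold raw UTF-8 bytes (such as the six bytes of 凌晨), or they will be double-encoded.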