From: Lew on
Lew wrote:
>> The term "UTF-8 data" has no meaning.

Arved Sandstrom wrote:
> That's a bit nitpicky for me. If you're going to get that precise then
> there's no such thing as character data either, since characters are
> also an interpretation of binary bytes and words. In this view there's
> no difference between a Unicode file and a PNG file and a PDF file and
> an ASCII file.
>
> Since we do routinely describe files by the only useful interpretation
> of them, why not UTF-8 data files?

You are right, generally, but the OP evinced an understanding of the term that
was interfering with his ability to accomplish his goal. I suggest that
thinking of the data as just "characters" and segregating the concept of the
encoding will help him.

Once he's got the hang of it, then, yeah, go ahead and call it "UTF-8 data".
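This minimal Java sketch illustrates the distinction. The class and method names are made up for illustration. One String holding the two characters 凌晨 becomes a different number of bytes depending on which encoding is requested. "How many bytes" is a property of the encoding, not of the characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsVersusBytes {
    /** Number of bytes the given text occupies in the given encoding. */
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // The two characters from the thread (U+51CC U+6668), written as
        // escapes so the result does not depend on this file's own encoding.
        String text = "\u51CC\u6668";
        System.out.println("characters: " + text.length());                           // 2
        System.out.println("UTF-8:      " + byteCount(text, StandardCharsets.UTF_8));    // 6
        System.out.println("UTF-16BE:   " + byteCount(text, StandardCharsets.UTF_16BE)); // 4
    }
}
```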

--
Lew
From: markspace on
moonhkt wrote:

> In Progress, we viewed the input data on a UTF-8 terminal as 凌晨. So
> we felt it is not a problem for the ISO8859-1 database. Actually, the
> database seems to handle characters 0x00 to 0xFF. The number of bytes
> for 凌晨 is six.

Correct. You can't fit six bytes into one. You can't store all UTF-8
characters into an ISO8859-1 file. Some (most) will get truncated.

For a 10 year old database, it's time to upgrade. Go with UTF-8 (or
UTF-16).
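Here is a short Java illustration of the ISO-8859-1 limit. The class name is hypothetical. Java's ISO-8859-1 encoder reports that 凌晨 cannot be encoded. String.getBytes does not throw; it replaces each unmappable character with '?'. What a particular database does with such input (truncation, replacement, or an error) is an assumption this sketch does not test.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1CannotHoldCjk {
    /** True if every character of text has a code in ISO-8859-1. */
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u51CC\u6668"; // U+51CC U+6668, the two characters in the thread
        // ISO-8859-1 only covers U+0000..U+00FF, so this prints false.
        System.out.println("fits ISO-8859-1: " + fitsLatin1(text));
        // String.getBytes does not fail: it substitutes the charset's
        // replacement byte ('?', 0x3F) for each unmappable character.
        byte[] lossy = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(lossy)); // [63, 63]
    }
}
```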

From: bugbear on
moonhkt wrote:

> Thanks for the documents on UTF-8. Actually, my company wants to use
> an ISO8859-1 database to store UTF-8 data.

You have my sympathy.

BugBear
From: bugbear on
markspace wrote:
> moonhkt wrote:
>
>> In Progress, we viewed the input data on a UTF-8 terminal as 凌晨. So
>> we felt it is not a problem for the ISO8859-1 database. Actually, the
>> database seems to handle characters 0x00 to 0xFF. The number of bytes
>> for 凌晨 is six.
>
> Correct. You can't fit six bytes into one. You can't store all UTF-8
> characters into an ISO8859-1 file. Some (most) will get truncated.

But you can store 6 bytes as 6 Latin-1 chars (as long as
the DB doesn't suppress the "invalid" values; most don't)

It just won't have the right semantics.
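The point can be sketched in Java. The "store" and "load" helpers are hypothetical stand-ins for a database column, not any real driver API. Because ISO-8859-1 assigns a character to every byte value, the six UTF-8 bytes survive as six Latin-1 characters and come back unchanged. The stored text, however, is not 凌晨.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesAsLatin1 {
    /** Hypothetical "store": reads the UTF-8 bytes as if they were ISO-8859-1. */
    static String storeAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 maps each byte 0x00..0xFF to one char U+0000..U+00FF,
        // so any byte sequence fits, one char per byte.
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Hypothetical "load": recovers the raw bytes and decodes them as UTF-8. */
    static String readBack(String stored) {
        byte[] bytes = stored.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u51CC\u6668"; // U+51CC U+6668
        String stored = storeAsLatin1(original);
        System.out.println("stored length: " + stored.length());                   // 6
        System.out.println("round trip ok: " + original.equals(readBack(stored))); // true
        // The stored value is six unrelated Latin-1 characters, not the
        // original two, so anything treating it as text sees the wrong thing.
        System.out.println("same text:     " + original.equals(stored));           // false
    }
}
```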

BugBear
From: moonhkt on
On Feb 1, 5:47 pm, bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
> markspace wrote:
> > moonhkt wrote:
>
> >> In Progress, viewed the inputted data by UTF-8 terminal as a 凌晨. So,
> >> we felt it is not awful to ISO8859-1 database. Actually, Database seem
> >> to be handle 0x00 to 0xFF characters.  The number of byte for 凌晨 to be
> >> six byte.
>
> > Correct.  You can't fit six bytes into one.  You can't store all UTF-8
> > characters into an ISO8859-1 file.  Some (most) will get truncated.
>
> But you can store 6 bytes as 6 Latin-1 chars (as long as
> the DB doesn't suppress the "invalid" values; most don't)
>
> It just won't have the right semantics.
>
>    BugBear

What is the problem?

The six bytes are three for the first character and the next three for the
second character.
We tried importing and exporting, and compared the two files: they are the same.
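The 3 + 3 split can be checked per character. This Java sketch (hypothetical names) prints each code point with its UTF-8 bytes in hex. Both characters lie in the range U+0800 to U+FFFF, which UTF-8 always encodes in three bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8PerCharacter {
    /** Returns the UTF-8 bytes of one code point as space-separated hex. */
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to print 0..255
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u51CC\u6668"; // the two characters from the thread
        text.codePoints().forEach(cp ->
                System.out.printf("U+%04X -> %s%n", cp, utf8Hex(cp)));
        // U+51CC -> E5 87 8C
        // U+6668 -> E6 99 A8
    }
}
```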

The next task is the extended ASCII codes 0x80 to 0xFF, which are not valid
UTF-8 as single bytes. Does that mean the output file cannot include byte
values 0x80 to 0xFF?
And how do we convert 0xBC (fraction one quarter) and 0xBD (fraction one half),
or other extended ASCII values, to UTF-8?
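Here is a minimal Java sketch of the conversion, assuming the stored bytes really are ISO-8859-1. The class and method names are hypothetical. Decoding as ISO-8859-1 and re-encoding as UTF-8 turns 0xBC into the two bytes C2 BC, and 0xBD into C2 BD. Bytes in the 0x80 to 0xFF range do appear in UTF-8 output, but only as parts of multi-byte sequences. A lone 0xBC is rejected by a strict UTF-8 decoder, which the second method checks.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1ToUtf8 {
    /** Reads the bytes as ISO-8859-1 text, then re-encodes that text as UTF-8. */
    static byte[] latin1ToUtf8(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Strict check: true only if the bytes decode as UTF-8 with no errors. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xBC, (byte) 0xBD }; // one quarter, one half
        byte[] utf8 = latin1ToUtf8(latin1);
        for (byte b : utf8) System.out.printf("%02X ", b & 0xFF);
        System.out.println();                    // C2 BC C2 BD
        System.out.println(isValidUtf8(latin1)); // false: lone 0xBC/0xBD are not UTF-8
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

For whole files, the same two steps apply to the bytes from java.nio.file.Files.readAllBytes, written back with Files.write. For large exports, an InputStreamReader/OutputStreamWriter pair avoids holding everything in memory.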

We found the following extended ASCII codes in our ISO8859-1 database:
0x85
0xA9
0xAE
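One thing to check in that list is 0x85. In strict ISO-8859-1, 0x85 is an invisible C1 control character (NEXT LINE, U+0085). In Windows-1252, which is often stored under an ISO-8859-1 label, it is the horizontal ellipsis (U+2026). 0xA9 (©) and 0xAE (®) mean the same in both. This Java sketch (hypothetical class name) decodes the three bytes both ways. Which reading fits this database is an assumption only the data's origin can settle. The sketch also assumes the JDK provides the "windows-1252" charset, which common JDKs do, although the Java SE specification does not require it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1OrCp1252 {
    /** Lists the code points of s as "U+XXXX" separated by spaces. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] found = { (byte) 0x85, (byte) 0xA9, (byte) 0xAE };
        // Strict ISO-8859-1: 0x85 is the C1 control NEL (U+0085).
        String asLatin1 = new String(found, StandardCharsets.ISO_8859_1);
        // windows-1252 (assumed available in this JDK): 0x85 is U+2026.
        String asCp1252 = new String(found, Charset.forName("windows-1252"));
        System.out.println("ISO-8859-1:   " + codePoints(asLatin1)); // U+0085 U+00A9 U+00AE
        System.out.println("windows-1252: " + codePoints(asCp1252)); // U+2026 U+00A9 U+00AE
    }
}
```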