Read utf-8 file return utf-16 coding hex string ? [Java Programming]


From: Lew on 30 Jan 2010 15:08

Lew wrote:
>> The term "UTF-8 data" has no meaning.

Arved Sandstrom wrote:
> That's a bit nitpicky for me. If you're going to get that precise then
> there's no such thing as character data either, since characters are
> also an interpretation of binary bytes and words. In this view there's
> no difference between a Unicode file and a PNG file and a PDF file and
> an ASCII file.
>
> Since we do routinely describe files by the only useful interpretation
> of them, why not UTF-8 data files?

You are right, generally, but the OP evinced an understanding of the term that
was interfering with his ability to accomplish his goal. I suggest that
thinking of the data as just "characters" and segregating the concept of the
encoding will help him.

Once he's got the hang of it, then, yeah, go ahead and call it "UTF-8 data".

--
Lew
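A minimal Java sketch of the separation Lew describes, assuming the input bytes are the UTF-8 encoding of 凌晨 from the thread. The class and method names are illustrative only. The bytes are decoded to characters once, with an explicit charset, and after that the text can be shown as UTF-16 hex or re-encoded in any other encoding.

```java
import java.nio.charset.StandardCharsets;

class CharsVersusBytes {
    // Format each UTF-16 code unit of the string as four hex digits.
    static String utf16Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Format each byte as two hex digits.
    static String bytesHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The six bytes from the thread: the UTF-8 encoding of two characters.
        byte[] utf8 = {(byte) 0xE5, (byte) 0x87, (byte) 0x8C,
                       (byte) 0xE6, (byte) 0x99, (byte) 0xA8};

        // Decoding step: bytes plus an encoding give characters.
        String text = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(text.length());   // 2
        System.out.println(utf16Hex(text));  // 51CC6668
        // Encoding step: the same characters written back out as UTF-8.
        System.out.println(bytesHex(text.getBytes(StandardCharsets.UTF_8))); // E5878CE699A8
    }
}
```

When reading from a file, the same decoding step would normally be done by a `Reader` constructed with an explicit charset rather than by hand.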

From: markspace on 30 Jan 2010 14:06

moonhkt wrote:

> In Progress, viewed the inputted data by UTF-8 terminal as a 凌晨. So,
> we felt it is not awful to ISO8859-1 database. Actually, Database seem
> to be handle 0x00 to 0xFF characters. The number of byte for 凌晨 to be
> six byte.

Correct. You can't fit six bytes into one. You can't store all UTF-8
characters into an ISO8859-1 file. Some (most) will get truncated.

For a 10 year old database, it's time to upgrade. Go with UTF-8 (or
UTF-16).

From: bugbear on 1 Feb 2010 04:45

moonhkt wrote:

> Thank for documents for UTF-8. Actually, My company want using
> ISO8859-1 database to store UTF-8 data.

You have my sympathy.

BugBear

From: bugbear on 1 Feb 2010 04:47

markspace wrote:
> moonhkt wrote:
>
>> In Progress, viewed the inputted data by UTF-8 terminal as a 凌晨. So,
>> we felt it is not awful to ISO8859-1 database. Actually, Database seem
>> to be handle 0x00 to 0xFF characters. The number of byte for 凌晨 to be
>> six byte.
>
> Correct. You can't fit six bytes into one. You can't store all UTF-8
> characters into an ISO8859-1 file. Some (most) will get truncated.

But you can store 6 bytes as 6 Latin-1 chars (as long as
the DB doesn't suppress the "invalid" values; most don't)

It just won't have the right semantics.

BugBear
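BugBear's point can be sketched in Java. This assumes the storage layer passes every byte value 0x00 to 0xFF through unchanged, which is simulated here with an in-memory Latin-1 `String` and not a real database. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Reinterpret raw bytes as Latin-1 text: one char per byte, no loss.
    static String asLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Recover the raw bytes from text that was stored as Latin-1.
    static byte[] fromLatin1(String stored) {
        return stored.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u51CC\u6668"; // the two characters from the thread
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // What a Latin-1 store sees: six unrelated one-byte characters.
        String stored = asLatin1(utf8);
        System.out.println(stored.length()); // 6

        // The bytes survive, so the text can be recovered by whoever
        // knows they were UTF-8 all along.
        byte[] back = fromLatin1(stored);
        String recovered = new String(back, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, back));   // true
        System.out.println(recovered.equals(original));  // true
    }
}
```

The "wrong semantics" show up in anything the store does per character: lengths, sorting, case conversion and substring operations all act on six Latin-1 characters, not on two Chinese ones.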

From: moonhkt on 3 Feb 2010 09:43

On Feb 1, 5:47 pm, bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
> markspace wrote:
> > moonhkt wrote:
>
> >> In Progress, viewed the inputted data by UTF-8 terminal as a 凌晨. So,
> >> we felt it is not awful to ISO8859-1 database. Actually, Database seem
> >> to be handle 0x00 to 0xFF characters. The number of byte for 凌晨 to be
> >> six byte.
>
> > Correct. You can't fit six bytes into one. You can't store all UTF-8
> > characters into an ISO8859-1 file. Some (most) will get truncated.
>
> But you can store 6 bytes as 6 Latin-1 chars (as long as
> the DB doesn't suppress the "invalid" values; most don't)
>
> It just won't have the right semantics.
>
> BugBear

What is your problem?

The six bytes are three for the first character and three for the second
character.
Actually, we tried an import and an export, compared the two files, and
they are the same.

The next task is the extended ASCII codes, 0x80 to 0xFF, whose values are
not part of UTF-8. Does that mean the output file cannot include byte
values 0x80 to 0xFF?
And how do we handle converting 0xBC (fraction one quarter), 0xBD (fraction
one half), or other extended ASCII values to UTF-8?

The extended ASCII codes below were found in our ISO8859-1 database:
0x85
0xA9
0xAE
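A hedged sketch of the conversion asked about, in Java. It assumes the stored bytes really are ISO-8859-1 characters (not UTF-8 bytes stored as Latin-1, as with the Chinese text above). The class and method names are illustrative. Note that the UTF-8 output does contain bytes in the 0x80 to 0xFF range; they appear as parts of multi-byte sequences, never as single-byte characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1ToUtf8 {
    // Decode the bytes with the source charset, then encode as UTF-8.
    static byte[] transcode(byte[] input, Charset from) {
        return new String(input, from).getBytes(StandardCharsets.UTF_8);
    }

    // Format each byte as two hex digits.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codes = {0xBC, 0xBD, 0x85, 0xA9, 0xAE};
        for (int code : codes) {
            byte[] one = {(byte) code};
            byte[] utf8 = transcode(one, StandardCharsets.ISO_8859_1);
            // Prints: BC -> C2BC, BD -> C2BD, 85 -> C285, A9 -> C2A9, AE -> C2AE
            System.out.println(String.format("%02X", code) + " -> " + hex(utf8));
        }

        // Assumption: if the data was entered on Windows, 0x85 may really be
        // windows-1252, where it is the ellipsis character and not a control code.
        byte[] ellipsis = transcode(new byte[] {(byte) 0x85},
                                    Charset.forName("windows-1252"));
        System.out.println("85 (windows-1252) -> " + hex(ellipsis)); // E280A6
    }
}
```

In ISO-8859-1 proper, 0x85 is a control character (NEL), so finding it in text data is a hint that some of the data may actually be windows-1252. Which interpretation is right depends on how the data was entered, and the code above cannot tell.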
