change ISO8859-1 to GB2312 [Java Programming]


From: Lew on 21 May 2010 01:03

moonhkt wrote:
>>> public class conv_ig
>>> {
>>> public static void main( String[] args )
>>> {
>>> new conv_ig().recode();
>>> }
>>> public void recode()
>>> {
....
>> --
>> Lew

Please do not quote sigs.

> Sorry about this. This is a dirty method to test the code. Reflection
> is Telnet software using UTF-8 Emulation to check the string
> encoding.

Oh, THAT Reflection.

> I will check how to use java.util.logging.
>
> Can you give some examples of where I "ruined the indentation"? And what
> about the naming conventions?

I apologize about the indentation comment - apparently I was seeing an
artifact of word wrap imposed by the posting software and not something that
you did.

As for the naming conventions:
<http://java.sun.com/docs/codeconv/index.html>

You named the class:
>>> public class conv_ig

The convention is to name a class with an initial upper-case letter and camel
case (mixed case, first letter of each word within the compound capitalized
and the rest lower-case), as explained in the Java Code Conventions document.

Methods and non-constant variables (or, more conventionally, non-final
variables) begin with a lower-case letter and are otherwise in camel case.

Underscores should only be used in names that comprise all upper-case letters,
namely those of constant (or more conventionally, final) member variables.
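As a rough sketch of those conventions, the posted class could be renamed along these lines. The name ConvIg is just the camel-case form of conv_ig; the constant and the counter field are hypothetical, added only to show each kind of name, and the real body of recode() was elided in the post.

```java
// Class name: initial upper-case letter, camel case (was conv_ig).
class ConvIg {
    // Constant (static final): all upper-case, underscores between words.
    // Hypothetical field, not taken from the original post.
    static final String TARGET_ENCODING = "UTF-8";

    // Non-final variable: initial lower-case letter, camel case.
    // Hypothetical field, not taken from the original post.
    private int recodeCallCount;

    public static void main(String[] args) {
        new ConvIg().recode();
    }

    // Method name: initial lower-case letter, camel case.
    // Placeholder body; it only counts how often it has been called.
    public int recode() {
        recodeCallCount++;
        return recodeCallCount;
    }
}
```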

--
Lew

From: RedGrittyBrick on 21 May 2010 04:35

On 21/05/2010 03:18, moonhkt wrote:
> On May 20, 11:58 AM, Lew<no...(a)lewscanon.com> wrote:
>> moonhkt wrote:
>>> Change your code as below. My test file can conv to UTF-8, view in
>>> Reflection UTF-8 Emulation, the font is ok.
>>
>> What is "Reflection UTF-8"?
>
> Sorry about this. This is a dirty method to test the code. Reflection
> is Telnet software using UTF-8 Emulation to check the string
> encoding.

There's much wrong in the above.

Reflection is a *terminal-emulator* marketed by Attachmate (who
presumably absorbed WRQ, its original developers).

Reflection does not *emulate* UTF-8; Reflection handles several
character encodings, amongst which is UTF-8. Reflection doesn't *check*
the encoding (AFAIK), it just *uses* the configured encoding to
determine which glyph to display for a received byte sequence.

What Reflection *does* emulate is a variety of serial character-mode
terminals such as VT220, Wyse-50 and varieties of ANSI "standard" terminals.

Telnet is only one of several application layers supported by Reflection
for host communication, though I suppose it is the principal one. FTP
and SSH are others.

--
RGB

From: moonhkt on 21 May 2010 12:38

On May 21, 4:35 PM, RedGrittyBrick <RedGrittyBr...(a)spamweary.invalid>
wrote:
> On 21/05/2010 03:18, moonhkt wrote:
>
> > On May 20, 11:58 AM, Lew<no...(a)lewscanon.com> wrote:
> >> moonhkt wrote:
> >>> Change your code as below. My test file can conv to UTF-8, view in
> >>> Reflection UTF-8 Emulation, the font is ok.
>
> >> What is "Reflection UTF-8"?
>
> > Sorry about this. This is a dirty method to test the code. Reflection
> > is Telnet software using UTF-8 Emulation to check the string
> > encoding.
>
> There's much wrong in the above.
>
> Reflection is a *terminal-emulator* marketed by Attachmate (who
> presumably absorbed WRQ, its original developers).
>
> Reflection does not *emulate* UTF-8; Reflection handles several
> character encodings, amongst which is UTF-8. Reflection doesn't *check*
> the encoding (AFAIK), it just *uses* the configured encoding to
> determine which glyph to display for a received byte sequence.
>
> What Reflection *does* emulate is a variety of serial character-mode
> terminals such as VT220, Wyse-50 and varieties of ANSI "standard" terminals.
>
> Telnet is only one of several application layers supported by Reflection
> for host communication, though I suppose it is the principal one. FTP
> and SSH are others.
>
> --
> RGB

Hi All,
Thanks for explaining how Reflection works.

Our database is in ISO8859-1 format but holds some GB2312 and other
non-ISO8859-1 data. Now we want to print GB2312 text in work order
routing. We are planning to purchase a Chinese line printer for printing
GB2312. The line printer can print the file under UNIX. Why does the
output file not need to be converted to GB2312 format before printing?
Any suggestions? Also, the Java conversion program can convert my output
to UTF-8.

moonhkt

From: RedGrittyBrick on 21 May 2010 18:23

On 21/05/2010 17:38, moonhkt wrote:
>
> Our database is in ISO8859-1 format but holds some GB2312 and other
> non-ISO8859-1 data. Now we want to print GB2312 text in work order
> routing. We are planning to purchase a Chinese line printer for printing
> GB2312. The line printer can print the file under UNIX. Why does the
> output file not need to be converted to GB2312 format before printing?

You don't provide any details so I can only guess. My guess is that the
database thinks it has (for example) six European letters when in fact
it has three Chinese characters. The database is happy to store and
retrieve the byte sequences that would, under 8859-1 encoding, represent
six European letters. When the retrieved byte sequences are sent to the
printer, because the printer is configured to use the GB2312 encoding,
it interprets those same byte sequences not as six European letters but
as three Chinese characters.
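A minimal sketch of that guess in Java: the same six bytes decode to six characters under ISO-8859-1 and to three characters under GB2312. The class name, the decode helper and the sample bytes are assumptions chosen for illustration, and it assumes the JDK in use ships the GB2312 charset.

```java
import java.nio.charset.Charset;

class SameBytesTwoEncodings {
    // Decode a byte sequence using the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Six bytes that GB2312 reads as three Chinese characters
        // (U+4E2D, U+6587, U+5B57). Sample data, not from the thread.
        byte[] stored = {
            (byte) 0xD6, (byte) 0xD0,
            (byte) 0xCE, (byte) 0xC4,
            (byte) 0xD7, (byte) 0xD6
        };

        String asLatin1 = decode(stored, "ISO-8859-1");
        String asGb2312 = decode(stored, "GB2312");

        // The bytes never change; only the interpretation does.
        System.out.println("ISO-8859-1 sees " + asLatin1.length() + " characters");
        System.out.println("GB2312 sees " + asGb2312.length() + " characters");
    }
}
```

The database plays the role of the ISO-8859-1 reader here and the printer plays the role of the GB2312 reader, which would explain why no conversion step is needed in between.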

On the other hand, so far as I know, Unix/Linux printing systems like
CUPS allow you to specify a character encoding as an option to commands
like lp. They also pick it up from the locale (see the environment
variables). This allows CUPS to do whatever is needed to print those
characters correctly.

> Any suggestions? Also, the Java conversion program can convert my output
> to UTF-8.

I'm sure it can. If a Java program knows what encodings are to be used
for data input and data output then the standard classes allow you to
handle data correctly*. How that would help in your situation I don't
know. If your database thinks it is handing 8859-1 encoded European
characters to your Java program when in fact some of that needs to be
interpreted as GB2312, then I expect you will have to do something ugly
in Java. UTF-8 is, in general, a good thing. Configuring your database,
your programs, your locale and your printer for UTF-8 might well be a
good thing to do.
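One possible shape of that "something ugly", as a sketch only: take the string the database handed over as 8859-1, recover the original bytes, decode them again as GB2312, and then encode the result as UTF-8. The class and method names are hypothetical, and the code assumes the caller already knows which fields really hold GB2312 data; applied to genuine 8859-1 text it would corrupt accented letters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MislabelledTextRepair {
    private static final Charset GB2312 = Charset.forName("GB2312");

    // ISO-8859-1 maps every byte value to exactly one character, so
    // getBytes(ISO_8859_1) recovers the bytes the database stored.
    static String reinterpretAsGb2312(String decodedAsLatin1) {
        byte[] originalBytes = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, GB2312);
    }

    // Once the text is a correct Java String, producing UTF-8 is routine.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample value: what a driver would return for the GB2312 bytes
        // D6 D0 CE C4 if it had been told the column is 8859-1.
        String fromDatabase = "\u00D6\u00D0\u00CE\u00C4";

        String repaired = reinterpretAsGb2312(fromDatabase);
        byte[] utf8 = toUtf8(repaired);

        System.out.println(repaired.length() + " characters, "
                + utf8.length + " UTF-8 bytes");
    }
}
```

The ugliness is that nothing in the data says which strings need this treatment; that knowledge has to come from outside, for example from a list of columns known to contain Chinese text.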

--
RGB

From: moonhkt on 23 May 2010 22:30

On May 22, 6:23 AM, RedGrittyBrick <RedGrittyBr...(a)SpamWeary.invalid>
wrote:
> On 21/05/2010 17:38, moonhkt wrote:
>
>
>
> > Our database is in ISO8859-1 format but holds some GB2312 and other
> > non-ISO8859-1 data. Now we want to print GB2312 text in work order
> > routing. We are planning to purchase a Chinese line printer for printing
> > GB2312. The line printer can print the file under UNIX. Why does the
> > output file not need to be converted to GB2312 format before printing?
>
> You don't provide any details so I can only guess. My guess is that the
> database thinks it has (for example) six European letters when in fact
> it has three Chinese characters. The database is happy to store and
> retrieve the byte sequences that would, under 8859-1 encoding, represent
> six European letters. When the retrieved byte sequences are sent to the
> printer, because the printer is configured to use the GB2312 encoding,
> it interprets those same byte sequences not as six European letters but
> as three Chinese characters.
>
> On the other hand, so far as I know, Unix/Linux printing systems like
> CUPS allow you to specify a character encoding as an option to commands
> like lp. They also pick it up from the locale (see the environment
> variables). This allows CUPS to do whatever is needed to print those
> characters correctly.
>
> > Any suggestions? Also, the Java conversion program can convert my output
> > to UTF-8.
>
> I'm sure it can. If a Java program knows what encodings are to be used
> for data input and data output then the standard classes allow you to
> handle data correctly*. How that would help in your situation I don't
> know. If your database thinks it is handing 8859-1 encoded European
> characters to your Java program when in fact some of that needs to be
> interpreted as GB2312, then I expect you will have to do something ugly
> in Java. UTF-8 is, in general, a good thing. Configuring your database,
> your programs, your locale and your printer for UTF-8 might well be a
> good thing to do.
>
> --
> RGB

Hi All,
Today our printer vendor suggested that we provide Hanzi EBCDIC code for
testing Chinese printing, because IBM hosts all support Hanzi EBCDIC code.
How can I convert GB2312/UTF-8 to EBCDIC?

I tried cp1047 on cp1838. All the ASCII codes come out as before; I
compared the files using diff to check for differences.
