change ISO8859-1 to GB2312 [Java Programming]

From: moonhkt on 19 May 2010 02:40

Hi all,

Our database codepage is ISO8859-1, but some of the data was entered as
GB2312. When we export the data as ISO8859-1, it still contains the GB2312
data. Is it possible to convert the export from ISO8859-1 to GB2312?

The machine runs AIX.

I tried the code below, but it does not work.

import java.nio.charset.Charset ;
import java.io.*;
import java.lang.String;
public class read_iso {
public static void main(String[] args) {
File aFile = new File("abc.txt");
try {
String str = "";
BufferedReader in = new BufferedReader(
new InputStreamReader(new FileInputStream(aFile),
"iso8859-1"));

while (( str = in.readLine()) != null )
{
System.out.println(str);
System.out.println(new String (str.getBytes("iso8859-1")));
System.out.println(new String
(str.getBytes("iso-8859-1"),"GB2312")); /* not */
}
} catch (UnsupportedEncodingException e) {
} catch (IOException e) {
}

}
}

From: Lew on 19 May 2010 12:50

On 05/19/2010 02:40 AM, moonhkt wrote:
> Our database codepage is ISO8859-1, but some of the data was entered as
> GB2312. When we export the data as ISO8859-1, it still contains the GB2312
> data. Is it possible to convert the export from ISO8859-1 to GB2312?
>
> The machine runs AIX.
>
> I tried the code below, but it does not work.
>
> import java.nio.charset.Charset ;
> import java.io.*;
> import java.lang.String;
> public class read_iso {

You should follow the Java naming conventions.

> public static void main(String[] args) {
> File aFile = new File("abc.txt");
> try {

... and indentation conventions.

> String str = "";

And not initialize to values that are never used, only discarded.

> BufferedReader in = new BufferedReader(
> new InputStreamReader(new FileInputStream(aFile),
> "iso8859-1"));
>
> while (( str = in.readLine()) != null )
> {
> System.out.println(str);
> System.out.println(new String (str.getBytes("iso8859-1")));

Didn't you say the data was input in GB2312 encoding?

Whatever, this constructs a string using the platform native encoding from
bytes encoded using ISO-8859-1. If that isn't the native encoding, you got
worries.
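
A minimal sketch of that point, assuming Java 1.4 or later and a JDK that ships the GB2312 charset. The class and method names are made up for illustration. It shows that `new String(byte[])` decodes with whatever the platform default happens to be, while naming the charset gives the same result on every machine.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not taken from the thread.
class DefaultCharsetDemo {

    // Decode bytes with an explicitly named charset; the result does not
    // depend on the machine the code runs on.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Decode bytes the way new String(byte[]) does: with the platform default.
    static String decodeWithDefault(byte[] raw) {
        return new String(raw);
    }

    public static void main(String[] args) {
        // Stand-in for bytes read from the export file: two GB2312 characters.
        byte[] raw = "\u6d4b\u8bd5".getBytes(Charset.forName("GB2312"));

        System.out.println("platform default: " + Charset.defaultCharset());
        // Equal only when the platform default decodes these bytes like GB2312.
        System.out.println("default agrees with GB2312: "
                + decodeWithDefault(raw).equals(decode(raw, "GB2312")));
        // Always true: the explicit charset recovers the original text.
        System.out.println("explicit GB2312 recovers text: "
                + decode(raw, "GB2312").equals("\u6d4b\u8bd5"));
    }
}
```

The second line of output will differ between, say, a GB2312 locale and a Latin-1 locale, which is the portability worry described above.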

> System.out.println(new String
> (str.getBytes("iso-8859-1"),"GB2312")); /* not */

Now you're decoding bytes using GB2312 from bytes encoded using ISO-8859-1.
That can't work.

System.out always uses the platform default string encoding.
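
One way to see what that default is on a given machine, sketched under the assumption that the JDK supports GB2312. `Charset.defaultCharset()` and the `file.encoding` system property are standard JDK facilities; the class and helper names are hypothetical. The helper also shows that a `PrintStream` can be given an explicit encoding instead of relying on the default.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical demo class, not taken from the thread.
class PlatformEncoding {

    // Name of the charset the JVM uses when none is specified.
    static String defaultName() {
        return Charset.defaultCharset().name();
    }

    // Print text through a PrintStream that encodes with the named charset
    // and return the bytes it produced.
    static byte[] encodeViaPrintStream(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException exc) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, exc);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + defaultName());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // The same two characters occupy a different number of bytes per encoding.
        String text = "\u6d4b\u8bd5";
        System.out.println("GB2312 bytes: " + encodeViaPrintStream(text, "GB2312").length);
        System.out.println("UTF-8 bytes:  " + encodeViaPrintStream(text, "UTF-8").length);
    }
}
```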

> }
> } catch (UnsupportedEncodingException e) {
> } catch (IOException e) {
> }

Don't silently eat exceptions.

> }
> }

My approach to the encoding would be a lot more straightforward. None of this
wacky "new String()" stuff.

<sscce source="eegee/FooCoder.java">
package eegee;

import java.io.*;
import org.apache.log4j.Logger;
import static org.apache.log4j.Logger.getLogger;

public class FooCoder
{
  private transient final Logger logger = getLogger( FooCoder.class );

  public static void main( String[] args )
  {
    new FooCoder().recode();
  }

  public void recode()
  {
    final BufferedReader rin;
    final BufferedWriter owt;
    try
    {
      rin = new BufferedReader( new InputStreamReader(
        getClass().getResourceAsStream( "temp.txt" ),
        "ISO-8859-1" ));
      owt = new BufferedWriter( new OutputStreamWriter(
        System.out, "GB2312" ));
    }
    catch ( IOException exc )
    {
      logger.error( exc );
      return;
    }
    try
    {
      for ( String str; (str = rin.readLine()) != null; )
      {
        owt.write( str );
        owt.newLine();
      }
      owt.flush();
    }
    catch ( IOException exc )
    {
      logger.error( exc );
    }
    finally
    {
      try
      {
        rin.close();
        owt.close();
      }
      catch ( IOException exc )
      {
        logger.error( exc );
      }
    }
  }
}
</sscce>

--
Lew

From: moonhkt on 19 May 2010 22:12

Hi Lew,
Thanks a lot.
How do I check the platform native encoding?

I changed your code as below. My test file converts to UTF-8; viewed in
Reflection UTF-8 Emulation, the characters display correctly.
Viewed in IE, they also display correctly.

temp.txt file
| 10 TEST1 |测试1
| |
| 11 TEST2 |测试2
| |
| 12 TEST3 |测试3
| |
| 13 TEST4 |测试4
| |
| 14 TEST5 |测试5
| |

import java.io.*;
public class conv_ig
{
public static void main( String[] args )
{
new conv_ig().recode();
}
public void recode()
{
final BufferedReader rin;
final BufferedWriter owt;
try
{
rin = new BufferedReader( new InputStreamReader(
/* getClass().getResourceAsStream( "temp.txt" ),
"ISO-8859-1" ));
owt = new BufferedWriter( new OutputStreamWriter(System.out,
"GB2312" ));
*/
getClass().getResourceAsStream( "temp.txt" ),"GB2312" ));
owt = new BufferedWriter( new OutputStreamWriter(
System.out, "UTF-8" ));
}
catch ( IOException exc )
{
/* logger.error( exc ); */
return;
}
try
{
for ( String str; (str = rin.readLine()) != null; )
{
owt.write( str );
owt.newLine();
}
owt.flush();
}
catch ( IOException exc )
{
/* logger.error( exc ); */
}
finally
{
try
{
rin.close();
owt.close();
}
catch ( IOException exc )
{
/* logger.error( exc ); */
}
}
}
}
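
The change made above (decode the input as GB2312, encode the output as UTF-8) can be sketched as a method that takes both charset names as parameters. This is only an illustration under stated assumptions: the class and method names are hypothetical, it needs Java 8 for `UncheckedIOException`, and it copies characters in blocks rather than line by line so that the original line endings are preserved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class, not taken from the thread.
class Recoder {

    // Decode everything from 'in' using charset 'from' and write it to
    // 'out' encoded with charset 'to'. The caller owns and closes the streams.
    static void recode(InputStream in, String from, OutputStream out, String to)
            throws IOException {
        Reader reader = new InputStreamReader(in, Charset.forName(from));
        Writer writer = new OutputStreamWriter(out, Charset.forName(to));
        char[] buffer = new char[4096];
        for (int n; (n = reader.read(buffer)) != -1; ) {
            writer.write(buffer, 0, n);
        }
        writer.flush();
    }

    // In-memory convenience wrapper around the stream version.
    static byte[] recode(byte[] input, String from, String to) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            recode(new ByteArrayInputStream(input), from, sink, to);
        } catch (IOException exc) {
            throw new UncheckedIOException(exc);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Reads GB2312 from standard input and writes UTF-8 to standard output.
        recode(System.in, "GB2312", System.out, "UTF-8");
    }
}
```

Assuming the export is in a file named temp.txt, it could be run as `java Recoder < temp.txt > temp.utf8.txt`.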

From: Lew on 19 May 2010 23:58

moonhkt wrote:
> I changed your code as below. My test file converts to UTF-8; viewed in
> Reflection UTF-8 Emulation, the characters display correctly.

What is "Reflection UTF-8"?

Not a bad job there, but I have to wonder why you ruined the indentation and
still are flouting the naming conventions. Code should be readable.

Also, it is exceedingly bad that you eliminated logging. You should keep the
logging. Switch to java.util.logging if you don't like log4j or don't care to
add the JAR, but for Pete's sake keep the logging. Yikes.
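
For reference, a minimal sketch of what the `java.util.logging` alternative could look like. The class name and the simulated failure are made up for illustration; `Logger.getLogger` and `Logger.log(Level, String, Throwable)` are part of the standard JDK.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class, not taken from the thread.
class LoggingDemo {
    private static final Logger LOGGER =
            Logger.getLogger(LoggingDemo.class.getName());

    // Returns true on success; on failure logs the exception with its
    // stack trace instead of swallowing it, then returns false.
    static boolean tryRead(boolean simulateFailure) {
        try {
            if (simulateFailure) {
                throw new IOException("simulated read failure");
            }
            return true;
        } catch (IOException exc) {
            LOGGER.log(Level.SEVERE, "could not read input", exc);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRead(false));
        System.out.println(tryRead(true));
    }
}
```

With the default configuration the log record goes to standard error, so no extra JAR or configuration file is needed to keep the diagnostics.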

Here's a pop quiz for you - given that few code examples I've seen use the
idiom I did of a separate try block for opening the Reader and Writer from the
one for using them, why do you think I bothered?

Is it better or worse than the common idiom, or simply a matter of style and
more power to you for whichever?
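
For comparison, the common single-try idiom the quiz alludes to looks roughly like the sketch below. Names are hypothetical and in-memory streams stand in for the file and System.out. Note that the variables can no longer be final, must start out as null, and need a null check before closing; the two-block version avoids all three.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class, not taken from the thread.
class SingleTryCopy {
    private static final Logger LOGGER =
            Logger.getLogger(SingleTryCopy.class.getName());

    // Copies text line by line, opening and using the streams in one try block.
    static String copy(String text) {
        BufferedReader rin = null;   // not final, may still be null in finally
        BufferedWriter owt = null;
        StringWriter sink = new StringWriter();
        try {
            rin = new BufferedReader(new StringReader(text));
            owt = new BufferedWriter(sink);
            for (String str; (str = rin.readLine()) != null; ) {
                owt.write(str);
                owt.write('\n');
            }
            owt.flush();
        } catch (IOException exc) {
            LOGGER.log(Level.SEVERE, "copy failed", exc);
        } finally {
            close(rin);
            close(owt);
        }
        return sink.toString();
    }

    // Null-safe close, needed because either variable may never have been set.
    static void close(Closeable resource) {
        if (resource == null) {
            return;
        }
        try {
            resource.close();
        } catch (IOException exc) {
            LOGGER.log(Level.WARNING, "close failed", exc);
        }
    }

    public static void main(String[] args) {
        System.out.print(copy("line one\nline two"));
    }
}
```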

--
Lew

From: moonhkt on 20 May 2010 22:18

Sorry about that. It was a quick and dirty way to test the code. Reflection
is Telnet software; I use its UTF-8 Emulation to check the string
encoding.
I will look into how to use java.util.logging.

Can you give an example of where I "ruined the indentation"? And what
about the naming conventions?
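
As a rough illustration of what the usual Java conventions ask for (not a rewrite of the program above, and with made-up names): class names in UpperCamelCase, methods and variables in lowerCamelCase, constants in UPPER_SNAKE_CASE, and one consistent indentation step per nesting level.

```java
// Hypothetical example of conventional naming and indentation.
// A name like conv_ig or read_iso would become ConvIg or ReadIso,
// or better, something descriptive such as EncodingConverter.
class EncodingConverter {

    // Constants: upper case with underscores.
    static final String SOURCE_CHARSET = "GB2312";

    // Methods and parameters: lower camel case.
    static String describe(String fileName) {
        // Each nested block is indented one consistent step further.
        if (fileName.isEmpty()) {
            return "(no file)";
        }
        return fileName + " (" + SOURCE_CHARSET + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("temp.txt"));
    }
}
```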
