From: François R on
I redirect System.out to a JTextArea with the following class

private class TextAreaOutputStream extends OutputStream {
    JTextArea textArea;

    TextAreaOutputStream(JTextArea textArea) {
        this.textArea = textArea;
    }

    public void flush() {
        textArea.repaint();
    }

    public void write(int b) {
        //try {
        textArea.append(new String(new byte[] {(byte)b}));
        // } catch (UnsupportedEncodingException e){e.printStackTrace();}
    }
}

and I use the class with
JTextArea msg = new JTextArea();
System.setOut(new PrintStream(new TextAreaOutputStream(msg), true));

This works well except when I have a character like Č (latin capital
letter C with caron, '\u010C') in a string, which is displayed as ? in
the text area whereas
msg.append(string); would be ok.
How could I correct the code above to have such a letter well
formed ?

Thanks
François
From: Mayeul on
François R wrote:
> This works well except when I have a character like Č (latin capital
> letter C with caron, '\u010C') in a string, which is displayed as ? in
> the text area whereas
> msg.append(string); would be ok.
> How could I correct the code above to have such a letter well
> formed ?

You have a character encoding problem.

Both the constructors PrintStream(OutputStream,boolean) and
String(byte[]) assume you're using your platform's default character
encoding to translate chars to bytes and vice-versa.

I expect your platform's default character encoding does _not_ handle
characters such as U+010C, hence them being replaced with question marks.

The fix is to specify a character encoding to use, a Unicode one, for
instance UTF-8.
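
To make the question marks concrete, here is a minimal sketch. It uses
ISO-8859-1 as a stand-in for a default encoding that has no Č (your
actual platform default may be a different one), and the class and
method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
final class QuestionMarkDemo {
    // Round trip through ISO-8859-1, which cannot represent U+010C.
    // String.getBytes(Charset) substitutes '?' for unmappable characters.
    static String viaLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Round trip through UTF-8, which can represent every Unicode character.
    static String viaUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u010C\u00ED\u017Eek"; // Čížek, written with escapes
        System.out.println(viaLatin1(name).equals(name)); // false
        System.out.println(viaUtf8(name).equals(name));   // true
        // Č, í and ž each take 2 bytes in UTF-8, e and k take 1 each.
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length); // 8
    }
}
```

The same substitution happens inside a PrintStream that encodes with a
charset lacking the character, so the '?' is already in the bytes before
TextAreaOutputStream ever sees them.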


You can do that by constructing your PrintStream this way:

new PrintStream(new TextAreaOutputStream(msg), true, "utf-8")

And implementing your TextAreaOutputStream differently: it should store
the bytes in a buffer and wait until the OutputStream is flushed, at
which point the buffer probably ends on a character's final byte, then
turn the bytes received into a String and update the TextArea with it.

This could be done by writing the bytes you receive to a
ByteArrayOutputStream, and whenever it is flushed, fetch the byte[] and
build a String with it as such:

new String(bytes, "utf-8")
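
The buffering idea above could be sketched roughly as follows. To keep
it runnable without a GUI, the decoded text goes to a
Consumer<String> instead of a JTextArea; that sink, the class name and
the capture() helper are assumptions made for this illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical sketch: buffers bytes and decodes them as UTF-8 on flush.
final class DecodingOutputStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Consumer<String> sink; // stand-in for the text area

    DecodingOutputStream(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public void write(int b) {
        buffer.write(b); // never decode here: b may be half a character
    }

    @Override
    public void flush() {
        if (buffer.size() == 0) {
            return;
        }
        String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        buffer.reset();
        sink.accept(text);
    }

    // Prints text through a UTF-8 PrintStream and returns what the sink saw.
    static String capture(String text) {
        StringBuilder seen = new StringBuilder();
        try {
            PrintStream out =
                new PrintStream(new DecodingOutputStream(seen::append), true, "UTF-8");
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        String name = "\u010C\u00ED\u017Eek"; // Čížek
        System.out.println(capture(name).equals(name)); // true
    }
}
```

With a real text area, the sink would presumably be something like
msg::append, where msg is the JTextArea from the original post.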


Note: one might think that using UTF-16 instead of UTF-8 would guarantee
that every character takes 2 bytes, making the solution easier to
implement. However, *really* special characters (those above U+FFFF)
still take 4 bytes instead of 2 in UTF-16.
UTF-32 (UCS-4) may work better if it is well supported, I'm not sure.
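
A quick way to check the byte counts mentioned in this note. The class
and helper names are made up; U+10400 is just an arbitrary character
above U+FFFF, and whether UTF-32 is available depends on the JVM, so the
sketch tests for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for comparing encoded sizes.
final class EncodedLengthDemo {
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String caron = "\u010C";               // U+010C, below U+FFFF
        String supplementary = "\uD801\uDC00"; // U+10400, above U+FFFF

        // UTF-16BE writes no byte order mark, so the counts are exact.
        System.out.println(encodedLength(caron, StandardCharsets.UTF_16BE));         // 2
        System.out.println(encodedLength(supplementary, StandardCharsets.UTF_16BE)); // 4

        System.out.println(encodedLength(caron, StandardCharsets.UTF_8));            // 2
        System.out.println(encodedLength(supplementary, StandardCharsets.UTF_8));    // 4

        // UTF-32 is not one of the charsets every JVM must provide.
        if (Charset.isSupported("UTF-32BE")) {
            Charset utf32 = Charset.forName("UTF-32BE");
            System.out.println(encodedLength(caron, utf32));
            System.out.println(encodedLength(supplementary, utf32));
        }
    }
}
```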

--
Mayeul
From: Roedy Green on
On Wed, 4 Nov 2009 01:08:55 -0800 (PST), François R
<rappazf(a)gmail.com> wrote, quoted or indirectly quoted someone who
said :

>
>This works well except when I have a character like Č (latin capital
>letter C with caron, '\u010C') in a string, which is displayed as ? in
>the text area whereas
>msg.append(string); would be ok.
>How could I

The way I would do it is direct the output to a file using UTF-8
encoding, or at least an encoding that supports the letters you need.
Then view it in some sort of viewer/editor that understands encodings.

See http://mindprod.com/applet/fileio.html
for the code to set up a PrintWriter to a file.
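
For readers without access to that page, a rough sketch of a UTF-8
PrintWriter to a file might look like this. It is not the code from the
linked page; the class name, helper names and temp file are assumptions
for the illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of writing text to a file with an explicit encoding.
final class Utf8LogFile {
    // Writes one line to the file, encoded as UTF-8.
    static void writeLine(Path file, String line) {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            out.println(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a line to a temp file, reads it back as UTF-8, and cleans up.
    static String roundTrip(String line) {
        try {
            Path file = Files.createTempFile("utf8-out", ".txt");
            try {
                writeLine(file, line);
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u010C\u00ED\u017Eek"; // Čížek
        System.out.println(roundTrip(name).startsWith(name)); // true
    }
}
```

The file then has to be opened in a viewer or editor that is told, or
can detect, that the content is UTF-8.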


--
Roedy Green Canadian Mind Products
http://mindprod.com

An example (complete and annotated) is worth 1000 lines of BNF.
From: François R on
On Nov 4, 1:25 pm, Mayeul <mayeul.marg...(a)free.fr> wrote:

Thanks a lot for the suggestion!
I tried this:
try {
    System.setOut(new PrintStream(new TextAreaOutputStream(msg), true,
        "utf-8"));
} catch ....

and

private class TextAreaOutputStream extends OutputStream {
    JTextArea textArea;
    // Collects raw bytes until the stream is flushed.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    TextAreaOutputStream(JTextArea textArea) {
        this.textArea = textArea;
    }

    // Decode everything buffered so far as UTF-8 and append it in one go.
    @Override
    public void flush() {
        try {
            textArea.append(buffer.toString("utf-8"));
            buffer.reset();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

    // Only buffer here: a single byte may be just part of a character.
    @Override
    public void write(int b) {
        buffer.write(b);
    }
}

It seems to work well: names like Cížek or Čížek are now displayed
properly.

François