From: moonhkt on 27 Jan 2010 20:00

Hi All,
I want to output the characters in the string one by one.
Now, codePointAt just prints the code point values.

On Jan 28, 12:12 am, RedGrittyBrick <RedGrittyBr...(a)spamweary.invalid> wrote:
> moonhkt wrote:
> > On Jan 27, 8:17 pm, Lothar Kimmeringer <news200...(a)kimmeringer.de>
> > wrote:
> >> moonhkt wrote:
> >>> Below not work.
> >> [...]
>
> >>>   char[] ch = new char[];
> >> Because it doesn't compile.
>
> >> What exactly doesn't work. Do you get a wrong output, do you
> >> get an exception (you ignore in the source you provided). A
> >> bit more information would really help to be able to answer
> >> more than "something will be wrong in your code".
>
> >> Regards, Lothar
> >> --
> >> Lothar Kimmeringer         E-Mail: spamf...(a)kimmeringer.de
> >>         PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)
>
> >> Always remember: The answer is forty-two, there can only be wrong
> >>          questions!
>
> > Thank. I get below Example. But I can not get the UTF-8 char code.
>
> What do you mean by "UTF-8 char code"? Strictly speaking there is no
> such thing. You might mean "Unicode code-point" or "sequence of octets
> in UTF8-encoding"
>
> > class CodePointAtstring
> > {
> >  public static void main(String[] args)
> >  {
> >   // Declaration of String
> >   String a="\u00fc" + "\u34d7"+ "Welcome to Rose india";
> >   //Displays the Actual String declared above
> >   System.out.println("GIVEN STRING IS="+a);
> >   // Returns the character (Unicode code point) at the specified index.
> >   System.out.println("Unicode code point at position 0 IN THE STRING IS="+a.codePointAt(0));
> >   System.out.println("Unicode code point at position 1 IN THE STRING IS="+a.codePointAt(1));
> >   System.out.println("Unicode code point at position 2 IN THE STRING IS="+a.codePointAt(2));
> >   System.out.println("Unicode code point at position 3 IN THE STRING IS="+a.codePointAt(3));
> >   System.out.println("Unicode code point at position 6 IN THE STRING IS="+a.codePointAt(6));
> >  }
> > }
>
> > Output
> > java CodePointAtstring
> > GIVEN STRING IS=³?Welcome to Rose india
> > Unicode code point at position 0 IN THE STRING IS=252
> > Unicode code point at position 1 IN THE STRING IS=13527
> > Unicode code point at position 2 IN THE STRING IS=87
> > Unicode code point at position 3 IN THE STRING IS=101
> > Unicode code point at position 6 IN THE STRING IS=111
>
> That seems completely reasonable to me because 252 = 0x00fc and 13527 =
> 0x34d7.
>
> Nothing in your program has anything to do with UTF-8 encoding.
>
> --
> RGB
From: Lew on 27 Jan 2010 20:49

Please, do not top-post.

moonhkt wrote:
> I want to output the characters in the string one by one.
> Now, codePointAt just prints the code point values.

'codePointAt()' doesn't print anything. How are you actually printing it?

'codePointAt()' returns an int, not a character.
<http://java.sun.com/javase/6/docs/api/java/lang/String.html#codePointAt(int)>

Most methods that output an int show the int value, not the equivalent character. If you want to display an int as a character, you have to use a method that will do that. I don't know offhand of a method in the standard API that does that, but perusal of the Javadocs might reveal one, otherwise you'll have to code one yourself or find a third-party library that already has such.

--
Lew
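On the question of displaying an int code point as a character: the Javadocs do seem to offer something, namely 'Character.toChars(int)', which has been in the standard API since Java 5. The following is only a minimal sketch under that assumption. The class name CodePointToText and the helper asText are made up for illustration and appear nowhere else in this thread.

```java
// Sketch only: CodePointToText and asText are illustrative names.
class CodePointToText {

    // Turn one Unicode code point (as returned by String.codePointAt) into
    // printable text. Character.toChars yields one char for code points in
    // the BMP and a surrogate pair for supplementary code points.
    static String asText(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String a = "\u00fc\u34d7Welcome to Rose india";
        for (int i = 0; i < 3; i++) {
            int cp = a.codePointAt(i);
            // Show the decimal value, the hex value, and the character itself.
            System.out.println("index " + i + ": " + cp
                    + " = " + String.format("U+%04X", cp)
                    + " = " + asText(cp));
        }
    }
}
```

Whether the characters then show up correctly still depends on the encoding the console expects, which is a separate matter from the conversion itself.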
From: Roedy Green on 28 Jan 2010 01:11

On Wed, 27 Jan 2010 16:12:18 +0000, RedGrittyBrick <RedGrittyBrick(a)spamweary.invalid> wrote, quoted or indirectly quoted someone who said :

>What do you mean by "UTF-8 char code"? Strictly speaking there is no
>such thing. You might mean "Unicode code-point" or "sequence of octets
>in UTF8-encoding"

The point of an encoding is that it hides the details of how 16-bit chars are inserted into an 8-bit stream. All you are interested in is the 16-bit Java char value, or perhaps the Java code point value if you have 32-bit chars embedded as well.

--
Roedy Green Canadian Mind Products
http://mindprod.com
Computers are useless. They can only give you answers. ~ Pablo Picasso (born: 1881-10-25 died: 1973-04-08 at age: 91)
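To make the distinction between char values and the "sequence of octets in UTF8-encoding" concrete, here is a rough sketch that prints both for a short string. It assumes Java 7 or later for StandardCharsets. The names Utf8Octets and utf8Hex are hypothetical, chosen only for this example.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: Utf8Octets and utf8Hex are illustrative names.
class Utf8Octets {

    // Encode s as UTF-8 and return the octets as space-separated hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "\u00fc\u34d7W";
        // What the program works with: 16-bit char values.
        for (int i = 0; i < a.length(); i++) {
            System.out.printf("char[%d] = U+%04X%n", i, (int) a.charAt(i));
        }
        // What an 8-bit stream carries once the string has been encoded.
        System.out.println("UTF-8 octets: " + utf8Hex(a));
    }
}
```

The octets only exist once the string is encoded for output, which is why nothing in a program that calls codePointAt alone involves UTF-8.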
From: moonhkt on 28 Jan 2010 09:35

Yes, this is what I want. But my output is not the same as yours. You are correct.

Run in JCreator 4.5:

--------------------Configuration: <Default>--------------------
GIVEN STRING IS=ç¾¹?î¢elcome to Rose India ??.
Length of string is 27
CodePoints in string is 26
Character[0] is ç¾¹
Character[1] is ??
Character[2] is W
Character[3] is e
Character[4] is l
Character[5] is c
Character[6] is o
Character[7] is m
Character[8] is e
Character[9] is
Character[10] is t
Character[11] is o
Character[12] is
Character[13] is R
Character[14] is o
Character[15] is s
Character[16] is e
Character[17] is
Character[18] is I
Character[19] is n
Character[20] is d
Character[21] is i
Character[22] is a
Character[23] is
Character[24] is ?
Character[25] is ?
Character[26] is .

Process completed.

On Jan 28, 6:38 pm, RedGrittyBrick <RedGrittyBr...(a)spamweary.invalid> wrote:
> moonhkt wrote:
> > RedGrittyBrick wrote:
> >> moonhkt wrote:
> >>> Lothar Kimmeringer wrote:
> >>>> moonhkt wrote:
>
> >>>>> Below not work.
>
> >>>> [...]
> >>>> Because it doesn't compile. What exactly doesn't work. Do you
> >>>> get a wrong output, do you get an exception (you ignore in the
> >>>> source you provided). A bit more information would really help
> >>>> to be able to answer more than "something will be wrong in your
> >>>> code". Regards,
>
> >>> Thank. I get below Example. But I can not get the UTF-8 char
> >>> code.
>
> >> What do you mean by "UTF-8 char code"? Strictly speaking there is
> >> no such thing. You might mean "Unicode code-point" or "sequence of
> >> octets in UTF8-encoding"
>
> >> [...]
>
> >> Nothing in your program has anything to do with UTF-8 encoding.
>
> > Hi All, I want to output the characters in the string one by one.
> > Now, codePointAt just prints the code point values.
>
> Why not use String's length() and charAt() methods?
>
> I assume you can disregard characters outside Unicode's Base
> Multilingual Plane (BMP) - if not, I think you'll have to check for
> surrogate pairs. Characters outside the BMP are too big for a char.
>
> -------------------------------------8<-----------------------------------
> import java.io.PrintStream;
> import java.io.UnsupportedEncodingException;
>
> public class UnicodeChars {
>   public static void main(String[] args)
>     throws UnsupportedEncodingException {
>
>    // I want console output in UTF-8
>    PrintStream sysout = new PrintStream(System.out, true, "UTF-8");
>
>    // \u00fc is LATIN SMALL LETTER U WITH DIAERESIS;
>    // \u34d7 is a character in CJK Unified Ideographs Extension A.
>    // \uD834\uDD1E is the surrogate pair for character U+1D11E.
>    // U+1D11E is MUSICAL SYMBOL G CLEF;
>    String a = "\u00fc\u34d7Welcome to Rose India \uD834\uDD1E.";
>
>    int n = a.length();
>    sysout.println("GIVEN STRING IS=" + a);
>    sysout.printf("Length of string is %d%n", n);
>    sysout.printf("CodePoints in string is %d%n",
>      a.codePointCount(0,n));
>    // charAt walks 16-bit chars, so a surrogate pair shows up as two entries
>    for (int i = 0; i < n; i++) {
>     sysout.printf("Character[%d] is %s%n", i, a.charAt(i));
>    }
>   }
> }
>
> -------------------------------------8<-----------------------------------
> GIVEN STRING IS=ü㓗Welcome to Rose India 𝄞.
> Length of string is 27
> CodePoints in string is 26
> Character[0] is ü
> Character[1] is 㓗
> Character[2] is W
> Character[3] is e
> Character[4] is l
> Character[5] is c
> Character[6] is o
> Character[7] is m
> Character[8] is e
> Character[9] is
> Character[10] is t
> Character[11] is o
> Character[12] is
> Character[13] is R
> Character[14] is o
> Character[15] is s
> Character[16] is e
> Character[17] is
> Character[18] is I
> Character[19] is n
> Character[20] is d
> Character[21] is i
> Character[22] is a
> Character[23] is
> Character[24] is ?
> Character[25] is ?
> Character[26] is .
>
> --
> RGB
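If characters outside the BMP do matter, the surrogate-pair check mentioned above can be left to the library by stepping through the string one code point at a time instead of one char at a time. What follows is a sketch of that idea rather than anything posted in this thread. The names CodePointWalk and characters are invented for the example, and it relies on String.codePointAt, Character.charCount and Character.toChars, all present since Java 5.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: CodePointWalk and characters are illustrative names.
class CodePointWalk {

    // Split s into one String per Unicode code point. A surrogate pair
    // stays together as a single two-char element.
    static List<String> characters(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(new String(Character.toChars(cp)));
            // Advance by 1 char for BMP code points, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        String a = "\u00fc\u34d7Welcome to Rose India \uD834\uDD1E.";
        List<String> chars = characters(a);
        System.out.println("chars: " + a.length()
                + ", code points: " + chars.size());
        for (int i = 0; i < chars.size(); i++) {
            System.out.println("Character[" + i + "] is " + chars.get(i));
        }
    }
}
```

With this approach the G clef comes out as one entry rather than the two separate '?' entries at indexes 24 and 25, provided the console can display it.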
From: Lew on 28 Jan 2010 14:25

On Jan 28, 12:57 pm, RedGrittyBrick <RedGrittyBr...(a)spamweary.invalid> wrote:
> PLEASE DON'T TOP-POST, PLEASE PUT YOUR REPLY AT THE BOTTOM, BELOW ANY
> QUOTED TEXT. THANKS!

Actually, it's better to post inline, with comments interspersed with quoted material.

--
Lew