From: Lothar Kimmeringer on 23 Feb 2010 14:45

Amith wrote:
> My problem is the UTF-8 string which i read from the URL is considered
> as unicode.. i need it as UTF-8
>
> i want it to be printed as "ನಮ್ಸ್ಕರಗುರು" and not as "\u0CA8\u0CAE\u0CCD
> \u0CB8\u0CCD\u0C95\u0CB0\u0C97\u0CC1\u0CB0\u0CC1"

What is this line for:

fullString = fullString + new String(inputLine.getBytes(),"UTF-8")

First of all, use StringBuilder and not String concatenation. Second, why do you create a byte array from a string, only to create a new string from it again, just to add it to an existing one? Just doing

fullString += inputLine

should be enough (and solve your problem, by the way). As said above, use a StringBuilder instead as the next step.

Regards, Lothar

-- 
Lothar Kimmeringer E-Mail: spamfang(a)kimmeringer.de
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)
Always remember: The answer is forty-two, there can only be wrong questions!
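A minimal sketch of the StringBuilder approach Lothar describes, assuming Java 7 or later (for try-with-resources). The class and method names are hypothetical, and a `StringReader` stands in for the `InputStreamReader` wrapped around `url.openStream()` so the example runs without network access. Because the Reader has already decoded the input into characters, each line is appended as-is, with no `getBytes()` round trip.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadAllLines {

    // Reads every line from the given Reader and joins them with no
    // separator, like the original loop (readLine() strips line terminators).
    // The Reader has already decoded the bytes, so each line is appended
    // directly. try-with-resources closes the BufferedReader, and with it
    // the wrapped Reader, even if readLine() throws.
    static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new InputStreamReader(url.openStream(), "UTF-8").
        // Unicode escapes keep this source file encoding-independent.
        Reader fake = new StringReader(
                "\u0CA8\u0CAE\u0CCD\u0CB8\u0CCD\u0C95\u0CB0\n\u0C97\u0CC1\u0CB0\u0CC1");
        System.out.println(readAll(fake));
    }
}
```

Like the original loop, this drops the line terminators; append a separator after each line if they matter.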
From: Amith on 23 Feb 2010 14:58

Even with

fullString = fullString + inputLine;

it doesn't work; I have tried it. Some more useless experiments led me to this:

fullString = fullString + new String(inputLine.getBytes(),"UTF-8")
From: Lew on 23 Feb 2010 15:12

Amith wrote:
> My problem is the UTF-8 string which i [sic] read from the URL is considered
> as unicode.. i [sic] need it as UTF-8

UTF-8 *is* Unicode!

> i [sic] want it to be printed as "ನಮ್ಸ್ಕರಗುರು" and not as "\u0CA8\u0CAE\u0CCD
> \u0CB8\u0CCD\u0C95\u0CB0\u0C97\u0CC1\u0CB0\u0CC1"
>
> public class URLReader {
>     public static void main(String[] args) throws Exception {
>         URL url = new URL("http://www.google.com/transliterate/indic?tlqt=1&langpair=en|kn&text=namskara%20guru&&tl_app=1");
>         BufferedReader in = new BufferedReader(
>                 new InputStreamReader(
>                         url.openStream(), "UTF8"));
>
>         String inputLine = "";

No need to initialize 'inputLine' to a value you are just going to throw away.

>         String fullString = "";
>
>
>         while ((inputLine = in.readLine()) != null)
>             fullString = fullString + new String(inputLine.getBytes(),"UTF-8");

This is silly. Just do what Lothar said and add the String to the String.

I'm also pretty sure this isn't correct anyway, because the way you defined the BufferedReader, the bytes will already have been decoded from UTF-8 on the way in to 'inputLine'. The no-argument 'getBytes()' then encodes that String in the platform's default encoding, which is not necessarily UTF-8, so decoding those bytes as UTF-8 only gives back the original text if the default happens to be UTF-8.

In any event, using straightforward String concatenation, or as Lothar suggested, StringBuilder concatenation, should keep encoding issues out of the way. Java Strings always represent text as UTF-16 code units.

>         String string = fullString.substring(fullString.indexOf("[\"") + 2,
>                 fullString.indexOf("\",]"));
>         System.out.println(string);

This will display the String using the platform's default encoding.

>         in.close();

This should be in a 'finally' block tightly associated with the input loop.

>     }
> }

Do not use TAB characters for indentation of Usenet posts. Use spaces, up to four per indent level. To get help you might want to keep the code readable.

-- 
Lew
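To illustrate Lew's point about the no-argument `getBytes()`, here is a small sketch (class and method names are hypothetical) that contrasts a round trip through one explicitly named charset with the encode-with-default, decode-as-UTF-8 pattern from the original loop, and then prints through a `PrintStream` that encodes as UTF-8 instead of the platform default. Since Java 18 the default charset is UTF-8 (JEP 400), so on a current JVM the mismatched round trip will typically come through intact; on a JVM with a non-UTF-8 default, as was common in 2010, it would not.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Part of the Kannada sample from the thread, written with Unicode
    // escapes so this file compiles regardless of javac's -encoding setting.
    static final String SAMPLE = "\u0CA8\u0CAE\u0CCD\u0CB8\u0CCD\u0C95\u0CB0";

    // Encodes and decodes with the same, explicitly named charset.
    // Lossless for any String when the charset is UTF-8; lossy when the
    // charset cannot represent the text (unmappable chars become '?').
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    // What the original loop effectively did: encode with the platform
    // default charset, then decode as UTF-8. Only lossless if the default
    // charset is UTF-8 (or the text is plain ASCII).
    static String mismatchedRoundTrip(String s) {
        return new String(s.getBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("explicit UTF-8 round trip intact: "
                + roundTrip(SAMPLE, StandardCharsets.UTF_8).equals(SAMPLE));
        System.out.println("default-then-UTF-8 round trip intact: "
                + mismatchedRoundTrip(SAMPLE).equals(SAMPLE));

        // Wrapping stdout in a UTF-8 PrintStream makes the output bytes UTF-8
        // whatever the platform default is. Not closed, since that would
        // also close System.out.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(SAMPLE);
    }
}
```

Whether the characters then appear correctly still depends on the terminal's own encoding and on having a font with Kannada glyphs.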
From: Lothar Kimmeringer on 23 Feb 2010 16:00

Amith wrote:
> even if it is fullString = fullString + inputLine;

Then it's quite likely that the stream you open is not delivering bytes of UTF-8 encoded data.

Regards, Lothar
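If the stream might not be UTF-8, one way to check is to look at the charset the server claims to send. The sketch below assumes the `charset` parameter of the HTTP `Content-Type` header is trustworthy, which servers do not always guarantee, and the helper name `charsetFromContentType` is hypothetical. With a live connection, its input would come from `URLConnection.getContentType()`, and the result would be passed to `InputStreamReader` in place of the hard-coded "UTF8".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper: extracts the charset parameter from a
    // Content-Type value such as "text/html; charset=UTF-8". Returns the
    // fallback when the header is null, has no charset parameter, or names
    // a charset this JVM does not know.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers IllegalCharsetNameException and
                    // UnsupportedCharsetException, which both extend it.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Literal header values stand in for conn.getContentType().
        System.out.println(charsetFromContentType(
                "text/javascript; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetFromContentType(
                "text/html", StandardCharsets.ISO_8859_1));
    }
}
```

The fallback is a judgment call: ISO-8859-1 is the traditional HTTP default for text types, but for a service known to emit UTF-8, passing UTF-8 as the fallback may be more useful.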
From: markspace on 23 Feb 2010 16:16

Lothar Kimmeringer wrote:
> Amith wrote:
>
>> even if it is fullString = fullString + inputLine;
>
> Then it's quite likely that the stream you open is not
> delivering bytes of UTF-8 encoded data

or the stream actually contains the string "\u0CA8\u0CAE\u0CCD" etc., i.e., it's UTF-8 with something else encoded on top of that.

Or the problem is he doesn't have the right glyphs installed on his system, so he can't see the Kannada characters.

All of which sum up to "it's not in the code you've shown us."
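If markspace's first guess is right and the response literally contains backslash-u escapes (as JSON and JavaScript string literals may), the text needs one more decoding step after it has been read. A minimal sketch, assuming only the `\uXXXX` form needs handling; `unescapeUnicode` is a hypothetical name, and it deliberately ignores other escapes such as an escaped backslash, so a real JSON parser would be the more robust choice.

```java
public class UnicodeUnescaper {

    // Replaces each six-character sequence of a backslash, the letter 'u',
    // and four hex digits with the UTF-16 code unit it names; everything
    // else is copied unchanged. Escaped backslashes and other escapes
    // (newline, quote, ...) are NOT handled.
    static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 5 < s.length()
                    && s.charAt(i + 1) == 'u' && isHex(s, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if every char in s[from, to) is a hexadecimal digit.
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Doubled backslashes keep javac from translating these in the source
        // itself, so the String holds literal backslash sequences: a made-up
        // fragment in the shape the original substring() call looks for.
        String raw = "[\"\\u0CA8\\u0CAE\\u0CCD\",]";
        System.out.println(unescapeUnicode(raw));
    }
}
```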