From: Muhammed on 21 Dec 2009 06:53

Hi all. My code should support UTF-8 characters (all languages, such as Chinese and Arabic). I have used chars in the code. Is that OK, or do I need to use wide chars? (Are they available on Unix platforms?)
From: Mikko Rauhala on 21 Dec 2009 09:44

On Mon, 21 Dec 2009 03:53:36 -0800 (PST), Muhammed <doublemaster007(a)gmail.com> wrote:
> My code should support UTF-8 chars (all languages like chinese,
> arabic) i have used char's in code..Is it ok?? Or do i need to use
> wide chars..(is it avaliable in Unix platform?)

Wide characters (wchar_t) are generally available on (modern) Unix platforms (often using the UTF-32 representation internally, in contrast to UTF-16 on Windows).

However, it's not required to use wide characters for Unicode support; in fact, due to the arguable cumbersomeness of C wide character support, many people prefer to do exactly what you seem to be doing: storing the strings as UTF-8 inside plain old C strings (char arrays).

This is done for example by the popular GTK+ toolkit (used by e.g. Gnome) and its rendering library, Pango. You might find some useful Unicode/UTF-8 utility functions in GLib[1].

The downside is of course that if you want to mix and match code using wchar_t and code using UTF-8 char arrays, you'll have to do conversions where appropriate. GLib should be of help for that, too:

On systems using GNU iconv, you can use "WCHAR_T" as a source or target codeset for g_convert() and friends. However, stock iconv doesn't necessarily support the WCHAR_T (or any other necessary) type on all systems, so you might have to write a bit of architecture-specific code that knows what the local wchar_t type actually is, and if it's either UTF-16 or UTF-32 (~UCS-4), use some of the g_utf8_to_ucs4() and similar functions from GLib. But of course, this applies only if you need to use mixed-type Unicode handling code.

(Also, if your wchar_t is non-Unicode, you're probably better off not touching that. You can check if the __STDC_ISO_10646__ macro is defined; if it is, your wchar_t does use ISO-10646 characters, which for code point mapping purposes is the same thing as Unicode.)

Hope this helps and all.
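To illustrate the "UTF-8 inside plain old C strings" approach, here is a minimal sketch in standard C. The function name utf8_codepoints is made up for this example (it is not a GLib or libc function), and the code assumes the input is already valid UTF-8. It shows the main thing to keep in mind with char arrays: strlen() counts bytes, not characters.

```c
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string.
 * Hypothetical helper for illustration; assumes valid UTF-8 input.
 * Every byte that is NOT a continuation byte (bit pattern 10xxxxxx)
 * starts a new code point, so counting those bytes is enough. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For example, the two Chinese characters U+4E2D U+6587 written as the literal "\xE4\xB8\xAD\xE6\x96\x87" occupy 6 bytes according to strlen(), while utf8_codepoints() reports 2. Ordinary functions such as strcpy() and strcmp() keep working on such strings because UTF-8 never contains a zero byte inside a character.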
[1] http://library.gnome.org/devel/glib/2.22/glib-Unicode-Manipulation.html
    http://library.gnome.org/devel/glib/2.22/glib-Character-Set-Conversion.html

--
Mikko Rauhala <mjr(a)iki.fi> - http://www.iki.fi/mjr/blog/
The Finnish Pirate Party - http://piraattipuolue.fi/
World Transhumanist Association - http://transhumanism.org/
Singularity Institute - http://singinst.org/
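The wchar_t checks and the UTF-8 to UCS-4 step described in the post above can be sketched in standard C without GLib. All three function names below are hypothetical, chosen only for this example. The decoder is a simplified stand-in for what a library routine such as g_utf8_to_ucs4() does, and unlike a real library it does not reject malformed input.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* 1 if the implementation states that wchar_t values are ISO 10646
 * (Unicode) code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* 1 if a single wchar_t can hold any code point up to U+10FFFF
 * (UTF-32 style), 0 if it is too narrow (e.g. a 16-bit wchar_t). */
int wchar_holds_any_codepoint(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Decode a NUL-terminated UTF-8 string into code points.
 * Writes at most cap values to out and returns how many were written.
 * Minimal sketch: malformed sequences are not detected or reported. */
size_t utf8_to_codepoints(const char *s, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p != '\0' && n < cap) {
        uint32_t cp;
        int extra;                      /* continuation bytes expected */

        if (*p < 0x80)      { cp = *p;        extra = 0; }  /* ASCII   */
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }  /* 2 bytes */
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }  /* 3 bytes */
        else                { cp = *p & 0x07; extra = 3; }  /* 4 bytes */
        ++p;

        /* Each continuation byte contributes its low 6 bits. */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            cp = (cp << 6) | (*p++ & 0x3F);

        out[n++] = cp;
    }
    return n;
}
```

If both checks return 1, the decoded values could be copied into a wchar_t array one to one. If wchar_t is only 16 bits wide, code points above U+FFFF would need UTF-16 surrogate pairs instead, which this sketch does not cover.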
From: Muhammed on 22 Dec 2009 00:51

Thank you so much. It helped me to some extent; I need to check more.