From: RedGrittyBrick on 26 May 2010 11:15

On 26/05/2010 15:55, RedGrittyBrick wrote:
> On 26/05/2010 15:12, moonhkt wrote:
>> On May 26, 4:56 pm, RedGrittyBrick<RedGrittyBr...(a)spamweary.invalid>
>> wrote:
>>> On 25/05/2010 15:18, moonhkt wrote:
>>>
>>>> Our ISO8859-1 Database(Progress Database) have some Japanese/Korea/
>>>> Simplified Chinese and Traditional Chinese. Those Language imported
>>>> by lookup function. e.g. When User Input "G" in particular , the
>>>> lookup program will get "Green" in corresponding Language Character
>>>> set. Also, I checked other GB2312 Database(Progress Database), the
>>>> Encoding Value of "测试" (in English "TEST") same as ISO8859-1. Checked
>>>> by unix tool "od -ct x1 file_name".
>>>
>>>> For BIG5 conversion, I just for testing how to change GB2312 to
>>>> BIG5. My Boss ask me for check what is the encoding value for "TEST"
>>>> in GB2312 or BIG5. So, I want convert to BIG5 to check what encoding
>>>> value in BIG5.
>>>
>>> "测试" is simplified Chinese.
>>> "測試" is traditional Chinese.
>>>
>>> So far as I know:
>>> GB2312 is simplified Chinese.
>>> Big5 is traditional Chinese.
>>>
>>> Therefore:
>>> You cannot write "测试" in Big5
>>> You cannot write "測試" in GB2312
>>>
>>> Unless I am mistaken.
>>>
>>> One simplified Chinese character may correspond to several traditional
>>> Chinese characters. Java cannot translate "测试" to "測試" because that
>>> is a process that requires artistic skill, literary skill and an
>>> understanding of the context.
>>>
>>> I do not read, write, speak nor understand Chinese so I only offer the
>>> above as my somewhat uninformed understanding of the situation.
>>
>>
>> "测试" in GB2312 and "測試" in BIG5.
>
> Yes. Different characters. Not the same.
>
>>
>> My testing is Change GB2312 to UTF-8 (OK).
>
> Yes. Because Unicode includes all characters that are in GB2312.
>
>> Then UTF-8 to BIG5, This change not OK.
>
> No, because Big5 is a lot smaller than Unicode and does not include 测
> or 试 characters*
>
>> Is some missing or other reason ?
>
> Yes, 测 and 试 characters are missing from Big5*
>
>>
>> One simplified Chinese character may correspond to several traditional
>> Chinese characters. It may not true.
>
> It is true for some characters. For example:
> 台 = 臺 or 台 or 檯 or 枱 or 颱
>
> There is a list at
> <http://en.wikipedia.org/wiki/Multiple_association_of_converting_Simplified_Chinese_to_Traditional_Chinese>
>
>
> I suspect Java, for this reason, does not attempt to translate a
> simplified Chinese character to a traditional Chinese character.
>
>
>
> * I haven't checked because finding Chinese characters in enormous lists
> is hard work for me. So I might be wrong :-)
>

See http://www.chinesetools.eu/tools/gb2big5/chinese-convert.js

You can probably adapt this to Java.

-- 
RGB
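
The footnoted guess above (that 测 and 试 are missing from Big5) can be checked from Java instead of by scanning code tables. What follows is only a minimal sketch, assuming the JVM in use ships the GB2312 and Big5 charsets (a full JDK normally does, a trimmed-down runtime may not). The class name EncodableCheck and its helper method are made up for illustration; only Charset.forName, newEncoder and canEncode are standard API.

```java
import java.nio.charset.Charset;

class EncodableCheck {

    // Returns true if every character of text has a mapping in the named charset.
    // Throws UnsupportedCharsetException if this JVM does not provide the charset.
    public static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its own encoding.
        String simplified = "\u6d4b\u8bd5";   // 测试
        String traditional = "\u6e2c\u8a66";  // 測試

        for (String cs : new String[] {"GB2312", "Big5", "UTF-8"}) {
            System.out.println(cs
                    + ": simplified=" + canEncode(cs, simplified)
                    + ", traditional=" + canEncode(cs, traditional));
        }
    }
}
```

Note that String.getBytes(Charset) does not throw when a character has no mapping. It silently substitutes the charset's replacement bytes, which may be why the UTF-8 to Big5 step simply looked "not OK". Checking with canEncode first makes the missing characters visible. Turning 测试 into 測試 is a separate problem and would still need a mapping table such as the one in the JavaScript linked above.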