From: RedGrittyBrick
On 26/05/2010 15:55, RedGrittyBrick wrote:
> On 26/05/2010 15:12, moonhkt wrote:
>> On May 26, 4:56 pm, RedGrittyBrick<RedGrittyBr...(a)spamweary.invalid>
>> wrote:
>>> On 25/05/2010 15:18, moonhkt wrote:
>>>
>>>> Our ISO8859-1 database (a Progress database) holds some Japanese, Korean,
>>>> Simplified Chinese and Traditional Chinese text. That text is imported
>>>> by a lookup function: for example, when a user enters "G", the
>>>> lookup program returns "Green" in the corresponding language's character
>>>> set. I also checked another, GB2312, database (also Progress): the
>>>> encoded bytes of "测试" (English "TEST") are the same as in the ISO8859-1
>>>> one, as checked with the Unix tool "od -ct x1 file_name".
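A likely explanation for the identical bytes: ISO8859-1 assigns a character to every byte value 0x00-0xFF, so GB2312 bytes stored in an ISO8859-1 database pass through unchanged, even though they display as meaningless Latin-1 characters. A minimal Java sketch of that round trip, assuming the JRE ships the GB2312 charset (standard JDKs do). The class and method names are made up for illustration, and the Chinese text is written as Unicode escapes so the source compiles regardless of the platform's default file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Passthrough {
    // Space-separated lowercase hex, roughly what `od -t x1` shows (without offsets).
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode arbitrary bytes as ISO-8859-1, then encode them back as ISO-8859-1.
    // Every byte maps to exactly one char (U+0000..U+00FF), so nothing is lost.
    static byte[] throughLatin1(byte[] raw) {
        String latin1View = new String(raw, StandardCharsets.ISO_8859_1);
        return latin1View.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+6D4B U+8BD5: simplified Chinese "test"
        byte[] gb = "\u6d4b\u8bd5".getBytes(Charset.forName("GB2312"));
        byte[] back = throughLatin1(gb);
        System.out.println("GB2312 bytes:       " + hex(gb));
        System.out.println("after ISO8859-1 trip: " + hex(back));
        System.out.println("identical: " + Arrays.equals(gb, back));
    }
}
```

The point of the design is that nothing in the ISO8859-1 path ever interprets the bytes as Chinese; they are only meaningful again when read back with the GB2312 decoder.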
>>>
>>>> As for the Big5 conversion, I am just testing how to convert GB2312 to
>>>> Big5. My boss asked me to check the encoding value of "TEST"
>>>> in GB2312 and in Big5, so I want to convert to Big5 to see what the
>>>> encoding value is in Big5.
>>>
>>> "测试" is simplified Chinese.
>>> "測試" is traditional Chinese.
>>>
>>> So far as I know:
>>> GB2312 is simplified Chinese.
>>> Big5 is traditional Chinese.
>>>
>>> Therefore:
>>> You cannot write "测试" in Big5
>>> You cannot write "測試" in GB2312
>>>
>>> Unless I am mistaken.
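The claim above can be checked rather than taken on trust: `CharsetEncoder.canEncode` reports whether a whole string is representable in a given charset. A small sketch; the class, field and method names are hypothetical, and the characters are written as Unicode escapes.

```java
import java.nio.charset.Charset;

public class CharsetCoverage {
    static final String SIMPLIFIED = "\u6d4b\u8bd5";   // U+6D4B U+8BD5, simplified "test"
    static final String TRADITIONAL = "\u6e2c\u8a66";  // U+6E2C U+8A66, traditional "test"

    // True if every character of s can be encoded in the named charset.
    static boolean fits(String s, String charsetName) {
        // A fresh encoder per call, since canEncode must not run mid-operation.
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        for (String cs : new String[] {"GB2312", "Big5"}) {
            System.out.printf("%-6s simplified=%b traditional=%b%n",
                    cs, fits(SIMPLIFIED, cs), fits(TRADITIONAL, cs));
        }
    }
}
```

If the statements above are right, GB2312 should accept only the simplified pair and Big5 only the traditional pair.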
>>>
>>> One simplified Chinese character may correspond to several traditional
>>> Chinese characters. Java cannot translate "测试" to "測試" because that
>>> is a process that requires artistic skill, literary skill and an
>>> understanding of the context.
>>>
>>> I do not read, write, speak or understand Chinese, so I only offer the
>>> above as my somewhat uninformed understanding of the situation.
>>
>>
>> "测试" in GB2312 and "測試" in BIG5.
>
> Yes. Different characters. Not the same.
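To answer the boss's actual question, each form of "TEST" has to be encoded in its own charset, because the two strings are made of different code points. A sketch that prints the code points and the GB2312 and Big5 byte values in hex; the names are illustrative only.

```java
import java.nio.charset.Charset;

public class EncodingValues {
    // Space-separated lowercase hex, similar to what `od -t x1` prints.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String simplified = "\u6d4b\u8bd5";   // U+6D4B U+8BD5
        String traditional = "\u6e2c\u8a66";  // U+6E2C U+8A66
        // Different code points, so there is no single "encoding value of TEST".
        System.out.printf("code points: %04X %04X vs %04X %04X%n",
                (int) simplified.charAt(0), (int) simplified.charAt(1),
                (int) traditional.charAt(0), (int) traditional.charAt(1));
        System.out.println("GB2312 (simplified):  " + hex(simplified.getBytes(Charset.forName("GB2312"))));
        System.out.println("Big5   (traditional): " + hex(traditional.getBytes(Charset.forName("Big5"))));
    }
}
```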
>
>>
>> My test converts GB2312 to UTF-8 (OK).
>
> Yes. Because Unicode includes all characters that are in GB2312.
>
>> Then UTF-8 to Big5: this conversion does not work.
>
> No, because Big5 is a lot smaller than Unicode and does not include 测
> or 试 characters*
>
>> Are some characters missing, or is there another reason?
>
> Yes, 测 and 试 characters are missing from Big5*
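The failing second step can be reproduced in Java. `String.getBytes` silently substitutes a replacement byte for unmappable characters, which hides the problem; a `CharsetEncoder` set to `CodingErrorAction.REPORT` throws an `UnmappableCharacterException` instead. A sketch of the GB2312 to UTF-8 to Big5 pipeline described above; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Gb2312ToBig5 {
    // Strict encode: throws instead of silently substituting '?'.
    static byte[] encodeStrict(String s, Charset cs) throws CharacterCodingException {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        Charset big5 = Charset.forName("Big5");
        byte[] gbBytes = "\u6d4b\u8bd5".getBytes(gb2312);

        // Step 1: GB2312 -> UTF-8 works; Unicode contains every GB2312 character.
        String text = new String(gbBytes, gb2312);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 byte count: " + utf8.length);

        // Step 2: UTF-8 -> Big5 fails; U+6D4B and U+8BD5 have no Big5 mapping.
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        try {
            encodeStrict(fromUtf8, big5);
            System.out.println("Big5 strict encode succeeded");
        } catch (CharacterCodingException e) {
            System.out.println("Big5 strict encode failed: " + e);
        }
    }
}
```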
>
>>
>> One simplified Chinese character may correspond to several traditional
>> Chinese characters. That may not be true.
>
> It is true for some characters. For example:
> 台 = 臺 or 台 or 檯 or 枱 or 颱
>
> There is a list at
> <http://en.wikipedia.org/wiki/Multiple_association_of_converting_Simplified_Chinese_to_Traditional_Chinese>
>
>
> I suspect Java, for this reason, does not attempt to translate a
> simplified Chinese character to a traditional Chinese character.
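To make the ambiguity concrete: a simplified-to-traditional table cannot map each character to a single target; it has to map to a list of candidates, and choosing among them needs the context described above. A minimal sketch holding only the 台 entry from the example; the table and names are hypothetical, and a real table would need the full list from the Wikipedia page.

```java
import java.util.List;
import java.util.Map;

public class AmbiguousMapping {
    // Hypothetical, hand-written table containing only the U+53F0 example.
    static final Map<Character, List<Character>> SIMPLIFIED_TO_TRADITIONAL = Map.of(
            '\u53f0', List.of('\u81fa', '\u53f0', '\u6aaf', '\u67b1', '\u98b1'));

    // All traditional candidates; characters missing from the table map to themselves.
    static List<Character> candidates(char simplified) {
        return SIMPLIFIED_TO_TRADITIONAL.getOrDefault(simplified, List.of(simplified));
    }

    public static void main(String[] args) {
        for (char t : candidates('\u53f0')) {
            System.out.printf("U+%04X%n", (int) t);
        }
    }
}
```

Anything that returns more than one candidate is exactly where an automatic converter has to guess.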
>
>
>
> * I haven't checked because finding Chinese characters in enormous lists
> is hard work for me. So I might be wrong :-)
>


See http://www.chinesetools.eu/tools/gb2big5/chinese-convert.js
You can probably adapt this to Java.
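How chinese-convert.js is actually structured has not been checked here, so the following only assumes the general idea of a per-character lookup table. A sketch of such a converter in Java, followed by Big5 encoding of the result; the two-entry table is hypothetical and covers only 测→測 and 试→試. It picks one fixed target per character, so it would still get ambiguous characters such as 台 wrong in some contexts.

```java
import java.nio.charset.Charset;
import java.util.Map;

public class SimpToTrad {
    // Hypothetical one-to-one table covering only the two characters in this thread.
    static final Map<Character, Character> TABLE = Map.of(
            '\u6d4b', '\u6e2c',   // U+6D4B -> U+6E2C
            '\u8bd5', '\u8a66');  // U+8BD5 -> U+8A66

    // Replace each mapped char; leave everything else unchanged.
    // Works per UTF-16 char, so it only handles BMP characters.
    static String toTraditional(String simplified) {
        StringBuilder sb = new StringBuilder(simplified.length());
        for (char ch : simplified.toCharArray()) {
            sb.append(TABLE.getOrDefault(ch, ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String trad = toTraditional("\u6d4b\u8bd5");
        byte[] big5 = trad.getBytes(Charset.forName("Big5"));
        StringBuilder hex = new StringBuilder();
        for (byte b : big5) hex.append(String.format("%02x ", b & 0xff));
        System.out.println("Big5: " + hex.toString().trim());
    }
}
```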

--
RGB