Non-English string? [Visual Basic]


From: Phil Hunt on 12 Feb 2010 13:20

What is the best way to determine whether a string contains a "non-English"
character?
TIA

From: Jeff Johnson on 12 Feb 2010 13:30

"Phil Hunt" <aaa(a)aaa.com> wrote in message
news:unthJABrKHA.1796(a)TK2MSFTNGP02.phx.gbl...

> What is the best way to determine whether a string contains a "non-English"
> character?

That's not an easy question to answer. Consider the word "résumé." It's an
English word (taken from French) but it contains an accented character that
is not "native" to English. If your code encountered that word, would you
want it to judge that it contains a "non-English character"?

From: Phil Hunt on 12 Feb 2010 13:45

OK. Forget French for a moment. How can I tell if the string contains
"East Asian" characters?

From: Helmut Meukel on 12 Feb 2010 14:33

"Phil Hunt" <aaa(a)aaa.com> wrote in message
news:OthUFOBrKHA.4220(a)TK2MSFTNGP05.phx.gbl...
> OK. Forget French for a moment. How can I tell if the string contains
> "East Asian" characters?

Let's start with the code table.
Characters in strings are just byte or integer values.
In the old DOS days the basis was ASCII (American Standard Code for
Information Interchange), a 7-bit code; on serial links the eighth
bit was often used as a parity bit.
IBM created Extended ASCII (8 data bits, no parity bit) and used
the doubled capacity to code some European characters and graphic
characters (card symbols, lines...).
This extended ASCII finally became Code Page 437. Other code
pages like 850 (multilingual) and 865 (Nordic) used the same
code values for different characters. My first Vectra PC used the
Roman8 character set, also used by HP's 250, 1000 and 3000
systems.
With Windows, Microsoft switched to ANSI code pages (8 bit for
Western languages) and finally to Unicode (stored as 16-bit values).

So first you have to know how your text is coded, to determine
which codes are used for East Asian characters.
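
As a rough sketch of that idea for the Unicode case: a VB string holds
16-bit values, so each one can be compared against the Unicode blocks
used for East Asian scripts. The function name ContainsEastAsian and
the choice of ranges are assumptions for illustration only; the list is
not exhaustive, and characters stored as surrogate pairs are not decoded.

```vb
' Sketch only: returns True if any character of s falls into one of a
' few East Asian Unicode blocks. The block list is an assumption and
' is not complete.
Public Function ContainsEastAsian(ByVal s As String) As Boolean
    Dim i As Long
    Dim code As Long

    For i = 1 To Len(s)
        ' AscW returns a signed Integer, so mask it to get 0..65535.
        code = AscW(Mid$(s, i, 1)) And &HFFFF&

        ' 1100-11FF  Hangul Jamo
        ' 2E80-9FFF  CJK radicals, kana, CJK Unified Ideographs
        ' AC00-D7AF  Hangul Syllables
        ' F900-FAFF  CJK Compatibility Ideographs
        ' FF00-FFEF  Halfwidth and Fullwidth Forms
        Select Case code
            Case &H1100& To &H11FF&, _
                 &H2E80& To &H9FFF&, _
                 &HAC00& To &HD7AF&, _
                 &HF900& To &HFAFF&, _
                 &HFF00& To &HFFEF&
                ContainsEastAsian = True
                Exit Function
        End Select
    Next i
End Function
```

For example, ContainsEastAsian("Hello") returns False, while
ContainsEastAsian("A" & ChrW$(&H65E5)) returns True, because U+65E5 lies
in the CJK Unified Ideographs block. The trailing & on the hex literals
matters: without it, VB reads values above &H7FFF as negative Integers.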

HTH.

Helmut.

From: Phil Hunt on 12 Feb 2010 15:24

Thanks. So I basically have to examine the bit patterns to decide.
I understand ASCII; it is Unicode I have some trouble with. I know
it is 16 bits instead of 8. But in the VB debug window, I have never been able
to see a 16-bit character; maybe it does not display on the screen. Do you
know what I am talking about?
For the character 'A', how can I see the full 16-bit pattern in VB?
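
For reference, one way to look at the 16-bit value is to print the
character's code rather than the character itself. This is a minimal
sketch assuming classic VB (VB6); the helper name CodeUnitHex is
hypothetical, while AscW, Hex$ and Right$ are built-in functions.

```vb
' Sketch only: returns the 16-bit code of the first character of s
' as four hex digits. AscW raises an error if s is empty.
Public Function CodeUnitHex(ByVal s As String) As String
    Dim code As Long

    ' AscW returns a signed Integer; mask it so codes above &H7FFF
    ' are not shown as negative numbers.
    code = AscW(s) And &HFFFF&

    ' Pad with leading zeros so all 16 bits are visible.
    CodeUnitHex = Right$("0000" & Hex$(code), 4)
End Function
```

In the Immediate window, ? CodeUnitHex("A") prints 0041, and
? CodeUnitHex(ChrW$(&H65E5)) prints 65E5, even when the character itself
cannot be displayed there.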
