From: Phil Hunt on 12 Feb 2010 13:20
What is the best way to determine whether a string contains "non-English" characters? TIA
From: Jeff Johnson on 12 Feb 2010 13:30
That's not an easy question to answer. Consider the word "resumé." It's an English word (borrowed from French), but it contains an accented character that is not "native" to English. If your code encountered that word, would you want it to judge that the word contains a "non-English character"?
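To make Jeff's point concrete: a naive test that flags anything outside 7-bit ASCII (code points 0 to 127) would indeed mark "resumé" as containing a non-English character. A minimal Python sketch, purely illustrative; the function name has_non_ascii is hypothetical:

```python
def has_non_ascii(text: str) -> bool:
    """Return True if any character lies outside 7-bit ASCII (code points 0-127)."""
    return any(ord(ch) > 127 for ch in text)


# The unaccented spelling passes; the spelling with "é" (U+00E9) is flagged.
print(has_non_ascii("resume"))       # False
print(has_non_ascii("resum\u00e9"))  # True
```

The test only answers "is this plain ASCII?", not "is this English?", which is exactly the ambiguity raised above.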
From: Phil Hunt on 12 Feb 2010 13:45
OK. Forget French for a moment. How can I tell whether a string contains East Asian characters?
From: Helmut Meukel on 12 Feb 2010 14:33
Let's start with the code table. Characters in strings are just byte or integer values.

Old DOS used ASCII (IIRC: American Standard Code for Information Interchange), a 7-bit code; in transmission the eighth bit was often used as a parity bit. IBM created Extended ASCII (8 data bits, no parity bit) and used the doubled capacity to encode some European characters and graphic characters (card symbols, line-drawing characters...). This extended ASCII eventually became Code Page 437. Other code pages, such as 850 (Multilingual) and 865 (Scandinavian), used the same code values for different characters. My first Vectra PC used the Roman8 character set, which was also used by HP's 250, 1000 and 3000 systems.

With Windows, Microsoft switched to ANSI, still 8-bit, and finally to Unicode (16-bit).

So first you have to know how your text is encoded; only then can you determine which codes are used for East Asian characters.

HTH.
Helmut.
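A rough Python sketch of the two steps Helmut describes: decode the raw bytes with the encoding they were actually written in, then look for code points in East Asian Unicode blocks. The helper names and the range list are assumptions for illustration; the list covers only a few major blocks (Hiragana, Katakana, CJK Unified Ideographs, Hangul Syllables) and is not exhaustive.

```python
# Illustrative, non-exhaustive list of East Asian Unicode blocks (inclusive ranges).
EAST_ASIAN_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def contains_east_asian(text: str) -> bool:
    """Return True if any character falls inside one of the ranges above."""
    return any(
        lo <= ord(ch) <= hi
        for ch in text
        for lo, hi in EAST_ASIAN_RANGES
    )


def bytes_contain_east_asian(raw: bytes, encoding: str) -> bool:
    """Decode raw bytes with the given code page/encoding, then scan the text.

    The same byte values map to different characters under different
    encodings, so the result is only meaningful if `encoding` is correct.
    """
    return contains_east_asian(raw.decode(encoding))


# "Japan" (U+65E5 U+672C) stored as Shift_JIS bytes.
raw = "\u65e5\u672c".encode("shift_jis")
print(bytes_contain_east_asian(raw, "shift_jis"))  # True: decoded correctly
print(bytes_contain_east_asian(raw, "cp437"))      # False: misread as DOS code page 437

# The same single byte, 0x9B, under two DOS code pages:
print(ascii(bytes([0x9B]).decode("cp437")))  # '\xa2' (cent sign)
print(ascii(bytes([0x9B]).decode("cp865")))  # '\xf8' (o with stroke)
```

Decoding with the wrong code page does not raise an error here; it silently yields different characters, which is why knowing the encoding has to come first.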
From: Phil Hunt on 12 Feb 2010 15:24
Thanks. So I basically have to examine the bit patterns to decide. I understand ASCII; it's Unicode I have some trouble with. I know it uses 16 bits instead of 8, but in the VB debug window I have never been able to see a 16-bit character; maybe it doesn't display on the screen. Do you know what I am talking about? For the character 'A', how can I see the full 16-bit pattern in VB?
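One way to see what Phil is asking about, sketched in Python rather than VB (the helper names are hypothetical): encode the text as UTF-16, the 16-bit Unicode form Windows uses, and print each 16-bit code unit in binary. In VB itself, AscW returns the same 16-bit value (as a signed Integer, so values above &H7FFF come back negative), and Hex$(AscW("A")) gives "41".

```python
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Split text into UTF-16 code units (16-bit values, little-endian order)."""
    data = text.encode("utf-16-le")  # the -le codec writes no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def bit_patterns(text: str) -> List[str]:
    """Format each 16-bit code unit as a zero-padded binary string."""
    return [format(unit, "016b") for unit in utf16_code_units(text)]


for sample in ["A", "\u4e2d"]:  # "A" and the CJK ideograph U+4E2D
    for unit, bits in zip(utf16_code_units(sample), bit_patterns(sample)):
        print(f"U+{unit:04X}  {bits}")
# U+0041  0000000001000001
# U+4E2D  0100111000101101
```

For plain ASCII letters the upper 8 bits are all zero, which is why 'A' looks exactly like its ASCII form; only characters beyond U+00FF, such as East Asian ideographs, use the high byte.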