From: Karl E. Peterson on 29 Jan 2009 20:46

expvb wrote:
>> 1) Is a simple 0-255 check sufficient recognition for "Unicode"
>> characters, where the input variable 'this' is a single character from a
>> longer string?
>
> I am by no means an expert in this area, but here is what I know. Char codes
> from 128 to 255 have special meaning, depending on whether you are using
> ANSI or Unicode. In Unicode, their meaning is fixed regardless of Code Page.
> They refer to Latin-1 Supplement, so for instance char code 169 is always
> the copyright symbol.
>
> In ANSI, char codes from 128 to 255 have no meaning without the associated
> Code Page. So when someone says char code 169 is the copyright symbol, he or
> she is wrong. They must also say what Code Page they are talking about.
>
> So you have to add a Unicode flag to treat char codes 128 to 255 properly.
> See these articles:
>
> http://en.wikipedia.org/wiki/Basic_Multilingual_Plane
> http://en.wikipedia.org/wiki/Unicode

Ouch! Okay, I'll have a look at those. (I'm in the middle of some lengthy installs at the moment, and they don't like an open browser. <groan>)

I wonder if it really matters from a SendKeys perspective, though? I mean, if you sent a "©" to SendKeys, it'd just use whatever codepage was in effect, no? Shouldn't a drop-in SendKeys replacement do the same thing? (Semi-rhetorical, granted.)

I guess I need to read those pages you cited. Thinking ahead, are you suggesting I possibly just slip the test down to do the Unicode processing for anything >= 128?

>> 2) If it's Unicode, that means we skip all the Shiftkey processing, and
>> instead just do this, right?
>
> It seems so.

I like that response. :-)

> It seems that in order to get keyboard scan codes, keyboard
> layout and support for specific code pages has to be loaded (In the Control
> Panel perhaps), so it doesn't seem possible to get scan codes for every
> possible Unicode char. You can however send a Unicode char by sending
> WM_CHAR using the W version of SendMessage after checking that the window is
> a Unicode window by calling IsWindowUnicode().

Not sure I'm following there?

>> 3) What else I've overlooked? <g>
>
> Checking that the OS is Windows 2000 before using KEYEVENTF_UNICODE flag,
> and that the minimum required OS is Windows 98 because that is when
> SendInput() was introduced.

Excellent catch(es)! Would be wise to add something like that, but I think it'd just fail silently (except on Win95) without it, right?

Thanks much... Karl
--
.NET: It's About Trust!
http://vfred.mvps.org
From: expvb on 29 Jan 2009 22:36

"Karl E. Peterson" <karl(a)mvps.org> wrote in message news:OSVUhyngJHA.1184(a)TK2MSFTNGP04.phx.gbl...
> I wonder if it really matters from a SendKeys perspective, though? I
> mean, if you sent a "©" to SendKeys, it'd just use whatever codepage was
> in effect, no? Shouldn't a drop-in SendKeys replacement do the same thing?
> (Semi-rhetorical, granted.)
>
> I guess I need to read those pages you cited. Thinking ahead, are you
> suggesting I possibly just slip the test down to do the Unicode processing
> for anything >= 128?

You can always consider anything < 256 to be ANSI, which is what VB's SendKeys does. If someone wants to send Unicode characters in the range 128 to 255, they can change the source code themselves.

>> It seems that in order to get keyboard scan codes, keyboard
>> layout and support for specific code pages has to be loaded (In the
>> Control Panel perhaps), so it doesn't seem possible to get scan codes for
>> every possible Unicode char. You can however send a Unicode char by sending
>> WM_CHAR using the W version of SendMessage after checking that the window
>> is a Unicode window by calling IsWindowUnicode().
>
> Not sure I'm following there?

I was saying that it's not always possible to get scan codes for every character. After checking the documentation: when you use KEYEVENTF_UNICODE, Windows sends VK_PACKET, which is later translated to WM_CHAR and converted to ANSI if the window is ANSI, so there is nothing special you need to do. However, it seems that in certain situations the Unicode character is discarded. Use the Search tab in MSDN and type "VK_PACKET" (5 results), or look up the "IME_PROP_ACCEPT_WIDE_VKEY" flag, which affects how VK_PACKET is processed.

> Would be wise to add something like that, but I think it'd just fail
> silently (except on Win95) without it, right?

You can call VB's SendKeys in Windows 9x.
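To make the KEYEVENTF_UNICODE description above concrete, here is a minimal sketch of the key-down/key-up pairs such a sender would queue for a string. It is written in Python rather than VB so it can be run anywhere, and it only builds the event list; it does not call SendInput. The function names and the tuple layout (wVk, wScan, dwFlags) are illustrative assumptions, while the two flag values are the documented Win32 constants.

```python
# Documented Win32 flag values for KEYBDINPUT.dwFlags.
KEYEVENTF_KEYUP = 0x0002
KEYEVENTF_UNICODE = 0x0004


def utf16_units(text):
    """Split a string into 16-bit UTF-16 code units (what a VB String holds)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def unicode_key_events(text):
    """Build hypothetical (wVk, wScan, dwFlags) tuples for each code unit.

    With KEYEVENTF_UNICODE the virtual-key field is 0 and the scan-code
    field carries the character itself, so no keyboard layout lookup or
    Shift handling is involved.
    """
    events = []
    for unit in utf16_units(text):
        events.append((0, unit, KEYEVENTF_UNICODE))                    # key down
        events.append((0, unit, KEYEVENTF_UNICODE | KEYEVENTF_KEYUP))  # key up
    return events


if __name__ == "__main__":
    for event in unicode_key_events("a\u00a9"):
        print(event)
```

On Windows, each tuple would be copied into a KEYBDINPUT structure and passed to SendInput; that step is left out here because it cannot run on other platforms.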
From: Thorsten Albers on 30 Jan 2009 07:51

Karl E. Peterson <karl(a)mvps.org> wrote in message <OzYw1QmgJHA.2092(a)TK2MSFTNGP05.phx.gbl>...
> 1) Is a simple 0-255 check sufficient recognition for "Unicode" characters, where
> the input variable 'this' is a single character from a longer string?

Yes, unless you want to deal with codepage peculiarities: Unicode character codes 128 to 255 always encode the same characters, while ANSI character codes 128 to 255 are codepage dependent.

> If code >= 0 And code < 256 Then 'ascii

Easier and presumably faster:

  If Not CBool(code And &HFF00) Then

> 3) What else I've overlooked? <g>

Maybe you should check for some Unicode characters and prevent them from being sent, like the

- High surrogates D800h to DBFFh
- Low surrogates DC00h to DFFFh
- Specials FFF0h to FFFFh

It can't be ruled out that sending one of these to a process may cause trouble...

--
Thorsten Albers
albers (a) uni-freiburg.de
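The checks suggested in the post above can be sketched as follows. This is Python rather than VB, purely so the bit arithmetic is easy to run; the helper names are hypothetical and not from the thread. It assumes `code` is a single UTF-16 code unit already normalised to 0..&HFFFF (in VB terms, `AscW(ch) And &HFFFF&`, since AscW returns a signed Integer).

```python
# Illustrative helpers; 'code' is one UTF-16 code unit in the range 0..0xFFFF.

def fits_in_one_byte(code):
    """Mirror of `Not CBool(code And &HFF00)`: true when no bit above the low byte is set."""
    return (code & 0xFF00) == 0


def is_high_surrogate(code):
    return 0xD800 <= code <= 0xDBFF


def is_low_surrogate(code):
    return 0xDC00 <= code <= 0xDFFF


def is_special(code):
    """The Specials block, FFF0h to FFFFh."""
    return 0xFFF0 <= code <= 0xFFFF


def safe_to_send(code):
    """Reject the ranges the post suggests filtering out before sending."""
    return not (is_high_surrogate(code) or is_low_surrogate(code) or is_special(code))


if __name__ == "__main__":
    for code in (0x41, 0xA9, 0x0192, 0xD83D, 0xFFFD):
        print(hex(code), fits_in_one_byte(code), safe_to_send(code))
```

Whether dropping lone surrogates is the right policy is a judgment call: a sender that wants to support characters outside the Basic Multilingual Plane would have to pass a high and low surrogate through as a pair instead.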
From: Thorsten Albers on 30 Jan 2009 07:56

Karl E. Peterson <karl(a)mvps.org> wrote in message <OSVUhyngJHA.1184(a)TK2MSFTNGP04.phx.gbl>...
> I wonder if it really matters from a SendKeys perspective, though? I mean, if you
> sent a "©" to SendKeys, it'd just use whatever codepage was in effect, no?
> Shouldn't a drop-in SendKeys replacement do the same thing? (Semi-rhetorical,
> granted.)

I wouldn't bother about codepage issues: you are providing an ANSI and a Unicode version of your procedure. The ANSI procedure always sends ANSI characters, and the Unicode procedure always sends Unicode characters (even for 0-255). It's up to the developer to decide which of the two procedures to call.

--
Thorsten Albers
albers (a) uni-freiburg.de
From: mark.tunnard.jackson on 30 Jan 2009 09:12
I agree with Thorsten. And I don't think checking AscW < 256 is going to work anyway. For instance, on code page 1252 (English & Western Europe), ANSI character 0x83 is Unicode character 0x0192 (LATIN SMALL LETTER F WITH HOOK):

  ? ascw(chr$(&H83&)), asc(chr$(&H83&))
   402           131

Unicode character 0x0083, by contrast, is a C1 control character (NO BREAK HERE), not a printable letter. The point is that "ANSI" character codes 128-255 map to a different set of Unicode characters depending on the code page.

I'm not sure how best to test whether a character is supported on the current code page. Maybe convert Unicode->"ANSI"->Unicode and see if the string is unchanged?
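A small sketch of both points, the code-page-dependent mapping and the round-trip test. It is in Python rather than VB so that the code page can be named explicitly instead of depending on the machine's locale; the function names are illustrative, and `cp1252` / `cp1251` are Python's codec names for Windows code pages 1252 and 1251.

```python
def ansi_to_unicode(byte_value, codepage="cp1252"):
    """Return the Unicode code point that one 'ANSI' byte maps to on a given code page."""
    return ord(bytes([byte_value]).decode(codepage))


def survives_round_trip(text, codepage="cp1252"):
    """Convert Unicode -> 'ANSI' -> Unicode and report whether the text is unchanged."""
    try:
        return text.encode(codepage).decode(codepage) == text
    except UnicodeError:
        # Python raises when a character has no mapping on the code page.
        return False


if __name__ == "__main__":
    print(ansi_to_unicode(0x83))            # 402, i.e. U+0192 on code page 1252
    print(ansi_to_unicode(0x83, "cp1251"))  # a different character on code page 1251
    print(survives_round_trip("\u0192"))    # representable on 1252
    print(survives_round_trip("\u0439"))    # Cyrillic letter, not on 1252
```

One difference to keep in mind: Python raises an error for an unmappable character, whereas the Windows conversion functions typically substitute "?" or a best-fit character, so in VB the comparison of the round-tripped string against the original is what actually detects the loss.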