From: Joseph M. Newcomer on 7 Aug 2006 01:16

CharNext, CharPrev, and that set of "APIs" that deal with multibyte representations think of
'characters' as multibyte sequences. This class of calls are the only exceptions I know to
the rule that 'character' == TCHAR.
                joe

On Fri, 04 Aug 2006 00:46:08 -0700, "Mihai N." <nmihai_year_2000(a)yahoo.com> wrote:

>> So what? Where we intuitively think that the stated limit of MAX_PATH
>> characters means MAX_PATH chars in ANSI, Microsoft informed me that the
>> limit really is MAX_PATH characters even if it takes twice that many bytes.
>This means our intuition is wrong :-)
>It is an internal limitation, so we should think about how Windows works
>internally. And that is Unicode.
>I bet in Windows 9x the limit is MAX_PATH char (the 1 byte programming char,
>not the user "character")
>
>> You asked for examples of cases where we had been wrong in nearly always
>> assuming that MSDN's statements about characters meant TCHARs, and this is
>> a big example.
>True, the example is good, but the doc is not clear.
>
>
>> You suspect that Microsoft's e-mail to me was accurate, and as mentioned, I
>> have the same impression. Though they send a lot of unbelievable e-mails,
>> they send some believable e-mails too and this was one.
>Yes, I think the email is accurate, and you are right, the doc is not clear.
>Just noting that here the limit is "in the belly", so it might be a bit
>different from something you pass as a parameter.
>For instance the internal implementation of some ANSI API might be:
>
>int BlaBlaA( const char * buffer, int nBufLen ) { // here nBufLen is char count
>    WCHAR * wideBuff = new WCHAR [nBufLen];
>    // convert the ANSI input to UTF-16 using the system ANSI code page
>    MultiByteToWideChar( GetACP(), 0, buffer, nBufLen, wideBuff, nBufLen );
>    int nRez = BlaBlaW( wideBuff, nBufLen ); // here nBufLen is WCHAR count
>    delete [] wideBuff;
>    return nRez;
>}
>
>Ok, I guess the whole thing has some error checking and does some kind of
>memory reuse, not new/delete for each API :-) but this is the idea.
>So for APIs that take the length as param the limit tends to really be in
>char in the ANSI API.
>
>
>> Yup. By the way, considering that VFAT can store a filename consisting of
>> around 250 Kanji, one weekend experiment would be to try opening the file
>> under Windows 98 (Japanese version of course).
>I am quite sure the limit is in chars there.
>
>> But really I'll consider it
>> close enough if it works under Windows 2000, XP, 2003, and Vista beta. I
>> haven't had time to test it and I do believe that mail.
>I also believe the email :-)
>
>
>Ok, this is getting fuzzy.
>So, in the end, I am not arguing with you.
>My initial affirmation ("fact very few APIs") means I know there are some
>APIs, just that I could not think of one off the top of my head.
>And I have asked you for examples to learn something.
>And yes, you are also right that for the example the doc is unclear.

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Mihai N. on 7 Aug 2006 02:54

> CharNext, CharPrev, and that set of "APIs" that deal with multibyte
> representations think of 'characters' as multibyte sequences.
> This class of calls are the only exceptions I
> know to the rule that 'character' == TCHAR.

And the ANSI versions of these 2 APIs are broken in XP.
Oops! :-)

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
From: Mihai N. on 7 Aug 2006 03:00

> CharNext, CharPrev, and that set of "APIs" that deal with multibyte
> representations think
> of 'characters' as multibyte sequences. This class of calls are the only
> exceptions I know to the rule that 'character' == TCHAR.

A bunch of CRT APIs that depend on _MBCS being defined:
_mbsinc (the CRT brother of CharNext), _mbslen, _mbsnbcnt, _mbsnccnt

And (of course), the MSDN doc is not always clear whether we are talking
about "programmer characters" (char) or "user characters" (sometimes
meaning two bytes).

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
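
[Editor's sketch] To make the "programmer characters" versus "user characters" distinction concrete, here is a small portable illustration of what a CharNext/_mbsinc-style step and an _mbslen-style count do on a Shift-JIS string. It does not call the Windows or CRT functions themselves; the names isSjisLeadByte, sjisNext and sjisCharCount are hypothetical helpers, and the lead-byte ranges (0x81-0x9F, 0xE0-0xFC) are the usual ones assumed for code page 932.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumption: Shift-JIS (code page 932) lead bytes are 0x81-0x9F and 0xE0-0xFC.
bool isSjisLeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical stand-in for CharNextA / _mbsinc: advance by one user character.
// A lead byte followed by another byte counts as a single two-byte character.
const char* sjisNext(const char* p) {
    assert(p != nullptr);
    if (*p == '\0') return p;  // never step past the terminator
    if (isSjisLeadByte(static_cast<unsigned char>(*p)) && p[1] != '\0') return p + 2;
    return p + 1;
}

// Hypothetical stand-in for _mbslen: count user characters, not bytes.
std::size_t sjisCharCount(const char* s) {
    std::size_t count = 0;
    for (const char* p = s; *p != '\0'; p = sjisNext(p)) ++count;
    return count;
}
```

For a string made of two Kanji followed by "abc", strlen reports the number of char units (bytes), while sjisCharCount reports the number of user characters, which is the smaller figure.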
From: Mihai N. on 7 Aug 2006 03:13

> But the limit is MAX_PATH characters. That's what we've been discussing.
> In Unicode mode, the limit is MAX_PATH characters, which would occupy
> 2*MAX_PATH bytes. That is, MAX_PATH TCHARs, and therefore their comment
> is completely CONSISTENT with the fact that a 'character' is a 'TCHAR'.
> Since you can't use any multibyte encoding in CreateFile, I
> don't see where there is any problem here. 'character', in nearly every
> context we've discussed, means 'TCHAR'.

Nope. Norman has a point here.
Let's say I have an ANSI application. And it calls CreateFileA.
If I pass MAX_PATH low-ASCII characters (let's say "aaaa...aaa"), all is nice
and dandy.
If I pass MAX_PATH Kanji characters, they get converted to Unicode, I get
MAX_PATH Unicode code points, and all is well again.
But MAX_PATH Kanji means 2 x MAX_PATH char in ANSI, meaning 2 x MAX_PATH
TCHARs.
So in this case the MAX_PATH character limit really means
"user characters", not TCHARs.

> Since you can't use any multibyte encoding in CreateFile,
You can if you are on an MBCS system (i.e. Japanese), because you pass ANSI
strings, which are MBCS.

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
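
[Editor's sketch] The arithmetic in the post above can be checked with a few lines of portable code. This is only an illustration of the two ways of counting, not a statement of how CreateFileA is actually implemented. It assumes a Shift-JIS ANSI code page in which every character (one or two bytes) converts to exactly one UTF-16 code unit; kMaxPath, wideLength and the two fits... functions are hypothetical names, and the terminating null is ignored for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const std::size_t kMaxPath = 260;  // the value MAX_PATH has in the Windows headers

// Assumption: Shift-JIS (code page 932) lead bytes are 0x81-0x9F and 0xE0-0xFC.
bool isSjisLeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Number of UTF-16 code units the path would occupy after conversion,
// assuming one code unit per Shift-JIS character.
std::size_t wideLength(const std::string& ansiPath) {
    std::size_t units = 0;
    std::size_t i = 0;
    while (i < ansiPath.size()) {
        const bool twoBytes =
            isSjisLeadByte(static_cast<unsigned char>(ansiPath[i])) &&
            i + 1 < ansiPath.size();
        i += twoBytes ? 2 : 1;
        ++units;
    }
    return units;
}

// Limit counted in char units of the ANSI string (the "TCHAR" reading).
bool fitsCountingBytes(const std::string& ansiPath) {
    return ansiPath.size() <= kMaxPath;
}

// Limit counted after conversion to Unicode (the "user character" reading).
bool fitsCountingWideChars(const std::string& ansiPath) {
    return wideLength(ansiPath) <= kMaxPath;
}
```

A path of 200 Kanji is 400 bytes in Shift-JIS: it fails the byte-counted check but passes the check made after conversion, which is the situation described in the post.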
From: Mihai N. on 7 Aug 2006 03:20

> All A-suffix APIs use Unicode internally. The entire kernel is written in
> terms of Unicode, so all A-suffix APIs first convert the ANSI text to
> Unicode and then call the actual internal implementation of the API.
I know :-)
http://www.mihai-nita.net/20050306b.shtml

> This means that if you pass in a UTF8 string,
> it isn't seen as UTF8, it's seen as 8-bit ANSI bytes, and will be converted
> to 16-bit bytes as if it were a sequence of 8-bit characters, which leads
> to the comment that "UTF-8 is not supported".
So it's true :-)

> It *is* supported, but not at the kernel API interface level.
Which means that the level of support for UTF-8 is lower than the level
of support for Shift-JIS (which is the ANSI code page for Japanese).
This is unlike the Unix/Linux world, where if I set the locale to
ja_JP.euc_JP or ja_JP.UTF-8, everything works the same, all APIs are ok.
I can do strupr, fopen a file with a Japanese name, and so on.
The two charsets are equally supported.

We can say "in Win there is partial support for UTF-8" and call it a day :-)
We both understand what it means :-)

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
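
[Editor's sketch] The quoted point about UTF-8 being "seen as 8-bit ANSI bytes" can be illustrated without any Windows call. The sketch below widens the same bytes two ways: one byte at a time, as an 8-bit code page would, and as UTF-8. It assumes a Latin-1 style identity mapping for the 8-bit case (code page 1252 differs from this in the 0x80-0x9F range), and the UTF-8 decoder is deliberately minimal: 1- to 3-byte sequences only, no validation. The names widenAsLatin1 and widenAsUtf8 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Widen byte by byte, as if every byte were one 8-bit character.
// Assumption: Latin-1 identity mapping (byte value == code point).
std::u16string widenAsLatin1(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) out.push_back(static_cast<char16_t>(b));
    return out;
}

// Minimal UTF-8 decoder: handles 1-, 2- and 3-byte sequences (BMP only)
// and emits U+FFFD for anything else. It does not validate trail bytes.
std::u16string widenAsUtf8(const std::string& bytes) {
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 < 0x80) {
            out.push_back(static_cast<char16_t>(b0));
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < bytes.size()) {
            const unsigned b1 = static_cast<unsigned char>(bytes[i + 1]);
            out.push_back(static_cast<char16_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F)));
            i += 2;
        } else if ((b0 & 0xF0) == 0xE0 && i + 2 < bytes.size()) {
            const unsigned b1 = static_cast<unsigned char>(bytes[i + 1]);
            const unsigned b2 = static_cast<unsigned char>(bytes[i + 2]);
            out.push_back(static_cast<char16_t>(
                ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)));
            i += 3;
        } else {
            out.push_back(static_cast<char16_t>(0xFFFD));  // unsupported or truncated
            i += 1;
        }
    }
    return out;
}
```

The UTF-8 bytes C3 A9 (one user character, U+00E9) come out of the byte-by-byte widening as two separate code units, U+00C3 and U+00A9, which is why a UTF-8 file name handed to an A-suffix API ends up with the wrong characters.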