From: Joseph M. Newcomer on 1 Aug 2006 10:53

The documentation for StringCchPrintf talks about counts of characters. In the ANSI compilation, each character occupies exactly one TCHAR. I'm not sure how you figure a character can occupy two TCHARs (which are just chars in ANSI), since each char has a value in the range 0..255, which fits in exactly one char.

The documentation for StringCchPrintf says
============================================
StringCchPrintf Function

StringCchPrintf is a replacement for sprintf. It accepts a format string and a list of arguments and returns a formatted string. The size, in characters, of the destination buffer is provided to the function to ensure that StringCchPrintf does not write past the end of this buffer.

Syntax

HRESULT StringCchPrintf(
    LPTSTR pszDest,
    size_t cchDest,
    LPCTSTR pszFormat,
    ...
);

Parameters

pszDest
    [out] Pointer to a buffer which receives the formatted, null-terminated string created from pszFormat and its arguments.
cchDest
    [in] Size of the destination buffer, in ****characters****. This value must be sufficiently large to accommodate the final formatted string plus 1 to account for the terminating null character. The maximum number of characters allowed is STRSAFE_MAX_CCH.
pszFormat
    [in] Pointer to a buffer containing a printf-style format string. This string must be null-terminated.
...
    [in] Arguments to be inserted into pszFormat.
=====================================
Note that the word ****characters**** is clearly in italics in the original documentation.

Now where, in the above documentation, does it say that a 'character' is exactly one byte? How do you infer that a 'character', in ANSI mode, can occupy two bytes? Where is there the slightest confusion between the char and wchar_t data types here? I think you have a very serious confusion about the difference between the terms 'character' (which is one or two bytes depending on the compilation mode), 'char' (which is always one byte), 'wchar_t' (which is always two bytes), and TCHAR (which is one or two bytes depending on the compilation mode).

I have no idea what you mean by "one 2-TCHAR character". This is a contradiction. A character is by definition a 1-TCHAR character, because that is what is meant by "character". A TCHAR[2] holds two characters. A string is a sequence of zero or more characters followed by a NUL character. In ANSI mode, this means that for a TCHAR[2] to represent a string, it holds a single 8-bit character and a single 8-bit NUL character; in Unicode, it holds a single 2-byte Unicode character and a 2-byte NUL character. How can you get a 2-byte "character" in ANSI mode? This contradicts the whole concept of "character" as specified for each mode. (Note that in ANSI mode you can have a UTF encoding that represents a single character as two chars, but note that this is two characters, and in ANSI mode that is two bytes. But StringCchPrintf is not going to somehow magically convert anything to UTF-8 in the process of formatting it. Since the target format specifier, %c, formats exactly one character, a 2-character buffer, in any mode, will suffice, and StringCchPrintf will work. UTF-8 is a multibyte encoding, and that is a discussion completely separate from the one we are having here.)
joe

On Tue, 1 Aug 2006 10:27:37 +0900, "Norman Diamond" <ndiamond(a)community.nospam> wrote:

>The documentation for StringCchPrintf talks about counts of characters. In
>an ANSI compilation each character occupies one or two TCHARs depending on
>the actual character. The documentation for StringCchPrintf doesn't say
>that TCHARs are counted where it does say that characters are counted.
>
>Dr. Newcomer, you KNOW how, in an ANSI compilation, one 2-TCHAR character
>will overflow a buffer which has enough space for only one 1-TCHAR
>character.
>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>news:tppsc21810onsurc601ligkkiivh5pui77(a)4ax.com...
>> The libraries are shared and there is already a copy of them loaded.
>>
>> What is wrong with StringCchPrintf? It won't overflow the buffer, which
>> is a good thing.
>>
>> The char/wchar_t is what TCHAR means. But it is signed, which implies
>> sign extension for any Unicode character > 7FFFU. This will not produce
>> a good result in most cases. WORD will handle a char value because it
>> won't be sign extended.
>>
>> I made B an array of two characters, not two bytes. I distinctly recall
>> writing
>> TCHAR B[2];
>> which is two characters. This means in Unicode it is 4 bytes.
>>
>> StringCchPrintf will format the string, which is one character plus a
>> terminal null character. Do not confuse "character" with "byte".
>> StringCchPrintf will copy the single character and add a NUL character,
>> which, the last I looked, makes two characters, the size of the array.
>> joe
>>
>> On Mon, 31 Jul 2006 19:40:24 +0900, "Norman Diamond"
>> <ndiamond(a)community.nospam> wrote:
>>
>>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>>>news:e57oc2lrr2nd1j0nt83h8e7h02ahjsbqih(a)4ax.com...
>>>
>>>> Use CString::Format as the preferred choice.
>>>
>>>On "real" Windows I agree. On Windows CE, where extra libraries will
>>>occupy the machine's RAM, it might not be a good idea.
>>>
>>>> If you MUST use some form like _stprintf, use StringCchPrintf (I think
>>>> that's the name, but search for strsafe.h on the MSDN) which at least
>>>> will avoid any possibility of buffer overflow
>>>
>>>As documented it will not have such a beneficial effect.
>>>
>>>> StringCchPrintf(B, sizeof(B) / sizeof(TCHAR), _T("%c"), (BYTE)('a' + i));
>>>
>>>Mihai N. addressed a problem with your cast to BYTE and you made an
>>>adjustment which I'm still thinking about. Since arguments to
>>>StringCchPrintf are either Unicode or ANSI, the last argument should be
>>>either char or wchar_t, and I'm trying to figure out if WORD is
>>>guaranteed to marshal a char value properly.
>>>
>>>More important is that, as documented, buffer overflow can very easily
>>>occur. Suppose we have an ANSI compilation and make B an array of 2 chars.
>>>Then the buffer has enough space for 1 single-byte character plus a null
>>>character. But if the last argument is a dou
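[Aside: a minimal compilable sketch of the call this subthread is arguing about, assuming a two-element TCHAR buffer as in the quoted code; the surrounding function, the loop, and the loop variable are illustrative, and the right type or cast for the last argument is the separate point Mihai N. raised.]

    #include <windows.h>
    #include <tchar.h>
    #include <strsafe.h>

    void FormatOneChar(void)
    {
        TCHAR B[2];   /* room for one TCHAR plus the terminating NUL */
        int i;

        for (i = 0; i < 26; i++)
        {
            /* Destination, count of TCHARs, format, argument -- the order
               declared by the strsafe.h prototype quoted above. */
            StringCchPrintf(B, sizeof(B) / sizeof(TCHAR), _T("%c"), _T('a') + i);
        }
    }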
From: Norman Diamond on 1 Aug 2006 20:04

"Mihai N." <nmihai_year_2000(a)yahoo.com> wrote in message news:Xns9811EA513836MihaiN(a)207.46.248.16...

[Norman Diamond:]
>> The documentation for StringCchPrintf talks about counts of characters.
>> In an ANSI compilation each character occupies one or two TCHARs
>> depending on the actual character. The documentation for StringCchPrintf
>> doesn't say that TCHARs are counted where it does say that characters
>> are counted.
>
> I suspect it is the typical MSDN confusion when talking about characters.
> Since in strsafe.h I can find both StringCchPrintfA and StringCchPrintfW,
> I assume it works like all the Win32 API with regard to buffer lengths.
> Meaning that when they say "character" in ANSI context, one should really
> understand char.

Except that not all Win32 APIs actually work that way. SOME Win32 APIs count TCHARs, i.e. count chars in ANSI and wchar_ts in Unicode. But SOME Win32 APIs really count characters. Microsoft has responded to a few cases, including one personally, to say that for some Win32 APIs, even in the ANSI versions, internal processing is performed in Unicode and the limits are counted in actual characters rather than in the number of bytes required for the ANSI representations.

Therefore, when MSDN says that a function counts characters, it might be telling the truth. We cannot automatically assume otherwise.
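[Aside: one concrete illustration of why the unit matters. In the conversion function below, the wide-string length is a count of wchar_ts (cch) while the returned size of the ANSI encoding is a count of bytes (cb); code page 932 and the sample text are illustrative assumptions and require that code page to be installed.]

    #include <windows.h>

    int AnsiBytesNeeded(void)
    {
        const WCHAR *wide = L"\x65E5\x672C";   /* two wide characters of CJK text */

        /* cchWideChar = 2 counts wchar_ts; passing cbMultiByte = 0 asks for the
           size, in BYTES, that the code-page-932 (Shift-JIS) encoding needs. */
        int cb = WideCharToMultiByte(932, 0, wide, 2, NULL, 0, NULL, NULL);

        /* With the code page installed this comes back as 4: two characters in
           the user's sense, but four bytes of ANSI text. */
        return cb;
    }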
From: Norman Diamond on 1 Aug 2006 20:12

I wrote:
>> The documentation for StringCchPrintf talks about counts of characters.

Dr. Newcomer's response emphasises several times that the documentation for StringCchPrintf talks about counts of ***** characters *****, EXACTLY as I said it does. It is reassuring to see this agreement, though I wonder why it's expressed so oddly.

But then odd questions arise:

> Now where, in the above documentation, does it say that a 'character' is
> exactly one byte?
> How do you infer that a 'character', in ANSI mode, can occupy two bytes?

Very very true. In the documentation of StringCchPrintf, MSDN correctly refrains from saying that a 'character' is exactly one byte. Microsoft is well aware that code page 932 (Shift-JIS), the code page for the world's largest country by population, and a couple of other code pages contain characters that, in ANSI mode, occupy two bytes. Dr. Newcomer, I think you are well aware of this too, and I am really confused why you ask these questions.

Meanwhile, this is still the reason why, if MSDN's documentation is correct, buffer overflow can still occur. A caller of the ANSI version can have a buffer 2 bytes long, long enough for 1 single-byte character plus 1 single-byte null character, and say that its buffer length is 2. But StringCchPrintf, if it behaves as documented, will copy in 1 character no matter how many bytes it requires, plus 1 single-byte null character. If the first character occupies two bytes, then the null character goes into the third byte of the two-byte buffer.

I don't know where the discussion of UTF-8 came from, but I'm not joining it, at least not for the moment.

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:pbpuc2lq2o7e2ca2pi9opis83ubrilg4vp(a)4ax.com...
> The documentation for StringCchPrintf talks about counts of characters. In
> the ANSI compilation, each character occupies exactly one TCHAR. I'm not
> sure how you figure a character can occupy two TCHARs (which are just
> chars in ANSI), since each char has a value in the range 0..255, which
> fits in exactly one char.
> [...]
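[Aside: the scenario just described, as a compilable sketch against the ANSI entry point. The Shift-JIS bytes, the buffer, and the use of %s (the easiest way to ask the ANSI function to emit one character that occupies two bytes, since the ANSI %c can only ever emit one byte) are illustrative; whether the call overruns, as this reading of the documentation would predict, or truncates, is what the later posts examine.]

    #include <windows.h>
    #include <strsafe.h>

    void TwoByteCharacterQuestion(void)
    {
        char buf[2];   /* room for one single-byte character plus a NUL */

        /* "\x93\xFA" is the Shift-JIS (code page 932) encoding of a single
           double-byte character.  If cchDest really counted characters, one
           character plus the NUL would need three bytes -- one more than the
           buffer has. */
        HRESULT hr = StringCchPrintfA(buf, sizeof(buf), "%s", "\x93\xFA");

        /* strsafe's stated contract (and Dr. Newcomer's %lc experiment below)
           is that the A version counts chars, i.e. bytes: the result is
           truncated to fit and the call returns STRSAFE_E_INSUFFICIENT_BUFFER
           rather than writing past buf[1]. */
        (void)hr;
    }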
From: Mihai N. on 2 Aug 2006 04:19

> Except that not all Win32 APIs actually work that way. SOME Win32 APIs
> count TCHARs, i.e. count chars in ANSI and wchar_ts in Unicode. But SOME
> Win32 APIs really count characters. Microsoft has responded to a few
> cases, including one personally, to say that for some Win32 APIs, even in
> the ANSI versions, internal processing is performed in Unicode and the
> limits are counted in actual characters rather than in the number of
> bytes required for the ANSI representations.

Can you give some examples? In my experience <<There are in fact very few APIs that deal with the "user character">> and those are decently documented.

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
From: Joseph M. Newcomer on 2 Aug 2006 13:54
Multibyte Character Set is an *encoding* of a character set. In ANSI mode, MBCS can be used to encode 'characters' in an extended set; however, StringCchPrintf, sprintf, etc. convert characters using code pages only in special cases, e.g., the %lc or %C format.

The formal definition of %c, the formatting code being discussed in this example, is that the int argument is converted to 'unsigned char' and formatted as a character. For ANSI mode, this means that 'character' is 'byte': in ANSI mode, one character is one byte. In a multibyte character set, a glyph might be represented by one to four successive 8-bit bytes. Note that using %c would be erroneous for formatting an integer value if the intent was to produce a multibyte sequence representing a single logical character.

This can easily be seen by looking at the %c formatting code in output.c in the CRT source. %c formats exactly one byte in ANSI mode, so arguing that %c requires two bytes for a character is not correct. The exact code executed for %c formatting is

    unsigned short temp;
    temp = (unsigned short) get_int_arg(&argptr);
    {
        buffer.sz[0] = (char) temp;
        textlen = 1;
    }

I see nothing here that can generate more than one byte of output.

Note that the %C and %lc formats, which take wide character values and format them in accordance with the code page, *can* generate more than one byte per character, which does satisfy the objection raised. But the format here is clearly %c, and %c is clearly defined, and the implementation reflects that definition. So I'm not sure what the issue is here.

StringCchPrintf is defined in terms of 8-bit characters and 16-bit characters, not in terms of logical characters encoded in an MBCS. MBCS does not enter the discussion; if you format using %lc or %C, it will actually truncate the multibyte string to fit in the buffer. Thus it obeys its requirement of not allowing a buffer overrun. This can be seen trivially simply by--get this--DOING THE EXPERIMENT!!!!! So while you can contend until the cows come home that you think you know how to read the documentation, it is a matter of a couple of minutes to actually do the experiment.

I found that even when the wctomb function produces a sequence of multiple bytes to represent the wide character as a multibyte character when formatting with %lc, the ANSI version of StringCchPrintf still works in terms of ANSI characters, 8-bit bytes, and it writes exactly one of the three bytes of the multibyte sequence, the first byte. So the sequence

    StringCchPrintf(buffer, sizeof(buffer) / sizeof(TCHAR), _T("%lc"), 0xF95C);

will simply transfer to the target buffer the first 8-bit byte of what turned out to be a 3-byte multibyte sequence.

Note that since I don't have appropriate multinational support, I had to actually set a breakpoint and "fake" the results of wctomb, because what it does on my machine is fail the conversion and return -1. So I simply placed two bytes and a NUL into the buffer as if wctomb had worked correctly, changed the length to 2, and proceeded with the execution. Otherwise, I just get an empty string.

UTF-8 is one of the many multibyte character encodings that exist. I chose it as an example because it is specified in the Unicode standard.
joe

On Wed, 2 Aug 2006 09:12:11 +0900, "Norman Diamond" <ndiamond(a)community.nospam> wrote:

>I wrote:
>>> The documentation for StringCchPrintf talks about counts of characters.
>
>Dr. Newcomer's response emphasises several times that the documentation for
>StringCchPrintf talks about counts of ***** characters ***** EXACTLY as I
>said it does.
>[...]
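[Aside: the %lc experiment above, as a sketch. The buffer size is illustrative; as the post itself notes, the number of bytes wctomb produces for 0xF95C depends on the system code page, and on a system without the relevant code page the conversion fails and the result is an empty string.]

    #include <windows.h>
    #include <tchar.h>
    #include <strsafe.h>

    void LcExperiment(void)
    {
        TCHAR buffer[8];

        /* In the ANSI build, %lc takes a wide character and converts it through
           the current code page (wctomb) before storing the result. */
        HRESULT hr = StringCchPrintf(buffer, sizeof(buffer) / sizeof(TCHAR),
                                     _T("%lc"), (wchar_t)0xF95C);

        /* Per the experiment above: if wctomb yields a multibyte sequence, the
           ANSI build stores only its first byte; if wctomb fails, the result is
           an empty string.  In the Unicode build the single wide character is
           stored as-is. */
        (void)hr;
    }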