From: Joseph M. Newcomer on 9 Apr 2010 11:37

See below...

On Thu, 8 Apr 2010 20:48:19 -0500, "Liviu" <lab2k1(a)gmail.c0m> wrote:

>"Alexander Grigoriev" <alegr(a)earthlink.net> wrote...
>>
>> See windows.h:
>>
>> #define DeleteFile DeleteFileA
>>
>> By the way, it doesn't make sense to do non-UNICODE software
>> anymore, which yours seems to be.
>
>If you meant strictly that DeleteFileA doesn't make sense then I'll
>agree (the same applies to most other Win32 API calls, for that matter).
>And if you meant that apps need to be Unicode aware then I'll agree, too.
>
>But the blanket statement that non-UNICODE software should be shunned
>is too strong IMHO. There still are cases where it makes a lot of sense
>to build non-Unicode (i.e. no _UNICODE or UNICODE #define'd) and
>explicitly call the "W" flavor of the API functions where applicable. One
>example at random: an application which stores text as UTF-8 internally,
>and does more string manipulation inside than API calls outside.

****
What I tell my students: "Someday, your boss is going to drop in and say, 'We just
landed a big contract in <Japan, Korea, China>. How soon can you have the software
ready to run there?'" Which answer do you want to give?

(a) "Give me a couple of days, and I'll be able to give you an estimate. Three to six
weeks, probably."
(b) "How soon can you get the translator in here?"

The reality is that we live in a very global market, and anyone who knows their
company's product will never need Unicode is living in a fantasy world.

And if you think UTF-8 makes it easy, you have never actually had to deal with a
language that uses a multibyte encoding in UTF-8. It is NOT easy, and it is highly
error-prone. I know; I've had to clean up some of the messes that have resulted from
this. The first thing I do is convert to Unicode. Then I REMOVE all the erroneous code
that was failing to deal with UTF-8 or MBCS properly. If I use UTF-8 at all, it is at
the boundaries (writing UTF-8 files, reading them, using it across network packets,
etc.).
joe
****
>
>Liviu
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on 9 Apr 2010 11:41

See below...

On Fri, 9 Apr 2010 05:23:33 -0700 (PDT), Goran <goran.pusic(a)gmail.com> wrote:

>On Apr 9, 3:48 am, "Liviu" <lab...(a)gmail.c0m> wrote:
>> "Alexander Grigoriev" <al...(a)earthlink.net> wrote...
>>
>> > See windows.h:
>>
>> > #define DeleteFile DeleteFileA
>>
>> > By the way, it doesn't make sense to do non-UNICODE software
>> > anymore, which yours seems to be.
>>
>> If you meant strictly that DeleteFileA doesn't make sense then I'll
>> agree (the same applies to most other Win32 API calls, for that matter).
>> And if you meant that apps need to be Unicode aware then I'll agree, too.
>>
>> But the blanket statement that non-UNICODE software should be shunned
>> is too strong IMHO. There still are cases where it makes a lot of sense
>> to build non-Unicode (i.e. no _UNICODE or UNICODE #define'd) and
>> explicitly call the "W" flavor of the API functions where applicable. One
>> example at random: an application which stores text as UTF-8 internally,
>> and does more string manipulation inside than API calls outside.
>
>Passing that UTF-8 string to DeleteFile (that is, the "A" version will be
>picked by the compiler) is quite likely to lead to trouble, and a more or
>less subtle one (the system presumes "ANSI" encoding, whereas the string
>passed in isn't that). So, conversion from UTF-8 to the corresponding
>code page is both restrictive and necessary.

****
It will be very unpleasant. I've seen cases where people were passing UTF-8-encoded
filenames to the CreateFileA API and wondering why the filenames "looked funny".
****

>
>So even if UTF-8 is used, it's better to define (_)UNICODE and to make
>it a habit to go to UTF-16 before going to Win32.

****
The Win32 API does not understand UTF-8 (it is not clear how well it works with UTF-16
with surrogates, either!). But for a large class of languages, it works well in UTF-16.
The only question is when there is going to be a massive shift to UTF-32 encoding, now
that Unicode has grown past 16 bits (when Windows was designed, Unicode did not
actually use any encoding larger than 16 bits).

joe
****
>
>Goran.
From: Giovanni Dicanio on 9 Apr 2010 18:19

"Goran" <goran.pusic(a)gmail.com> ha scritto nel messaggio
news:9891b928-3a77-4738-aedb-0a6671fd1656(a)11g2000yqr.googlegroups.com...

> BTW, I believe that Spolsky is right on Hungarian in
> http://www.joelonsoftware.com/articles/Wrong.html

Very interesting article, thanks!

I agree that Hungarian can be of good help when it identifies the *intent* (more than
the underlying raw type). For example, both BSTR and LPWSTR are wchar_t*, and the
following code would compile just fine:

  LPWSTR s1;
  BSTR s2;
  ...
  s2 = s1;

But it is *wrong*, because BSTRs must be allocated using their own functions like
SysAllocString, etc. Using a proper prefix convention helps identify the problem
(especially when the offending code is surrounded by other code):

  LPWSTR pszS1;
  BSTR bstrS2;
  ...
  bstrS2 = pszS1;   --> wrong: can't assign a BSTR from a raw LPWSTR.

Another good example is the consistent use of 'cb' (count of bytes) vs 'cch' (count of
characters). Or prefixing pointers with 'p'...

  WCHAR szBuf[100];
  ...
  LPWSTR pszBuf;
  ...
  StringCbCopy(..., sizeof(pszBuf) ... )

Reading sizeof(*p*szBuf) alerts the brain, because the size of a *p*ointer is 4 (on
32-bit), and probably what was needed was the size of the statically allocated buffer
(i.e. sizeof(szBuf), without the leading 'p').

And I like the cancellation rule of 'p' and '*'. For example:

  BYTE **ppb

*ppb is a BYTE*, because the first '*' and the first 'p' cancel together, so 'pb'
remains (and it is a single-level pointer).

> P.S. "m_"!? Puh-lease! Ok, I agree that it's interesting to prefix
> class data members, but what's wrong with a simple "_", or "F" (for
> "field", as Borland does)?

I like the 'm_' prefix, probably because I'm used to it, and to me it is consistent
with the g_ and s_ prefixes I use for global variables and static variables.

(Yes, I know that modern IDEs use tooltips to give information about an identifier,
but I just find it more readable having it standing there, speaking for itself.)
However, all in all, I do like David's words:

> Actually, it's more important to be consistent with the other code you're
> working with, because jumping from one convention to another while stepping
> into other people's code in the debugger or editor is more jarring than
> staying with one non-optimal convention.

Giovanni
From: Liviu on 9 Apr 2010 18:55

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote...
> On Thu, 8 Apr 2010 20:48:19 -0500, "Liviu" <lab2k1(a)gmail.c0m> wrote:
>
>> There still are cases where it makes a lot of sense to build
>> non-Unicode (i.e. no _UNICODE or UNICODE #define'd)
>> and explicitly call the "W" flavor of the API functions where
>> applicable. One example at random: an application which stores
>> text as UTF-8 internally, and does more string manipulation
>> inside than API calls outside.
> ****
>
> The reality is that we live in a very global market, and anyone who
> knows their company's product will never need Unicode is living in a
> fantasy world.

Right. To be very clear, my point was not against Unicode as the standard, but only
against the choice of a particular encoding, such as UTF-8 or UTF-16, for the internal
representation.

> And if you think UTF-8 makes it easy, you have never actually had
> to deal with a language that uses a multibyte encoding in UTF-8.
> It is NOT easy, and is highly error-prone. I know, I've had to clean
> up some of the messes that have resulted from this. First thing I do
> is convert to Unicode. Then [...]

Then the app fails to run because its VM size suddenly shot over 2GB, and now you have
to rebuild it for 64-bit, and require your clients to upgrade their Windows as well ;-)

Mostly joking, of course... But as long as the code does meet the Unicode requirements,
the choice of its internal encoding is an engineering decision more than a design one.

Liviu
From: Joseph M. Newcomer on 9 Apr 2010 23:32
On Sat, 10 Apr 2010 00:19:11 +0200, "Giovanni Dicanio"
<giovanniDOTdicanio(a)REMOVEMEgmail.com> wrote:

>"Goran" <goran.pusic(a)gmail.com> ha scritto nel messaggio
>news:9891b928-3a77-4738-aedb-0a6671fd1656(a)11g2000yqr.googlegroups.com...
>
>> BTW, I believe that Spolsky is right on Hungarian in
>> http://www.joelonsoftware.com/articles/Wrong.html
>

****
Actually, this "right" way is not particularly right. The REALLY right way is to define
a string class, SafeString, and only allow operations on SafeString variables. By using
the name instead of the type, you allow

  sData = usData;

to compile, when in fact you don't want this merely to jump out visually at you! You
want the code to NOT COMPILE!

I once worked in a language that had nice features, and I wrote graphics code with two
data types, Xint and Yint, which represented the coordinate system. So you couldn't
accidentally store an X coordinate in a Y coordinate variable. And when I did this, I
found someone had written a subroutine that took the Y coordinate FIRST (a stupid
design decision, and not mine, either), and so the program stopped compiling! THAT'S
what you want from notations. Use the type system, not the programmer, to detect these
kinds of errors!

Microsoft has been abysmal at this; the fact that LPWSTR and BSTR can cross-assign is a
total design failure of BSTR. Don't use a kludge to fix a very real problem; use types.
The problem is that C programmers think C++ is just C with funny syntax, and it is not.
It is a new language, and it should be used to its full power. Creating silly names
like "s"-prefix variables and "us"-prefix variables says, "I'm an incompetent C++
programmer, using some lazy solution instead of building the right one!" You can do the
same thing in C, using structs (it is clumsy, but I've done it). On large projects, a
bit of investment in infrastructure pays off in higher productivity, reliability, and
robustness. DO THE JOB RIGHT! USE TYPES!
I haven't worked on a project larger than one person in twenty years, and I still
create funky types to deal with places where I care about type safety. If I were back
working on 20-person teams, I'd be spending a LOT of time doing this (back when I was,
I did).

Note that we could NOT assign literals to X or Y coordinates; if you needed a (0,0)
coordinate, you had to write the equivalent of

  Coordinate origin(Xint ! 0, Yint ! 0);

"!" was the casting operator, and a casting operator was defined for the type so you
could limit what appeared on the RHS of the infix operator. So it was essentially

  class Xint {
      ... stuff ...
      Xint operator ! (int v) { return reinterpret_cast<Xint>(v); }
  }

Also, we didn't have to define every operator, because we could derive from a higher
level and not end up allowing arguments of the subclass to match parameters of the
superclass (it had an amazing type system, the best I've ever worked in) -- except that
isn't quite the syntax of the language I used. But it conveys the idea.
****

>Very interesting article, thanks!
>
>I agree that Hungarian can be of good help when it identifies the *intent*
>(more than the underlying raw type).
>For example, both BSTR and LPWSTR are wchar_t*, and the following code would
>compile just fine:
>
> LPWSTR s1;
> BSTR s2;
> ...
> s2 = s1;
>
>But it is *wrong*, because BSTRs must be allocated using their own
>functions like SysAllocString, etc.
>
>Using a proper prefix convention helps identify the problem (especially
>when the offending code is surrounded by other code):
>
> LPWSTR pszS1;
> BSTR bstrS2;
> ...
> bstrS2 = pszS1;
>
>--> wrong: can't assign a BSTR from a raw LPWSTR.
>
>Another good example is the consistent use of 'cb' vs 'cch'.
>
>Or prefixing pointers with 'p'...
>
> WCHAR szBuf[100];
> ...
> LPWSTR pszBuf;
> ...
> StringCbCopy(..., sizeof(pszBuf) ... )
>
>Reading sizeof(*p*szBuf) alerts the brain, because the size of a *p*ointer
>is 4 (on 32-bit), and probably what was needed was the size of the
>statically allocated buffer (i.e. sizeof(szBuf), without the leading 'p').
>
>And I like the cancellation rule of 'p' and '*'.
>For example:
>
> BYTE **ppb
>
>*ppb is a BYTE*, because the first '*' and the first 'p' cancel together,
>so 'pb' remains (and it is a single-level pointer).
>
>> P.S. "m_"!? Puh-lease! Ok, I agree that it's interesting to prefix
>> class data members, but what's wrong with a simple "_", or "F" (for
>> "field", as Borland does)?
>
>I like the 'm_' prefix, probably because I'm used to it, and to me it is
>consistent with the g_ and s_ prefixes I use for global variables and
>static variables.

****
I find the m_ prefix offensive in that it demands a purely arbitrary convention that
cannot be enforced by the language or compiler. Since I do not think it adds anything
to the program's readability, and in fact it is completely noise most of the time, I
have not used it in years. I only used it in two programs: the first two MFC programs I
wrote.
****

>
>(Yes, I know that modern IDEs use tooltips to give information about an
>identifier, but I just find it more readable having it standing there,
>speaking for itself.)
>
>However, all in all, I do like David's words:
>
>> Actually, it's more important to be consistent with the other code you're
>> working with, because jumping from one convention to another while stepping
>> into other people's code in the debugger or editor is more jarring than
>> staying with one non-optimal convention.
>
>Giovanni

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm