From: Joseph M. Newcomer on 9 Apr 2010 23:51
See below...
On Fri, 9 Apr 2010 17:55:35 -0500, "Liviu" <lab2k1(a)gmail.c0m> wrote:

>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote...
>> On Thu, 8 Apr 2010 20:48:19 -0500, "Liviu" <lab2k1(a)gmail.c0m> wrote:
>>
>>> There still are cases where it makes a lot of sense to build
>>> non-Unicode (i.e. no _UNICODE or UNICODE #define'd)
>>> and explicitly call the "W" flavor of the API functions where
>>> applicable. One example at random, an application which stores
>>> text as UTF-8 internally, and does more string manipulations
>>> inside than API calls outside.
>> ****
>>
>> The reality is that we live in a very global market, and anyone who
>> knows their company product will never need Unicode is living in a
>> fantasy world.
>
>Right. To be very clear, my point was not against Unicode as the
>standard, but only the implementation choice of a particular encoding
>such as UTF-8 or UTF-16 for internal representation.
>
>> And if you think UTF-8 makes it easy, you have never actually had
>> to deal with a language that used a multibyte encoding in UTF-8.
>> It is NOT easy, and is highly error-prone. I know, I've had to clean
>> up some of the messes that have resulted from this. First thing I do
>> is convert to Unicode. Then [...]
>
>Then the app fails to run because its VM size suddenly shot over 2GB,
>and now you have to rebuild it for 64b, and require your clients to
>upgrade their windows as well ;-) Mostly joking, of course... But as
>long as the code does meet the Unicode requirements, the choice for
>its internal encoding is an engineering decision more than a design one.
****
If a minor change like moving to Unicode blows you out of the water like this, it suggests you were already living dangerously. Something that gratuitously generates a > 1GB memory footprint has deeper problems than Unicode. Memory footprints this large inevitably have horrible performance characteristics, and these need to be solved.
Some years ago, I had to deal with a Win16 program that had a 7.5MB footprint on a 2MB machine (remember when we could run productive code on a 2MB machine?). In 3 days I recoded this C program to have a 300K footprint, by replacing all the fixed-size arrays of characters with dynamically-allocated strings, and all the fixed-size-arrays-of-structs-with-fixed-size-arrays-of-things-inside-them with dynamically-allocated objects. And we stopped, because that was Good Enough.

But if converting to Unicode blows your memory size, I think there are deeper problems that need to be addressed, ones which will result in substantially smaller memory footprints if addressed properly. I learned this lesson programming on small machines (64K to 1MB machines). Programs whose size keeps growing usually represent less-than-optimal designs. At some level, a small memory does impose ridiculous design decisions (I built several "overlay" systems to handle large code bases, and they were a real pain to deal with; and building virtual memory systems that kept strings in mapped files was also a pain), but once we start complaining that 1GB or 2GB is too small, either the design is wrong, or you really SHOULD have a 64-bit machine (my database friends LOVE Win64!) because the problem really is of that scale.

If you are willing to pay the price in complexity for UTF-8, fine, but going into the problem you need to realize that you are trading off a LOT of coding complexity to keep the footprint small. (Most people who do this don't get the code right, and it breaks horribly every time you throw a multibyte character into the mix! Been there, fixed that.)
joe
****
>
>Liviu
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
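To make the "breaks horribly every time you throw a multibyte character into the mix" point concrete, here is a minimal C++ sketch (not code from anyone in this thread) of two classic UTF-8 traps: the byte count is not the character count, and truncating at an arbitrary byte offset can cut a multibyte sequence in half. The helper names are made up for illustration, and the code assumes its input is already well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Number of code points in a well-formed UTF-8 string: count every byte
// that is NOT a continuation byte. std::string::size() counts bytes instead.
inline std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s)
        if (!is_continuation(b)) ++n;
    return n;
}

// Truncate to at most max_bytes bytes without splitting a multibyte sequence:
// s[cut] is the first byte dropped; if it is a continuation byte, the cut
// falls inside a sequence, so back up until s[cut] is a lead (or ASCII) byte.
inline std::string utf8_truncate(const std::string& s, std::size_t max_bytes) {
    if (s.size() <= max_bytes) return s;
    std::size_t cut = max_bytes;
    while (cut > 0 && is_continuation(static_cast<unsigned char>(s[cut])))
        --cut;
    return s.substr(0, cut);
}
```

With "caf\xC3\xA9" ("café": 5 bytes, 4 code points), a naive s.substr(0, 4) leaves a dangling lead byte, while utf8_truncate(s, 4) returns "caf". This only respects code point boundaries; keeping base characters and combining marks together needs still more logic, which rather supports the point about complexity.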
From: Joseph M. Newcomer on 10 Apr 2010 00:01
I once had a header file that had to be shared between an old Win16 app and a Win32 app. It defined a set of strings. For the Win16 app, I wrote

... = DoSomething(SYMBOLIC_NAME_OF_STRING_HERE)

and for the Win32 app I wrote

... = DoSomethingA(SYMBOLIC_NAME_OF_STRING_HERE)

There were some very odd reasons I could not write

... = DoSomething(_T(SYMBOLIC_NAME_OF_STRING_HERE));

that dealt with how the header files were shared and which code was shared and how, and I would not do it that way today. But it was code I wrote in 1996. It is one of the very few times I had to explicitly call an -A or -W suffix API.
joe

On Fri, 9 Apr 2010 10:33:09 -0500, "Liviu" <lab2k1(a)gmail.c0m> wrote:
>
>"Goran" <goran.pusic(a)gmail.com> wrote...
>> On Apr 9, 3:48 am, "Liviu" <lab...(a)gmail.c0m> wrote:
>>>
>>> But the blanket statement that non-UNICODE software should be
>>> shunned is too strong IMHO. There still are cases where it makes
>>> a lot of sense to build non-Unicode (i.e. no _UNICODE or UNICODE
>>> #define'd) and explicitly call the "W" flavor of the API functions
>>> where applicable. One example at random, an application which
>>> stores text as UTF-8 internally, and does more string manipulations
>>> inside than API calls outside.
>>
>> Passing that UTF-8 string to DeleteFile (that is, "A" version will be
>> picked by the compiler) is quite likely to lead to trouble, and a +/-
>> subtle one (system presumes "ANSI" encoding, whereas string passed
>> in ain't that). So, conversion from UTF-8 to corresponding code page
>> is both restrictive and necessary.
>
>True. But you missed the part where I said 'explicitly call the "W"
>flavor of the API functions where applicable'. In this case it would
>mean converting UTF-8 to UTF-16, then calling DeleteFileW.
>
>> So even if UTF-8 is used, it's better to define (_)UNICODE
>
>Then one would have to explicitly use the "A" functions (lstrlenA)
>and objects (CStringA) for internal UTF-8 manipulations.
>
>Liviu
>
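The conversion Liviu describes (UTF-8 in, UTF-16 out, then DeleteFileW) would normally be done on Windows with MultiByteToWideChar using the CP_UTF8 code page. The sketch below is a portable, hand-written stand-in that shows what that step involves; utf8_to_utf16 is a hypothetical name, and throwing std::invalid_argument on malformed input is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 and re-encode as UTF-16, the form the "W" APIs expect.
// Rejects bad lead/continuation bytes, truncated sequences, overlong
// encodings, surrogate code points, and values above U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else throw std::invalid_argument("bad UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("bad UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);   // append 6 payload bits
        }
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));          // one UTF-16 unit
        } else {
            cp -= 0x10000;                                      // surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits, the result could be copied into a std::wstring and its c_str() passed to DeleteFileW; in production code the system converter is the safer choice.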
From: Alexander Grigoriev on 10 Apr 2010 00:16
Thanks G_d for UTF-8 support in all major browsers. Now all those different webpage encodings can just go away.

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:p4iur51d8pqca33r7kd7q3ncusi5ts43pl(a)4ax.com...
> See below...
> On Thu, 8 Apr 2010 20:48:19 -0500, "Liviu" <lab2k1(a)gmail.c0m> wrote:
>
>>"Alexander Grigoriev" <alegr(a)earthlink.net> wrote...
>>>
>>> See windows.h:
>>>
>>> #define DeleteFile DeleteFileA
>>>
>>> By the way, it doesn't make sense to do non-UNICODE software
>>> anymore, which yours seems to be.
>>
>>If you meant strictly that DeleteFileA doesn't make sense then I'll
>>agree (same applies to most other Win32 API calls for that matter).
>>And if you meant that apps need be Unicode aware then I'll agree, too.
>>
>>But the blanket statement that non-UNICODE software should be shunned
>>is too strong IMHO. There still are cases where it makes a lot of sense
>>to build non-Unicode (i.e. no _UNICODE or UNICODE #define'd) and
>>explicitly call the "W" flavor of the API functions where applicable. One
>>example at random, an application which stores text as UTF-8 internally,
>>and does more string manipulations inside than API calls outside.
> ****
> What I tell my students:
>
> "Someday, your boss is going to drop in and say 'We just landed a big
> contract in <Japan, Korea, China>. How soon can you have the software
> ready to run there?'"
>
> Which answer do you want to give?
> (a) Give me a couple days, and I'll be able to give you an estimate.
>     Three to six weeks, probably.
> (b) How soon can you get the translator in here?
>
> The reality is that we live in a very global market, and anyone who knows
> their company product will never need Unicode is living in a fantasy world.
>
> And if you think UTF-8 makes it easy, you have never actually had to deal
> with a language that used a multibyte encoding in UTF-8. It is NOT easy,
> and is highly error-prone. I know, I've had to clean up some of the messes
> that have resulted from this. First thing I do is convert to Unicode. Then
> REMOVE all the erroneous code that was failing to deal with UTF-8 or MBCS
> properly. If I use UTF-8, it is at the boundaries (write UTF-8 files, read
> them, use it across network packets, etc.)
> joe
> ****
>>
>>Liviu
>>
From: Liviu on 10 Apr 2010 01:49
"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote...
> On Fri, 9 Apr 2010 17:55:35 -0500, "Liviu" <lab2k1(a)gmail.c0m> wrote:
>>
>>Then the app fails to run because its VM size suddenly shot over 2GB,
>>and now you have to rebuild it for 64b, and require your clients to
>>upgrade their windows as well ;-) Mostly joking, of course... But as
>>long as the code does meet the Unicode requirements, the choice for
>>its internal encoding is an engineering decision more than a design
>>one.
> ****
> If a minor change like moving to Unicode blows you out of the water
> like this, it suggests you were already living dangerously. Something
> that gratuitously generates a > 1GB memory footprint has deeper
> problems than Unicode. Memory footprints this large inevitably have
> horrible performance characteristics, and these need to be solved.

I resent the "gratuitously" insinuation ;-) I grew up on scarce-resource systems decades ago and still remember what decus/hisoft/aztec meant as far as C goes on platforms long forgotten now. But that's way off-topic...

The case in point was not about wasteful design, or even "near the edge" living. It's just that some users (few, but some) get caught up in the ever-bigger hyperbole and push the limits just for the fun of it. They can understand that an (artificial) test case many times larger than what was billed or sold would run slower, but if they get an "out of memory" error, they still complain that their new machine has plenty of memory available. Even those who run 32b XP on 8GB ;-)

> once we start complaining that 1GB or 2GB is too small, either the
> design is wrong, or you really SHOULD have a 64-bit machine

The latter. But that's an idea far easier to sell today than, say, 5 yrs ago.

Of course, tomorrow we could have a similar argument about UTF-32 encoding. I actually wonder whether that's in MS's books.

Liviu
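For a rough feel of the memory trade-off being argued about, including the UTF-32 question, this C++ sketch counts the bytes a sequence of code points needs in each encoding form. It assumes the input holds only valid Unicode scalar values; encoded_sizes and EncodedSizes are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct EncodedSizes { std::size_t utf8, utf16, utf32; };

// Bytes needed to store the given code points as UTF-8, UTF-16 and UTF-32.
inline EncodedSizes encoded_sizes(const std::u32string& text) {
    EncodedSizes s{0, 0, 0};
    for (char32_t cp : text) {
        s.utf8  += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        s.utf16 += cp < 0x10000 ? 2 : 4;   // BMP: one unit; else a surrogate pair
        s.utf32 += 4;                       // always one 32-bit unit
    }
    return s;
}
```

Mostly-ASCII text doubles in size going from UTF-8 to UTF-16 and quadruples in UTF-32, while CJK text in the Basic Multilingual Plane is actually smaller in UTF-16 (2 bytes per character) than in UTF-8 (3 bytes).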
From: Pete Delgado on 12 Apr 2010 13:21
"David Ching" <dc(a)remove-this.dcsoft.com> wrote in message news:%23MKaTb$1KHA.1036(a)TK2MSFTNGP06.phx.gbl...
> "Goran" <goran.pusic(a)gmail.com> wrote in message
> news:9891b928-3a77-4738-aedb-0a6671fd1656(a)11g2000yqr.googlegroups.com...
>> P.S. "m_"!? Puh-lease! Ok, I agree that it's interesting to prefix
>> class data members, but what's wrong with a simple "_", or "F" (for
>> "field", as Borland does)?
>
> A leading '_' is reserved for compiler extensions, I believe.

The C++ standard reserves the use of names within the global namespace that begin with an underscore to the implementation. This means that class members may indeed begin with an underscore because they are contained within the class namespace.

-Pete
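Pete's point holds, with one refinement from the same rule in the standard ([lex.name]): names beginning with an underscore are reserved only in the global namespace, but names beginning with an underscore followed by an uppercase letter, or containing a double underscore, are reserved to the implementation everywhere, including class scope. So a member called _count is fine, while _Count is not. A small illustration (Counter is a made-up class):

```cpp
#include <cassert>

class Counter {
public:
    void increment() { ++_count; }
    int count() const { return _count; }
private:
    int _count = 0;   // OK: underscore + lowercase letter, inside class scope
    // int _Count;    // reserved everywhere: underscore + uppercase letter
    // int __count;   // reserved everywhere: contains a double underscore
};
```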