From: Peter Olcott on 20 May 2010 10:51

On 5/20/2010 2:17 AM, Mihai N. wrote:
>> I thought that it choked on anything besides ASCII. So are you implying
>> that it can take Unicode within any encoding?
>
> Can take Unicode in some Unicode form.
> It can take the Unicode form accepted by the compiler.
> Some compilers understand UTF-16, some understand UTF-8,
> some understand none.
> But even the ones that don't understand anything other than ASCII
> should still accept the escaped form (\uXXXX).
>
> int x\u6565rap = 123;
> is a perfectly valid name.
>
> (and if the compiler accepts utf-8 or utf-16, you can use some
> human readable form)

Ah, so my idea to allow UTF-8 encoded identifiers is really not all that bad.
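A minimal sketch of the escaped form mentioned in the post above, written in Python purely for illustration (the thread itself is about C++). The helper name `to_ucn` is hypothetical. It spells every non-ASCII character of a name as `\uXXXX` or `\UXXXXXXXX`; it does not check whether a given compiler or the C++ standard actually permits that character in an identifier.

```python
def to_ucn(identifier: str) -> str:
    """Spell non-ASCII characters as universal-character-names.

    Hypothetical helper: it only rewrites the spelling and makes no
    claim about which characters a compiler accepts in identifiers.
    """
    parts = []
    for ch in identifier:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)              # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")  # short form: four hex digits
        else:
            parts.append(f"\\U{cp:08X}")  # long form: eight hex digits
    return "".join(parts)


if __name__ == "__main__":
    # U+6565 is the character used in the quoted example.
    print("int " + to_ucn("x\u6565rap") + " = 123;")
```

Run as a script, this prints the same declaration that appears in the quoted text, which shows that the human-readable and escaped spellings name the same identifier.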
From: Peter Olcott on 20 May 2010 11:24

On 5/20/2010 2:39 AM, Joseph M. Newcomer wrote:
> Note that an identifier is defined as incorporating "other implementation-defined
> characters". If someone is claiming to extend C syntax to include localized letters, then
> it should be philosophically consistent with the localized environment and define letters
> to be consistent with that environment, or alternatively, be inclusive and include all
> letters in all localized environments. Letters in a localized environment would not
> include digits in a localized environment, punctuation marks of a localized environment,
> etc.

It has already taken me 12 years (since 1998) and I still don't have a product. If I take the time to learn about and do all of these little things, I will be dead before I am done.

> Peter makes one of the common mistakes he is so fond of: he fastens on ONE implementation
> by ONE vendor and makes a claim that it is DEFINITIVE. You can't even argue that Intel's
> C++ compiler or gcc "prove" that this is true for ALL compilers, since they are intended
> to be clones of each other and historically they all date back to the PDP-11 C++ compiler
> which only used ASCII-7, so they are clones of that, except for extending the syntax to
> more modern constructs. So he comes along and says "I'm going to extend this" and as soon
> as I point out that the extensions have serious problems, he says "but the regular C++
> language does not work that way!" which seems to beg the question of what is meant by
> creating an extension that meets the requirements of allowing "native language". There
> are interesting questions about accent marks, vowel marks, combining characters, localized
> punctuation, localized digits, etc., but when I raised these, I was informed that the
> extensions to support "native language coding" did NOT mean "support native language
> coding" but meant "support something that allows native language programmers to write
> identifiers in their native language that don't even make sense lexically in the native
> language", and while making claims about how fast the recognizer is, refuse to limit the
> productions because the copy-and-paste lex rules would actually require WORK to make them
> correct, so he argues that it is not "convenient" to do it right.

I err on the side of abstracting out too many details, and you err on the side of including so many details that all of the development budget would be eaten up by the feasibility study.

> I guess I don't respect doing a job wrong, and rationalizations that say "wrong is OK,
> because whatever it is that I have defined is necessarily right, whether it is right or
> not". There are some VERY interesting questions about combining accent marks and
> combining characters, but if we ignore those, there is ZERO excuse for not writing
> productions based on localized letters or digits (other than the copy-and-paste solution
> no longer works!) because it cannot POSSIBLY affect the performance of the lexer! He even
> says it can't, so the only remaining reason is the need to actually THINK about the
> problem, instead of accepting an unsanctioned and unsupported regexp rule set

If all that needs to be done is to map some local code points to some ASCII characters this may be implemented before I release my GUI scripting language.

The grammatical productions would still be written using the ASCII character set. The lexical specification would map differing character sets to their corresponding ASCII equivalents.

It is the rat's nest of complexity of grapheme clusters that causes me to say whoa, too much, let's stop here.

> Note that the lexical rules require that localized characters be mapped to the base
> character set, so a Thai digit character should map to the corresponding 0..9 value, and a
> conforming compiler that allowed Thai input would do so because the C++ standard requires
> that it do so. So his argument about why his extended C++ does not have to treat a
> localized comma as a comma or a localized semicolon as a semicolon does not make sense;
> the standard says that the input character set is implementation-specific but must map to
> the base character set,

That sounds reasonable and relatively easy.

> so the argument that if I treat the following sequence in some
> language "A,B" that if I use a localized comma with localized letters this is, by his
> rules, necessarily an identifier. It means that under the mapping requirements it does
> not translate and therefore his assertion is (no big surprise here) gibberish.

Now you have finally explained your view sufficiently so that I can see what you are saying makes perfect sense.

> But why are we arguing over this? We KNOW his design is wrong; only HE can defend his bad
> decisions by rationalizing them to himself. The first time a customer programmer
> complains "But you SAID your extensions supported UTF-8 input, and I wrote this code in my
> native language and it is correct" he can explain to a PAYING CUSTOMER why his
> implementation makes no sense.

I wish you would have explained it this well earlier on; it would have avoided a lot of wasted time.

This is still probably out of scope for my first release. My first release will probably only support ASCII. The idea of home grown capitol is to get some sales quickly and as these sales provide self sufficient positive cash flow, then proceed with additional development.

> Note that you should also cite section 2.3 and the footnotes on page 16.
> joe
>
> On Thu, 20 May 2010 00:59:44 -0400, "Pete Delgado"<Peter.Delgado(a)NoSpam.com> wrote:
>
>> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote in message
>> news:gomdnfY9-INibm7WnZ2dnUVZ_sEAAAAA(a)giganews.com...
>>> On 5/19/2010 12:55 AM, Mihai N. wrote:
>>>>
>>>>> So C++ can take UTF-8 Identifiers?
>>>>
>>>> No, it can take Unicode identifiers.
>>>> The exact transformation format is not relevant.
>>>>
>>> I thought that it choked on anything besides ASCII. So are you implying
>>> that it can take Unicode within any encoding?
>>
>> *Read* the C++ standards documents. It explains *everything*. For
>> information about identifiers, see section 2.11. There are draft copies of
>> the current proposed standard available for free:
>>
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf
>>
>> There is no need to "imply" anything. As usual, Mihai is correct in matters
>> such as this and your "thought" was wrong.
>>
>> -Pete
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
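The mapping of localized digits onto 0..9 that the post above calls "reasonable and relatively easy" can be sketched as follows. This is an illustration in Python, not anything taken from the thread or from a real compiler: the function name `fold_digits` is hypothetical, and it relies on the standard `unicodedata` module to recognise decimal digits (general category `Nd`).

```python
import unicodedata


def fold_digits(text: str) -> str:
    """Replace every Unicode decimal digit with its ASCII equivalent.

    Hypothetical pre-lexing pass: characters in general category Nd
    (decimal digits in any script) become '0'..'9'; all other
    characters are left untouched.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            out.append(str(unicodedata.decimal(ch)))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    thai = "\u0E51\u0E52\u0E53"   # THAI DIGIT ONE, TWO, THREE
    print(fold_digits(thai))
```

A pass like this runs before the grammar sees any input, so the grammatical productions can stay written against the ASCII digits, as the post proposes.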
From: Joseph M. Newcomer on 20 May 2010 12:38

On Thu, 20 May 2010 10:24:13 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/20/2010 2:39 AM, Joseph M. Newcomer wrote:
>> Note that an identifier is defined as incorporating "other implementation-defined
>> characters". If someone is claiming to extend C syntax to include localized letters, then
>> it should be philosophically consistent with the localized environment and define letters
>> to be consistent with that environment, or alternatively, be inclusive and include all
>> letters in all localized environments. Letters in a localized environment would not
>> include digits in a localized environment, punctuation marks of a localized environment,
>> etc.
>>
>
>It has already taken me 12 years (since 1998) and I still don't have a
>product. If I take the time to learn about and do all of these little
>things, I will be dead before I am done.
****
How is it that other people have managed to gain knowledge of lots of topics, such as virtual memory, threading, computer architectures, etc. and are still alive? I'm pretty good, but people like Nigel Horspool are incredibly knowledgeable, and he's younger than I am. In fact, I hang out with a group of incredibly knowledgeable people, more than half of whom are much younger than I am, and they have not had any problem learning new things very quickly. The group is IFIP Working Group 2.4; see http://wg24.cs.uvic.ca/ContentWG24.shtml .
****
>
>> Peter makes one of the common mistakes he is so fond of: he fastens on ONE implementation
>> by ONE vendor and makes a claim that it is DEFINITIVE. You can't even argue that Intel's
>> C++ compiler or gcc "prove" that this is true for ALL compilers, since they are intended
>> to be clones of each other and historically they all date back to the PDP-11 C++ compiler
>> which only used ASCII-7, so they are clones of that, except for extending the syntax to
>> more modern constructs. So he comes along and says "I'm going to extend this" and as soon
>> as I point out that the extensions have serious problems, he says "but the regular C++
>> language does not work that way!" which seems to beg the question of what is meant by
>> creating an extension that meets the requirements of allowing "native language". There
>> are interesting questions about accent marks, vowel marks, combining characters, localized
>> punctuation, localized digits, etc., but when I raised these, I was informed that the
>> extensions to support "native language coding" did NOT mean "support native language
>> coding" but meant "support something that allows native language programmers to write
>> identifiers in their native language that don't even make sense lexically in the native
>> language", and while making claims about how fast the recognizer is, refuse to limit the
>> productions because the copy-and-paste lex rules would actually require WORK to make them
>> correct, so he argues that it is not "convenient" to do it right.
>
>I err on the side of abstracting out too many details, and you err on
>the side of including so many details that all of the development budget
>would be eaten up by the feasibility study.
****
What "feasibility study"? Why would this even enter the discussion? The "feasibility" is to look at the Unicode character set and write a set of lex productions that include only those sequences called "letters". An analogous study of "digits" can allow writing lex rules about "digits". There. That's the "feasibility study"; the result: it is trivial to do. So where is the problem? How many people need to form a committee, and are they producing a printed report to management, and what is their timeline for this complex study? Oh, I missed the fact that a one-person project comes with a built-in bureaucracy.
****
>
>> I guess I don't respect doing a job wrong, and rationalizations that say "wrong is OK,
>> because whatever it is that I have defined is necessarily right, whether it is right or
>> not". There are some VERY interesting questions about combining accent marks and
>> combining characters, but if we ignore those, there is ZERO excuse for not writing
>> productions based on localized letters or digits (other than the copy-and-paste solution
>> no longer works!) because it cannot POSSIBLY affect the performance of the lexer! He even
>> says it can't, so the only remaining reason is the need to actually THINK about the
>> problem, instead of accepting an unsanctioned and unsupported regexp rule set
>
>If all that needs to be done is to map some local code points to some
>ASCII characters this may be implemented before I release my GUI
>scripting language.
****
But why did you have to argue about this?
****
>
>The grammatical productions would still be written using the ASCII
>character set. The lexical specification would map differing character
>sets to their corresponding ASCII equivalents.
****
If you believe that convert-to-base-language argument, yes, and you need to read the C++ standard for what is meant by the "base character set". I did, it took me perhaps 10 minutes. I have no idea how long it would take your feasibility study committee.
****
>
>It is the rat's nest of complexity of grapheme clusters that causes me to
>say whoa, too much, let's stop here.
****
Yes, and you can make some statements about that. For example, I've been told that in Hebrew vowel marks are considered redundant, so you might tackle such issues incrementally as you get feedback from various users. But this is not the same as saying "everything that is not ASCII-7 is a letter".
****
>
>> Note that the lexical rules require that localized characters be mapped to the base
>> character set, so a Thai digit character should map to the corresponding 0..9 value, and a
>> conforming compiler that allowed Thai input would do so because the C++ standard requires
>> that it do so. So his argument about why his extended C++ does not have to treat a
>> localized comma as a comma or a localized semicolon as a semicolon does not make sense;
>> the standard says that the input character set is implementation-specific but must map to
>> the base character set,
>
>That sounds reasonable and relatively easy.
****
Yes, and if you had taken the ten minutes to read the standard, you would have realized that, too!
****
>
>> so the argument that if I treat the following sequence in some
>> language "A,B" that if I use a localized comma with localized letters this is, by his
>> rules, necessarily an identifier. It means that under the mapping requirements it does
>> not translate and therefore his assertion is (no big surprise here) gibberish.
>
>Now you have finally explained your view sufficiently so that I can see
>what you are saying makes perfect sense.
****
But I said this on day one, and it was so screamingly obvious I don't know why you didn't get the AHA! event then!
****
>
>> But why are we arguing over this? We KNOW his design is wrong; only HE can defend his bad
>> decisions by rationalizing them to himself. The first time a customer programmer
>> complains "But you SAID your extensions supported UTF-8 input, and I wrote this code in my
>> native language and it is correct" he can explain to a PAYING CUSTOMER why his
>> implementation makes no sense.
>
>I wish you would have explained it this well earlier on; it would have
>avoided a lot of wasted time.
****
I leave a certain amount of work as an Exercise For The Reader. I don't feel I have to explain every single little detail to an experienced programmer.
****
>
>This is still probably out of scope for my first release. My first
>release will probably only support ASCII. The idea of home grown capitol
****
For someone who accuses me of understanding nothing because I make a couple typos, you should be a lot more careful about your own typos. "Capital" is, by one definition, money to invest; "capitol" is a building in which certain legislative bodies meet. Be careful, or someone may state that because you are the sort of person who cannot spell-check, you are necessarily a babbling idiot.
****
>is to get some sales quickly and as these sales provide self sufficient
>positive cash flow, then proceed with additional development.
****
But you must not misrepresent the product in the way you have.
joe
****
>
>>
>> Note that you should also cite section 2.3 and the footnotes on page 16.
>> joe
>>
>> On Thu, 20 May 2010 00:59:44 -0400, "Pete Delgado"<Peter.Delgado(a)NoSpam.com> wrote:
>>
>>> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote in message
>>> news:gomdnfY9-INibm7WnZ2dnUVZ_sEAAAAA(a)giganews.com...
>>>> On 5/19/2010 12:55 AM, Mihai N. wrote:
>>>>>
>>>>>> So C++ can take UTF-8 Identifiers?
>>>>>
>>>>> No, it can take Unicode identifiers.
>>>>> The exact transformation format is not relevant.
>>>>>
>>>> I thought that it choked on anything besides ASCII. So are you implying
>>>> that it can take Unicode within any encoding?
>>>
>>> *Read* the C++ standards documents. It explains *everything*. For
>>> information about identifiers, see section 2.11. There are draft copies of
>>> the current proposed standard available for free:
>>>
>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf
>>>
>>> There is no need to "imply" anything. As usual, Mihai is correct in matters
>>> such as this and your "thought" was wrong.
>>>
>>> -Pete
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
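The difference between "only letters and digits" and "everything that is not ASCII-7 is a letter", as argued in the post above, can be sketched in a few lines. This is an illustration in Python under stated assumptions, not the lex rule set under discussion: the function names are hypothetical, "letter" is taken to mean Unicode general categories `L*` plus underscore, "digit" is taken to mean category `Nd`, and combining marks are deliberately left out because the thread treats them as an open question.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # Assumption: a "letter" is any code point in a category starting with L.
    return ch == "_" or unicodedata.category(ch).startswith("L")


def is_digit(ch: str) -> bool:
    # Assumption: a "digit" is any decimal digit, in any script.
    return unicodedata.category(ch) == "Nd"


def is_identifier(token: str) -> bool:
    """Letter first, then letters or digits; punctuation never qualifies."""
    if not token or not is_letter(token[0]):
        return False
    return all(is_letter(ch) or is_digit(ch) for ch in token[1:])


def naive_is_identifier(token: str) -> bool:
    """The rule criticised above: every non-ASCII code point is a letter."""
    def ok(ch: str, first: bool) -> bool:
        if ord(ch) > 0x7F:
            return True
        return ch == "_" or ch.isalpha() or (not first and ch.isdigit())
    return bool(token) and all(ok(ch, i == 0) for i, ch in enumerate(token))


if __name__ == "__main__":
    # Armenian letter, ARMENIAN COMMA (U+055D), Armenian letter.
    sample = "\u0561\u055D\u0562"
    print(is_identifier(sample), naive_is_identifier(sample))
```

Both recognizers do the same amount of work per character, so restricting the productions to letters and digits should not change lexer performance in any meaningful way; the only difference is which code points are admitted.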
From: Peter Olcott on 20 May 2010 13:01

On 5/20/2010 11:38 AM, Joseph M. Newcomer wrote:
>
> On Thu, 20 May 2010 10:24:13 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>
>> If all that needs to be done is to map some local code points to some
>> ASCII characters this may be implemented before I release my GUI
>> scripting language.
> ****
> But why did you have to argue about this?

To get you to explain the reasoning behind your dogmatic statements so that I could see that there was a reasonable basis for what you were claiming.

>>> so the argument that if I treat the following sequence in some
>>> language "A,B" that if I use a localized comma with localized letters this is, by his
>>> rules, necessarily an identifier. It means that under the mapping requirements it does
>>> not translate and therefore his assertion is (no big surprise here) gibberish.
>>
>> Now you have finally explained your view sufficiently so that I can see
>> what you are saying makes perfect sense.
> ****
> But I said this on day one, and it was so screamingly obvious I don't know why you didn't
> get the AHA! event then!

It made no sense that there would be something such as a localized comma. Because it made no sense (and still makes no sense) I thought that you were just pulling my chain. Even if there was such a thing as a localized comma (and apparently there is) I thought that C/C++ standardized on the ASCII comma.

I coined a term long ago [ignorance squared]. What this means is that there is no possible way for any person lacking knowledge to accurately quantify the specific degree of this lack of knowledge because this requires having the knowledge to measure the lack against.

To a person who lacks knowledge this lack can only appear to be disagreement. Only the person who has the knowledge can accurately quantify the degree of the lack.

Ignorance squared means that one is even ignorant of their own ignorance (or at least of the degree of this ignorance).
From: Joseph M. Newcomer on 20 May 2010 14:07
See below...

On Thu, 20 May 2010 12:01:12 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/20/2010 11:38 AM, Joseph M. Newcomer wrote:
>>
>> On Thu, 20 May 2010 10:24:13 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>>
>>> If all that needs to be done is to map some local code points to some
>>> ASCII characters this may be implemented before I release my GUI
>>> scripting language.
>> ****
>> But why did you have to argue about this?
>
>To get you to explain the reasoning behind your dogmatic statements so
>that I could see that there was a reasonable basis for what you were
>claiming.
****
I made no dogmatic statements; I merely pointed out the screamingly obvious defects in the design.
****
>
>>>> so the argument that if I treat the following sequence in some
>>>> language "A,B" that if I use a localized comma with localized letters this is, by his
>>>> rules, necessarily an identifier. It means that under the mapping requirements it does
>>>> not translate and therefore his assertion is (no big surprise here) gibberish.
>>>
>>> Now you have finally explained your view sufficiently so that I can see
>>> what you are saying makes perfect sense.
>> ****
>> But I said this on day one, and it was so screamingly obvious I don't know why you didn't
>> get the AHA! event then!
>
>It made no sense that there would be something such as a localized
>comma. Because it made no sense (and still makes no sense) I thought
>that you were just pulling my chain. Even if there was such a thing as a
>localized comma (and apparently there is) I thought that C/C++
>standardized on the ASCII comma.
****
That is a stupid statement. All you had to do was read the Unicode standard and you would see that there ARE localized punctuation marks! I brought up the list of Unicode code points and it took me less than five minutes to discover this! I even gave you the precise code points, so you could not POSSIBLY have missed the idea that there are localized punctuation marks! And you could have verified my observations just by looking at the Unicode standard! RTFM!!!!

And the C++ standard is very clear about what is going on; if there are character set transformations required to create a legitimate C++ program, these are handled by a mechanism outside the standard.

And in any case, since you explicitly said you were EXTENDING the character set, why should you revert to insisting that the ASCII-7 standard is what a programmer programming in his or her "native language" should adhere to? I merely pointed out an obvious inconsistency in your reasoning. You reverted to saying "I said X, but I meant something other than X" which puts us back in the world of the Magic Morphing Requirements.
****
>
>I coined a term long ago [ignorance squared]. What this means is that
>there is no possible way for any person lacking knowledge to accurately
>quantify the specific degree of this lack of knowledge because this
>requires having the knowledge to measure the lack against.
****
Hmm. But I had attempted to correct your ignorance; in particular, I remember specifically giving the Unicode code point for the Armenian Comma and several other localized punctuation marks, and you made the assumption I was "yanking your chain". Now THAT's a manifestation of ignorance squared! When someone corrects you by stating a fact, and you find the fact "inconvenient", it does not make you smarter; it only proves that you like remaining ignorant.
****
>
>To a person who lacks knowledge this lack can only appear to be
>disagreement. Only the person who has the knowledge can accurately
>quantify the degree of the lack.
****
I had done that, by pointing out ranges of localized digits, and localized punctuation marks, and you chose to both ignore me and argue that such things didn't matter, which was inconsistent with your stated design goal (program in the localized language). I even pointed out that I had used my Locale Explorer to find these, and it is a free download (and the table I use in it is directly from the Unicode Web site, and is the official, sanctioned, data, at least as of the time I downloaded it; it is potentially obsolete, but it already contained enough information to show you were wrong)
****
>
>Ignorance squared means that one is even ignorant of their own
>ignorance (or at least of the degree of this ignorance).
****
So what do you call an insistence on remaining ignorant, even when others are supplying knowledge you didn't have?
joe
****
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
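The existence of localized punctuation marks can be checked directly against the Unicode character database. The sketch below is an illustration in Python, not the Locale Explorer tool mentioned above: the function names are hypothetical, the scan range is an arbitrary choice for the example, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return code point, official name, and general category of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{unicodedata.category(ch)}]"


def localized_commas(start: int = 0x80, stop: int = 0x3100) -> list:
    """Non-ASCII punctuation characters whose Unicode name ends in 'COMMA'.

    The default range is an assumption made for this example; it covers
    many scripts but is not the whole of Unicode.
    """
    found = []
    for cp in range(start, stop):
        ch = chr(cp)
        name = unicodedata.name(ch, "")   # unnamed code points yield ""
        if name.endswith("COMMA") and unicodedata.category(ch).startswith("P"):
            found.append(ch)
    return found


if __name__ == "__main__":
    for ch in localized_commas():
        print(describe(ch))
```

Each character it lists has a punctuation category rather than a letter category, so a lexer that classifies by category would not fold a localized comma into an identifier.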