From: Joseph M. Newcomer on 22 May 2010 06:03

See below...
On Fri, 21 May 2010 14:43:07 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/21/2010 2:33 PM, Joseph M. Newcomer wrote:
>> :-)!!!! And I can decode that even without looking up the actual codepoints! Yes, I've
>> been seriously tempted, but as I said in the last tedious thread, I think I must suffer
>> from OCD because I keep trying to educate him, in spite of his resistance to it!
>> joe
>
>I did acknowledge that you did make your point as soon as you provided
>me with enough reasoning to make your point.
****
Sadly, all of this was so evident that I didn't see a need to keep drilling down when the
correct issues were screamingly obvious. You should have been able to determine all of
this on your own from my first responses.
joe
****
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on 22 May 2010 06:08

See below...
On Fri, 21 May 2010 14:55:27 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/21/2010 2:30 PM, Joseph M. Newcomer wrote:
>> See below....
>> On Fri, 21 May 2010 09:59:50 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>>
>>> C++ is apparently more restrictive than you thought because it requires
>>> every input character to be mapped to the ASCII set. This would seem to
>>> explicitly prohibit the flexibility that I provided of allowing UTF-8
>>> identifiers.
>> ****
>> OK, are you implementing C++ or are you implementing an EXTENSION of C++ to allow native
>> characters? If you are implementing an EXTENSION, then you get to decide what identifiers
>> look like, but they should NOT look like sequences of arbitrary characters including
>> punctuation marks! That is not sensible, and it is inconsistent with the stated goal! You
>> don't need to read the C++ standard to know this!
>
>How would you go about making a language as international as you can
>within a 40 hour budget? Assume that you only have novice levels of
>understanding of Unicode and any learning must also be included in this
>40 hour budget.
****
What part of the budget is 40 hours? I could add the lexer rules in a few hours with
careful reading of the Unicode standard. This would probably leave me 35 hours to deal
with the finer points, the kind Mihai, Tom, and others who have worked deeply in other
languages might be able to point out. It ain't Rocket Science!
****
>
>Since my language would not treat any code point above ASCII as
>lexically or syntactically significant, I still think that my approach
>within my budget is optimal.
****
Fine, but you misrepresented what you were doing. So either your implementation doesn't
meet your stated specification, or the stated specification was naively optimistic. But
the implementation clearly did not match it.
****
>
>What I learned from you is that if and when I do decide to map local
>punctuation and digits to their corresponding ASCII equivalents, then I
>would need to restrict the use of these remapped code points from being
>used within identifiers. Until then it makes little difference.
*****
Yes. But it makes a SIGNIFICANT difference if you tell me that I can use my native
character set, and then you don't do that.
****
>
>I also learned from you that this next step of localization provides
>much more functionality for relatively little cost.
*****
Well, it means your implementation and your specification are closer to each other...
joe
****
From: Joseph M. Newcomer on 22 May 2010 06:16

See below...
On Fri, 21 May 2010 15:23:25 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/21/2010 2:55 PM, Peter Olcott wrote:
>> On 5/21/2010 2:30 PM, Joseph M. Newcomer wrote:
>>> See below....
>>> On Fri, 21 May 2010 09:59:50 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>>>
>>>> C++ is apparently more restrictive than you thought because it requires
>>>> every input character to be mapped to the ASCII set. This would seem to
>>>> explicitly prohibit the flexibility that I provided of allowing UTF-8
>>>> identifiers.
>>> ****
>>> OK, are you implementing C++ or are you implementing an EXTENSION of C++ to allow native
>>> characters? If you are implementing an EXTENSION, then you get to decide what identifiers
>>> look like, but they should NOT look like sequences of arbitrary characters including
>>> punctuation marks! That is not sensible, and it is inconsistent with the stated goal! You
>>> don't need to read the C++ standard to know this!
>>
>> How would you go about making a language as international as you can
>> within a 40 hour budget?
>
>It would probably take me much longer than 40 hours just to find the
>exhaustive list of every local code point that must be mapped to an
>ASCII code point. The whole rest of this adaptation would be nearly
>trivial.
****
Why do you care about ASCII code points? You explicitly said you are implementing an
EXTENSION to C++ syntax, for a language which is NOT C++ but your private scripting
language! So what in the world does the C++ specification have to do with your EXTENSION
to the syntax????

If you say "I wish to ignore the limitations of the C++ language" and then you say "I am
forced to do a bad implementation because I have to adhere to the limitations of the C++
language", how can we resolve these two positions?
****
>
>> Assume that you only have novice levels of
>> understanding of Unicode and any learning must also be included in this
>> 40 hour budget.
*****
It does not take much experience to read the Unicode tables and see what are letters and
what are digits and what are punctuation marks! And it does not take hours of study to do
this!
****
>>
>> Since my language would not treat any code point above ASCII as
>> lexically or syntactically significant, I still think that my approach
>> within my budget is optimal.
*****
Oh, what happened to that stated specification of allowing people to program in their
native character set? Oh, that was just a Magic Morphing Requirement which is no longer
true. Never mind.
****
>>
>> What I learned from you is that if and when I do decide to map local
>> punctuation and digits to their corresponding ASCII equivalents, then I
>> would need to restrict the use of these remapped code points from being
>> used within identifiers. Until then it makes little difference.
*****
But it is so trivial to do the job right in the first place! You treat anything
recognizably called a "letter" as a letter, anything recognizably called a "digit" as a
digit, and write lexical rules for a number which has productions of the form

    thai_number    = [0-9]  (where 0-9 represent the code points for a Thai number)
    chinese_number = [0-9]  (where 0-9 represent the code points for a Chinese number)
    english_number = [0-9]  (where 0-9 represent the code points \u0030 to \u0039)
    number = thai_number | chinese_number | english_number | ...lots of others...

Note that converting a Chinese number to a binary representation is a bit trickier,
because Chinese has a symbol for "ten", so you need to know the syntax for doing the
conversion, but that's a trivial detail. That's what you worry about in the other 35
hours.
joe
****
>>
>> I also learned from you that this next step of localization provides
>> much more functionality for relatively little cost.
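The rule described above (classify each character by what Unicode calls it, then accept digits from any script) can be sketched roughly as follows. This is only an illustration in Python, not anything posted in the thread: the function names `classify`, `parse_decimal`, and `is_identifier` are hypothetical, and the identifier rule (a letter or underscore first, then letters, digits, combining marks, or underscores) is an assumed simplification rather than the C++ rule or the full Unicode identifier recommendation.

```python
import unicodedata


def classify(ch: str) -> str:
    """Classify one character by its Unicode general category."""
    cat = unicodedata.category(ch)
    if cat.startswith("L"):
        return "letter"
    if cat == "Nd":  # decimal digit in any script
        return "digit"
    if cat.startswith("M"):  # combining marks, e.g. vowel signs
        return "mark"
    if cat.startswith("P"):
        return "punctuation"
    return "other"


def parse_decimal(text: str) -> int:
    """Convert a run of decimal digits from any script to an int.

    unicodedata.decimal raises ValueError for a character that is
    not a decimal digit, so malformed input is rejected.
    """
    if not text:
        raise ValueError("empty number")
    value = 0
    for ch in text:
        value = value * 10 + unicodedata.decimal(ch)
    return value


def is_identifier(text: str) -> bool:
    """Assumed rule: letter or '_' first, then letters, digits, marks, '_'."""
    if not text:
        return False
    if text[0] != "_" and classify(text[0]) != "letter":
        return False
    return all(
        c == "_" or classify(c) in ("letter", "digit", "mark")
        for c in text[1:]
    )


print(parse_decimal("\u0e54\u0e52"))  # Thai digits four, two: prints 42
print(is_identifier("a!b"))           # punctuation is rejected: prints False
```

Ideographic numerals such as the Chinese symbol for "ten" (U+5341) are not positional decimal digits: Unicode classifies them as letters, so `parse_decimal` rejects them and a separate conversion rule would be needed, which matches the caveat about Chinese numbers above.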
From: Oliver Regenfelder on 22 May 2010 06:47

Hello,

Peter Olcott wrote:
> I did acknowledge that you did make your point as soon as you provided
> me with enough reasoning to make your point.

This may be so from your point of view. But I would say that for most other people
Joe made his point quite clear from the beginning! Just because it takes you a long
time to understand doesn't mean that he didn't provide enough reasoning before.

Best regards,

Oliver
From: Peter Olcott on 22 May 2010 09:59

On 5/22/2010 5:03 AM, Joseph M. Newcomer wrote:
> See below...
> On Fri, 21 May 2010 14:43:07 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>
>> On 5/21/2010 2:33 PM, Joseph M. Newcomer wrote:
>>> :-)!!!! And I can decode that even without looking up the actual codepoints! Yes, I've
>>> been seriously tempted, but as I said in the last tedious thread, I think I must suffer
>>> from OCD because I keep trying to educate him, in spite of his resistance to it!
>>> joe
>>
>> I did acknowledge that you did make your point as soon as you provided
>> me with enough reasoning to make your point.
> ****
> Sadly, all of this was so evident that I didn't see a need to keep drilling down when the
> correct issues were screamingly obvious. You should have been able to determine all of
> this on your own from my first responses.
> joe

Within the context of the basic assumption (and I have already said this
several times but you still don't get it) that C++ requires ASCII at the
lexical level, everything that you said about how I was treating
identifiers was utter nonsense gibberish.

ONLY after this incorrect assumption was corrected could anything that
you said about how I was treating identifiers make any sense at all.

The ONLY reason that C++ does not allow any character in an identifier
is that it would screw up the parser. If it would not screw up the
parser, then any character at all could be used in an identifier.

It took you an enormous amount of time to explain why it would screw up
the parser. You kept insisting upon arbitrary historical convention as
your criterion for correct identifiers without pointing out how the
parser would be screwed up.