Is this UTF-8 regular expression semantically correct? [MFC]


From: Peter Olcott on 18 May 2010 15:43

On 5/18/2010 11:26 AM, Joseph M. Newcomer wrote:
> See below...
> On Tue, 18 May 2010 01:10:07 -0700, "Mihai N."<nmihai_year_2000(a)yahoo.com> wrote:
>
>>
>>
>> Why not go to the root of the problem?
>>
>> This is what you need:
>> > For the purpose of creating an interpreted GUI scripting language that
>> > permits people to write GUI scripts in their native language
>>
>> Then expose the whole thing using a COM model, and it would allow
>> anyone to automate using any .NET language, Perl, JScript, you name it.
>> Solid languages, some of them supporting Unicode out of the box, way
>> more popular. You stop wasting your time developing a compiler,
>> and people will not be forced to waste time learning another
>> programming language (C-like but not quite C).
> ****
> But that sounds *reasonable*.
>
> Note that "permits people to write GUI scripts in their native language" but "all
> characters above the ASCII range" [which I presume means U007F] "are letters". Apparently,
> these languages do not have localized punctuation marks or digits, which is true only if
> you live deep in a Reality Distortion Field.

Anyone ever tell you that you are way too nit picky, paying very deep
attention to points that make no difference at all?

It is only a Letter in the sense that it is a valid character for an
Identifier. The original Lex specification for L was [A-Za-z_], and
underscore is not really a "Letter" either. For the purpose of valid "C"
language identifiers, an "_" underscore is a letter.

The "C" A REFERENCE MANUAL (FOURTH EDITION) bothers to make the
distinction between an "_" underscore and a Letter. Lex does not need
to know this distinction. As far as the "C" language is concerned an "_"
underscore is treated the same as a Letter, so no distinction need be made.

I could have called the range of code points above 7F to be named
something other than a letter but there was no need to since it makes
no relevant difference.

>
> In what language, exactly, is my use of the localized punctuation marks or digits
> considered part of the set of "letters". Presumably, if this were cast into the context
> of the 7-bit set, it would mean that I could have identifiers "A,B", "A.B", "A;B" "01ABC",
> "3CAT" and so on. If my native language has a native comma, period, or semicolon, why is
> this considered a "letter"? Why is it I can start an identifier with a digit? Why is my
> native rendering of 12.34 considered an "identifier" and not a "number"? And localized
> digits? If I were doing this, I'd have productions that defined numeric sequences, e.g.,
> bengali_number, thai_number, etc. and then have a production that a number is an
> "ascii_number", "bengali_number", "thai_number", etc., but unfortunately that would merely
> make my implementation *correct*, rather than "small and fast" (this is the Unix mindset:
> it doesn't matter if it is right as long as it is small and fast).
>
> Of course, if you want to simplify the problem to make its implementation easy, and
> violate your own specification of using native languages, then making such nonsensical
> statements such as "all characters above the ASCII range are letters" is acceptable.
> Nonsensical, of course, but if you define nonsense away by saying "my implementation defines
> correctness, not my specification", then it is presumably OK.
> joe
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Peter Olcott on 18 May 2010 17:40

On 5/18/2010 4:34 PM, Hector Santos wrote:
> Peter Olcott wrote:
>
>>
>> Did you say something goofus?
>
> Yeah chump! I rather be a goofus than a moronic, ugly, stupid seriously
> sick and pathetic a-hole like you.
>
> Thanks for repeating the above so it gets more distribution.
>
> --
> HLS

No one is more unprofessional than Hector Santos.

From: Joseph M. Newcomer on 18 May 2010 18:40

See below...
On Tue, 18 May 2010 14:43:32 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/18/2010 11:26 AM, Joseph M. Newcomer wrote:
>> See below...
>> On Tue, 18 May 2010 01:10:07 -0700, "Mihai N."<nmihai_year_2000(a)yahoo.com> wrote:
>>
>>>
>>>
>>> Why not go to the root of the problem?
>>>
>>> This is what you need:
>>> > For the purpose of creating an interpreted GUI scripting language that
>>> > permits people to write GUI scripts in their native language
>>>
>>> Then expose the whole thing using a COM model, and it would allow
>>> anyone to automate using any .NET language, Perl, JScript, you name it.
>>> Solid languages, some of them supporting Unicode out of the box, way
>>> more popular. You stop wasting your time developing a compiler,
>>> and people will not be forced to waste time learning another
>>> programming language (C-like but not quite C).
>> ****
>> But that sounds *reasonable*.
>>
>> Note that "permits people to write GUI scripts in their native language" but "all
>> characters above the ASCII range" [which I presume means U007F] "are letters". Apparently,
>> these languages do not have localized punctuation marks or digits, which is true only if
>> you live deep in a Reality Distortion Field.
>
>Anyone ever tell you that you are way too nit picky, paying very deep
>attention to points that make no difference at all?
****
Probably not. I'm American, English is my native language, and I work with largely the
characters in the range U0020-U007E. But people who live in other countries, who read
that you have something that allows them to script "in their native language", might well
expect to use the digit glyphs they are most familiar with, and the punctuation marks they
are most familiar with. So you cannot represent that you allow them to program in their
native language if you don't allow them to program in their native language.
****
>
>It is only a Letter in the sense that it is a valid character for an
>Identifier. The original Lex specification for L was [A-Za-z_], and
>underscore is not really a "Letter" either. For the purpose of valid "C"
>language identifiers, an "_" underscore is a letter.
****
But that's my point. Even a casual reading of the Unicode spec tells you that there are
digits and punctuation marks in various code point ranges. So I find it hard to believe
that a few limited code points in the range U0020-U007E are the only
code points that can be called "delimiters" or "digits".

Don't accuse ME of being picky; YOU'RE the one who established the goal of "in their
native language" then you violate your own requirement!
****
>
>The "C" A REFERENCE MANUAL (FOURTH EDITION) bothers to make the
>distinction between an "_" underscore and a Letter. Lex does not need
>to know this distinction. As far as the "C" language is concerned an "_"
>underscore is treated the same as a Letter, so no distinction need be made.
****
I gave a long list of non-letter ranges in an earlier post. How is it that you can claim
that a Thai digit or code U055D (Armenian comma) is a "letter"? And still say you are
meeting your requirement of allowing people to code in their native language? Seriously,
explain how this is possible.
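
The category question can be checked directly against the Unicode Character Database. The following is a minimal sketch in Python, using the standard unicodedata module; the sample code points and the helper name general_category are chosen here for illustration and do not come from anyone's implementation in this thread.

```python
import unicodedata

# Illustrative sample of code points above U+007F (selection is an assumption).
samples = {
    "ARMENIAN COMMA": "\u055d",
    "THAI DIGIT ONE": "\u0e51",
    "BENGALI DIGIT FOUR": "\u09ea",
    "ARMENIAN SMALL LETTER AYB": "\u0561",
}

def general_category(ch):
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)

for name, ch in samples.items():
    # Po = other punctuation, Nd = decimal digit, Ll = lowercase letter
    print(f"U+{ord(ch):04X} {name}: {general_category(ch)}")
```

Of the four samples, only the last one is reported as a letter (Ll); the comma is Po and the two digits are Nd.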
****
>
>I could have called the range of code points above 7F to be named
>something other than a letter but there was no need to since it makes
>no relevant difference.
****
So if I'm writing in some other language, and write two parameters, in my native language
using the letters for A and B, and my native comma, you are saying that A,B is actually a
single identifier? Really? Am I going to believe you are supporting me coding in my
native language? And this is just looking at the most trivial examples; I don't know
enough of Chinese, Japanese or Korean to tell how a "name" which is a sequence of letters
can be formed. I could believe that a single Chinese character would constitute a valid
variable name, so two such glyphs would be the equivalent of writing, in C, the expression
A B
which is syntactically invalid.
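
To make the A,B case concrete, here is a rough sketch in Python. NAIVE_IDENT is only an assumed reading of the "everything above 7F is a letter" rule, not the actual Lex specification being discussed, and the helper names are hypothetical. For contrast it uses str.isidentifier(), which applies Python's own identifier rules (based on the Unicode XID_Start and XID_Continue properties).

```python
import re

# Assumed rule: ASCII letters, underscore, and every code point above U+007F
# may start or continue an identifier.
NAIVE_IDENT = re.compile(
    "[A-Za-z_\u0080-\U0010ffff][A-Za-z0-9_\u0080-\U0010ffff]*"
)

def naive_tokens(text):
    """Return all identifier tokens found under the assumed rule."""
    return NAIVE_IDENT.findall(text)

def unicode_aware_is_identifier(text):
    """Check the text against Python's Unicode-aware identifier rules."""
    return text.isidentifier()

# Armenian letter ayb, Armenian comma (U+055D), Armenian letter ben:
# the native-script equivalent of writing A,B
source = "\u0561\u055d\u0562"

print(len(naive_tokens(source)))            # number of tokens under the assumed rule
print(unicode_aware_is_identifier(source))  # whether Unicode-aware rules accept it
```

Under the assumed rule the whole string comes back as a single identifier token, while the Unicode-aware check rejects it because the comma is punctuation. The same happens with a string of Thai digits: the assumed rule accepts it as an identifier, the Unicode-aware check does not.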

I do believe you are required to actually implement what you state you are doing.

We actually had to worry about this when we were looking at creating compilers for the
European market. The fact that our development team had a Norwegian on it who resented
the limitation to the 7-bit alphabet was a factor.
****
>
>>
>> In what language, exactly, is my use of the localized punctuation marks or digits
>> considered part of the set of "letters". Presumably, if this were cast into the context
>> of the 7-bit set, it would mean that I could have identifiers "A,B", "A.B", "A;B" "01ABC",
>> "3CAT" and so on. If my native language has a native comma, period, or semicolon, why is
>> this considered a "letter"? Why is it I can start an identifier with a digit? Why is my
>> native rendering of 12.34 considered an "identifier" and not a "number"? And localized
>> digits? If I were doing this, I'd have productions that defined numeric sequences, e.g.,
>> bengali_number, thai_number, etc. and then have a production that a number is an
>> "ascii_number", "bengali_number", "thai_number", etc., but unfortunately that would merely
>> make my implementation *correct*, rather than "small and fast" (this is the Unix mindset:
>> it doesn't matter if it is right as long as it is small and fast).
****
I note you still avoid explaining how the use of localized punctuation and digits will be
supported.
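
One possible shape for the per-script number productions mentioned in the quoted text, sketched in Python. The group names follow the production names above (ascii_number, bengali_number, thai_number); the function classify_number is hypothetical, only integers are handled, and localized decimal separators are left out.

```python
import re
import unicodedata

# One alternative per script, so a number may not mix digit sets.
NUMBER = re.compile(
    "(?P<ascii_number>[0-9]+)"
    "|(?P<bengali_number>[\u09e6-\u09ef]+)"
    "|(?P<thai_number>[\u0e50-\u0e59]+)"
)

def classify_number(text):
    """Return (production_name, integer_value), or None if text is not a number."""
    m = NUMBER.fullmatch(text)
    if m is None:
        return None
    # unicodedata.decimal maps each decimal digit to its value 0..9
    value = int("".join(str(unicodedata.decimal(ch)) for ch in text))
    return m.lastgroup, value

print(classify_number("\u0e51\u0e52"))  # Thai digits one, two
print(classify_number("1\u0e52"))       # mixed ASCII and Thai digits
```

The first call yields the thai_number production with value 12; the second yields None because the digits come from two different scripts.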
joe
****
>>
>> Of course, if you want to simplify the problem to make its implementation easy, and
>> violate your own specification of using native languages, then making such nonsensical
>> statements such as "all characters above the ASCII range are letters" is acceptable.
>> Nonsensical, of course, but if you define nonsense away by saying "my implementation defines
>> correctness, not my specification", then it is presumably OK.
>> joe
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Joseph M. Newcomer on 18 May 2010 18:47

That is a bit over the top. Admittedly, Hector's frustration overflows more readily than
mine; I merely insist that you be internally consistent with your claims and
implementation. You have made a claim about allowing people to program in their native
languages, then made sure it is not possible by failing to actually support this, and then
claimed that anyone who questions your failure to live up to your self-proclaimed
specification is overly nit-picky. That's a bit unprofessional, too, so ideas of pots and
kettles pass through my mind.

Realize that if you say

A will be true

and then say

My implementation guarantees A is false

and then claim

My implementation supports the specification that A is true

it is a bit hard for us to credit that you have a clue as to what you are talking about.
joe

On Tue, 18 May 2010 16:40:35 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/18/2010 4:34 PM, Hector Santos wrote:
>> Peter Olcott wrote:
>>
>>>
>>> Did you say something goofus?
>>
>> Yeah chump! I rather be a goofus than a moronic, ugly, stupid seriously
>> sick and pathetic a-hole like you.
>>
>> Thanks for repeating the above so it gets more distribution.
>>
>> --
>> HLS
>
>No one is more unprofessional than Hector Santos.
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Joseph M. Newcomer on 18 May 2010 18:48

See below...
On Tue, 18 May 2010 14:03:56 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/18/2010 3:10 AM, Mihai N. wrote:
>>
>>
>> Why not go to the root of the problem?
>>
>> This is what you need:
>> > For the purpose of creating an interpreted GUI scripting language that
>> > permits people to write GUI scripts in their native language
>>
>> Then expose the whole thing using a COM model, and it would allow
>> anyone to automate using any .NET language, Perl, JScript, you name it.
>> Solid languages, some of them supporting Unicode out of the box, way
>> more popular. You stop wasting your time developing a compiler,
>> and people will not be forced to waste time learning another
>> programming language (C-like but not quite C).
>>
>>
>>
>>
>I considered that , but rejected it for two reasons:
>(1) Not sufficiently platform independent.
>(2) Makes my success too dependent upon Microsoft.
****
Why do you program in Windows? That really requires (2), since the Linux/Mac market is
far too small to count.
joe
****
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
