From: Sam on 13 May 2010 21:01

Peter Olcott writes:

> On 5/13/2010 6:40 PM, Sam wrote:
>>> This time I found the original source of a semantically identical
>>> regular expression that you berated so rudely.
>>> http://www.w3.org/2005/03/23-lex-U
>>>
>>> Who knows, maybe www.w3.org is wrong and you are right?
>>
>> And as I wrote in the first thread, I suspected that the regular
>> expression mish-mash's actual purpose was to validate some defined
>> subset of the entire Unicode range, as encoded in UTF-8.
>
> And this view is clearly incorrect. It validates the entire set of
> UTF-8 encodings. Here is a quote:
>
> "This pattern does not restrict to the set of
> defined UCS characters, instead to the set that
> is permitted by UTF-8 encoding."
>
> The difference is the missing D800-DFFF High and Low surrogates that are
> not legal in UTF-8. All of the other CodePoints from 0-10FFFF are
> represented.

Since you claim to know so much about UTF-8 encoding and decoding -- even more than RFC 2279 -- it's a wonder you had to ask your question at all. It seems that you already knew the answer to the question.

Good luck with UTF-8 encoding and decoding.
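Peter's point about the range can be made concrete. Per RFC 3629, UTF-8 may encode any code point from U+0000 to U+10FFFF except the surrogates U+D800..U+DFFF. Below is a minimal C++17 encoder sketch under that rule; `encode_utf8` is a hypothetical name, not code from this thread.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: encode one code point as UTF-8 per RFC 3629.
// Returns std::nullopt for surrogates (U+D800..U+DFFF) and for values
// above U+10FFFF, which UTF-8 is not permitted to encode.
std::optional<std::string> encode_utf8(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return std::nullopt;
    }
    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, U+20AC encodes as the bytes E2 82 AC, and U+10FFFF encodes as F4 8F BF BF. Returning `std::nullopt` rather than throwing is only one design choice; a string class could instead substitute U+FFFD.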
From: Peter Olcott on 13 May 2010 21:24

On 5/13/2010 8:01 PM, Sam wrote:
> Peter Olcott writes:
>
>> On 5/13/2010 6:40 PM, Sam wrote:
>>>> This time I found the original source of a semantically identical
>>>> regular expression that you berated so rudely.
>>>> http://www.w3.org/2005/03/23-lex-U
>>>>
>>>> Who knows, maybe www.w3.org is wrong and you are right?
>>>
>>> And as I wrote in the first thread, I suspected that the regular
>>> expression mish-mash's actual purpose was to validate some defined
>>> subset of the entire Unicode range, as encoded in UTF-8.
>>
>> And this view is clearly incorrect. It validates the entire set of
>> UTF-8 encodings. Here is a quote:
>>
>> "This pattern does not restrict to the set of
>> defined UCS characters, instead to the set that
>> is permitted by UTF-8 encoding."
>>
>> The difference is the missing D800-DFFF High and Low surrogates that
>> are not legal in UTF-8. All of the other CodePoints from 0-10FFFF are
>> represented.
>
> Since you claim to know so much about UTF-8 encoding and decoding --
> even more than RFC 2279 -- it's a wonder you had to ask your question at
> all. It seems that you already knew the answer to the question.

http://tools.ietf.org/html/rfc3629
This memo obsoletes and replaces RFC 2279.

>
> Good luck with UTF-8 encoding and decoding.
>

Thanks.
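RFC 3629, cited by Peter, lists the well-formed UTF-8 byte sequences as an ABNF table (UTF8-1 through UTF8-4), and that table has essentially the same shape as a UTF-8 validation regex. Below is a hedged C++17 sketch that checks a byte string against that table by hand. `is_valid_utf8` is a hypothetical name, and this is not the W3C regular expression itself.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical checker: true if `s` is well-formed UTF-8 per the RFC 3629
// table, i.e. no overlong forms, no surrogates (U+D800..U+DFFF), and
// nothing above U+10FFFF. Each branch mirrors one ABNF alternative.
bool is_valid_utf8(std::string_view s) {
    auto byte = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };
    auto in = [](unsigned char b, unsigned char lo, unsigned char hi) {
        return b >= lo && b <= hi;
    };
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        unsigned char b0 = byte(i);
        if (b0 <= 0x7F) { ++i; continue; }  // UTF8-1: ASCII
        // Pick the sequence length and the allowed range of the second
        // byte; any further bytes must always be in 80..BF.
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;
        if (in(b0, 0xC2, 0xDF))      { len = 2; }
        else if (b0 == 0xE0)         { len = 3; lo = 0xA0; }  // no overlongs
        else if (in(b0, 0xE1, 0xEC) || in(b0, 0xEE, 0xEF)) { len = 3; }
        else if (b0 == 0xED)         { len = 3; hi = 0x9F; }  // no surrogates
        else if (b0 == 0xF0)         { len = 4; lo = 0x90; }  // no overlongs
        else if (in(b0, 0xF1, 0xF3)) { len = 4; }
        else if (b0 == 0xF4)         { len = 4; hi = 0x8F; }  // <= U+10FFFF
        else return false;  // 80..C1, F5..FF: stray continuation or invalid lead
        if (n - i < len) return false;  // sequence truncated at end of input
        if (!in(byte(i + 1), lo, hi)) return false;
        for (std::size_t k = 2; k < len; ++k)
            if (!in(byte(i + k), 0x80, 0xBF)) return false;
        i += len;
    }
    return true;
}
```

The special cases for E0, ED, F0 and F4 exclude overlong forms, surrogates, and values above U+10FFFF. In the regex form of the table, those same bytes also need their own alternatives.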
From: Jasen Betts on 14 May 2010 05:01

On 2010-05-13, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:
>
> "Ian Collins" <ian-news(a)hotmail.com> wrote in message
> news:8539h9F7f1U1(a)mid.individual.net...
>> On 05/14/10 08:06 AM, Peter Olcott wrote:
>>> Is this Regular Expression for UTF-8 Correct??
>>
>> It's a fair bet you are off-topic in all the groups you
>> have cross posted to. Why don't you pick a group for a
>> language with built in UTF8 and regexp support (PHP?) and
>> badger them?
>>
>> --
>> Ian Collins
>
> What does this question have to do with the C++ language?
>
> At least my question is indirectly related to C++ by making
> a utf8string for the C++ language from the regular
> expression.

Just use iconv.

And don't cross-post off-topic.

---
news://freenews.netfront.net/ - complaints: news(a)netfront.net
---
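As a rough illustration of the "just use iconv" advice, here is a C++ sketch that validates by asking POSIX `iconv` to convert the input from UTF-8 to UTF-32LE. glibc ships `iconv` in libc, but other systems may need `-liconv`. Whether surrogates and overlong forms are rejected depends on the iconv implementation, so this may not match the RFC 3629 table exactly. `utf8_ok_via_iconv` is a hypothetical name.

```cpp
#include <cstddef>
#include <iconv.h>
#include <string>
#include <vector>

// Hypothetical helper: true if iconv converts all of `s` from UTF-8.
// Per POSIX, iconv returns (size_t)-1 on failure, with errno EILSEQ for an
// invalid sequence or EINVAL for one truncated at the end of the input.
bool utf8_ok_via_iconv(const std::string& s) {
    if (s.empty()) return true;
    iconv_t cd = iconv_open("UTF-32LE", "UTF-8");
    if (cd == (iconv_t)-1) return false;  // conversion not supported here
    std::vector<char> in(s.begin(), s.end());  // iconv wants a non-const char*
    // Each input byte yields at most one code point (4 bytes in UTF-32).
    std::vector<char> out(4 * in.size());
    char* inp = in.data();
    char* outp = out.data();
    std::size_t inleft = in.size();
    std::size_t outleft = out.size();
    std::size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    return r != static_cast<std::size_t>(-1) && inleft == 0;
}
```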
From: David Schwartz on 17 May 2010 05:17

On May 13, 3:04 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:

> What does this question have to do with the C++ language?
>
> At least my question is indirectly related to C++ by making
> a utf8string for the C++ language from the regular
> expression.
>
> Your question is not even indirectly related to the C++
> language.

Unfortunately, no better way is known to keep conversations on topic. If you know a better way, we'd all love to hear it.

If you don't respond immediately in the forum and point out that something is off topic, other people browsing the forum will think the question was on topic. Other ways have been tried in the past (such as private mails where possible, monthly posts about topicality rather than replying to each off-topic post, and so on). None have been shown to be effective. Painful experience has shown that the most effective technique is to verbally berate and ridicule people who post off topic. Thus others will see the negative response by the group and not want their posts to be met with a similar response.

Again, this wasn't anyone's first choice, and if you know a better way, please tell us. (In the appropriate forum, of course!)

DS