From: Tamas K Papp on 2 Mar 2010 12:38

On Tue, 02 Mar 2010 09:22:32 -0800, Kyle M wrote:

> On Mar 2, 11:05 am, Pascal Costanza <p...(a)p-cos.net> wrote:
>> On 02/03/2010 16:27, Jerry Boetje wrote:
>>
>> > The spec for STRING-CAPITALIZE is defined to break into words where:
>> > "a ``word'' is defined to be a consecutive subsequence consisting of
>> > alphanumeric characters". This gives interesting results such as
>> > "don't" => "Don'T". Any 4th-grader would know that the right
>> > capitalization is "Don't". In CLforJava, we use the Unicode
>> > definitions for breaking, and we get "Don't". Any thoughts about
>> > changing this weirdness? Please, no "but, but it's the specification"
>> > comments. I get the spec. This gets more into a transition from the
>> > 1980's definition of characters and strings and into the Unicode
>> > world. I'd rather talk about the world of today and what we can do
>> > about it.
>>
>> Even in the world of today, not everybody speaks only English.
>>
>> Pascal
>>
>> --
>> My website: http://p-cos.net
>> Common Lisp Document Repository: http://cdr.eurolisp.org
>> Closer to MOP & ContextL: http://common-lisp.net/project/closer/
>
> Good point. That makes me curious though: in what languages is the
> spec's rule correct?

It is correct in Hungarian (my native language), but I doubt that the authors of the spec had that application in mind :-)

> Even in English, you can't fix it for most proper nouns. My last name,
> for example.
>
> This function just stinks. Someone should just deprecate it, forget
> about it, take it out back and shoot it.

Or (and I realize that this might sound crazy) you could just write another function which does what you want and forget about the whole issue.

These kinds of string manipulations are hairy. I am not surprised that the HS didn't try to get it right; I am surprised that it ended up in the HS _at all_.

Tamas
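A minimal sketch of the "write another function" route suggested above, assuming the only change wanted is that an apostrophe inside a word should not start a new word. The name CAPITALIZE-WORDS and the :WORD-CHARS keyword are hypothetical; they are not part of CL or of any library mentioned in this thread, and the function is still nowhere near a natural-language capitalizer.

```lisp
;; Hypothetical helper, not part of the CL package.
;; A "word" here is a run of alphanumeric characters that may also
;; contain any character listed in WORD-CHARS (by default the
;; apostrophe), as long as that character is not at the start.
(defun capitalize-words (string &key (word-chars "'"))
  (let ((result (copy-seq (string string))) ; fresh copy, safe to modify
        (in-word nil))
    (dotimes (i (length result) result)
      (let ((ch (char result i)))
        (cond ((alphanumericp ch)
               ;; First character of a word is upcased, the rest downcased.
               (setf (char result i)
                     (if in-word (char-downcase ch) (char-upcase ch))
                     in-word t))
              ;; A joining character inside a word does not end the word.
              ((and in-word (find ch word-chars))
               nil)
              ;; Anything else is a word boundary.
              (t (setf in-word nil)))))))
```

Under these assumptions (capitalize-words "don't") returns "Don't", while CL:STRING-CAPITALIZE is left untouched, so code that relies on the specified behaviour keeps working.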
From: Tim Bradshaw on 2 Mar 2010 13:49

On 2010-03-02 17:01:16 +0000, Jerry Boetje said:

> I'm
> pointing out that perhaps we should move CL into the current world -
> which means Unicode.

However, as several people have pointed out to you, *changing the way STRING-CAPITALIZE works by default is an INCOMPATIBLE CHANGE*, and, you know what, we don't like those, because they break our code. If you want this, define a standard suite of functions which are not in the CL package and which have the behaviour you want. How hard can this be to understand?
From: Thomas A. Russ on 2 Mar 2010 12:36

Jerry Boetje <jerryboetje(a)mac.com> writes:

> ... leaving CL in the dust of ASCII does it a
> disservice.

Actually, CL doesn't require ASCII at all. In fact, there are parts of the specification that specifically allow for other character encodings. (At the time, EBCDIC was a prime competitor to ASCII.) So, there is no specification of exactly what CHAR-CODE produces, and it is explicitly allowed for the character codes of letters to be non-contiguous.

So, in principle, there is nothing that prevents a conforming Common Lisp implementation from adopting Unicode as the character encoding system. In fact, there are several implementations that offer Unicode strings.

Now this still doesn't address the specific rule about the behavior of STRING-CAPITALIZE, which would still remain. It also doesn't address the issue that in Unicode there are some differences between upper case characters and capitalized characters. (IIRC there are 3 strange cases where they are different.) So, you would still want to have some Unicode-specific functions.

But it seems to me (without deep study) that a Unicode-based Lisp would be conforming and also a good idea.

--
Thomas A. Russ, USC/Information Sciences Institute
From: Jerry Boetje on 2 Mar 2010 14:24

On Mar 2, 1:49 pm, Tim Bradshaw <t...(a)tfeb.org> wrote:
> On 2010-03-02 17:01:16 +0000, Jerry Boetje said:
>
> > I'm
> > pointing out that perhaps we should move CL into the current world -
> > which means Unicode.
>
> However, as several people have pointed out to you *changing the way
> STRING-CAPITALIZE works by default is an INCOMPATIBLE CHANGE*, and, you
> know what, we don't like those, because they break our code. If you
> want this, define a standard suite of functions which are not in the CL
> package and which have the behaviour you want. How hard can this be to
> understand?

Let's see... How many major revisions of FORTRAN and COBOL - 2 of the original 3 survivors - have there been that are incompatible in at least small ways - revisions necessitated by the environment? Those constituents had to make changes to go forward because the world changed - so they changed. Here we are, over 20 years on, and we are clinging to a document that hasn't changed since its inception.

As to making yet another module, hey - characters and strings are pretty basic. And they changed in the world but not in the specification. Yes, people make incompatible changes because the environment made incompatible changes. The world is NOT just English (actually American - it changed that much) any more.

Just to ratchet up the heat, another major change should be in the file system. Gee, we have networks now. Yes, there are packages that handle networks. But if the file system were to be changed to deal with URIs, a lot of things would be much easier. Everyone knows how to (write ...mumble.. a-URI-stream) if they thought about it. We tried it, and it works. It would be easier if we could make a few changes in the definition.

Look, I love this language as well as anyone on this board. I've sent out hundreds of students knowing CL (and how to build one). There are many, many fine packages that build on the strength of CL. But some things are just a waste of time - like making a Unicode package rather than making a few basic changes to the spec.
From: Pascal J. Bourguignon on 2 Mar 2010 15:44

Kyle M <kylemcg(a)gmail.com> writes:

> On Mar 2, 11:05 am, Pascal Costanza <p...(a)p-cos.net> wrote:
>> On 02/03/2010 16:27, Jerry Boetje wrote:
>>
>> > The spec for STRING-CAPITALIZE is defined to break into words where:
>> > "a ``word'' is defined to be a consecutive subsequence consisting of
>> > alphanumeric characters". This gives interesting results such as
>> > "don't" => "Don'T". Any 4th-grader would know that the right
>> > capitalization is "Don't". In CLforJava, we use the Unicode
>> > definitions for breaking, and we get "Don't". Any thoughts about
>> > changing this weirdness? Please, no "but, but it's the specification"
>> > comments. I get the spec. This gets more into a transition from the
>> > 1980's definition of characters and strings and into the Unicode
>> > world. I'd rather talk about the world of today and what we can do
>> > about it.
>>
>> Even in the world of today, not everybody speaks only English.
>
> Good point. That makes me curious though: in what languages is the
> spec's rule correct?

Possibly in none, and that's the point!

STRING-CAPITALIZE is not ENGLISH-TEXT-CAPITALIZE or ARABIC-TEXT-CAPITALIZE, and it is definitely NOT UNICODE-STRING-CAPITALIZE.

> It's broken in English. In Arabic translations we use "Qu'ran" and not
> "Qu'Ran", so it's broken in Arabic. I don't know much French, but I
> thought "coup d'etat" became "Coup d'Etat", so isn't it wrong in
> French too (albeit for different reasons)?
>
> Even in English, you can't fix it for most proper nouns. My last name,
> for example.
>
> This function just stinks. Someone should just deprecate it, forget
> about it, take it out back and shoot it.

It's a technical specification, on which programs can rely to process strings in a deterministic way.

It is up to libraries or applications to implement natural language processing, including grammatically and typographically correct capitalizations for the various natural languages.

But it is not acceptable to change the meaning of a low-level function such as STRING-CAPITALIZE; this could break a lot of programs (protocols, file formats, "smart" algorithms, whatever).

--
__Pascal Bourguignon__ http://www.informatimago.com
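For reference, the deterministic rule being defended here can be checked at any REPL. This is only an illustration of the "run of alphanumeric characters" definition quoted earlier in the thread; the input strings are made up for the example.

```lisp
;; A "word" is a maximal run of alphanumeric characters, so the
;; apostrophe ends one word and the next letter starts another.
(string-capitalize "don't")          ; => "Don'T"
(string-capitalize "coup d'etat")    ; => "Coup D'Etat"

;; Digits count as word characters: a letter following a digit is
;; treated as mid-word and is therefore downcased.
(string-capitalize "13C arthur")     ; => "13c Arthur"
```

Whatever one thinks of the results, they depend only on ALPHANUMERICP and the case functions, which is what lets programs rely on them.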