From: Tamas K Papp on 3 Mar 2010 01:48

On Tue, 02 Mar 2010 17:55:00 -0500, Barry Margolin wrote:

> In article <hmjmj6$uf3$1(a)news.eternal-september.org>,
> Tim Bradshaw <tfb(a)tfeb.org> wrote:
>
>> On 2010-03-02 17:01:16 +0000, Jerry Boetje said:
>>
>> > I'm
>> > pointing out that perhaps we should move CL into the current world -
>> > which means Unicode.
>>
>> However, as several people have pointed out to you *changing the way
>> STRING-CAPITALIZE works by default is an INCOMPATIBLE CHANGE*, and, you
>> know what, we don't like those, because they break our code. If you
>> want this, define a standard suite of functions which are not in the CL
>> package and which have the behaviour you want. How hard can this be to
>> understand?
>
> What's the chance that any application is actually depending on the
> current behavior? More likely, applications that might feed a word like
> "don't" to a capitalization function avoid using the built-in function
> because it doesn't work correctly. So actual use is mostly limited to
> cases where a change in the definition would be compatible, and the
> change would then make the function usable in a wider variety of cases.
>
> But since there's no current plans to revise CL, what's the point of
> this discussion? Even if we all agree on what STRING-CAPITALIZE
> *should* do, it's not going to change anything.

Even if revising CL was a feasible possibility in the short run, having a
well-established library that demonstrates how the new approach would work
would be a prerequisite. Changing the standard without proper exploration
would be silly.

So instead of complaining about the standard, the OP should just write that
library. If there is a revision of CL in the (far) future, it might be
incorporated if it is mature enough. If not, he will still have the library
and can use it :-)

Tamas

From: Björn Lindberg on 3 Mar 2010 03:52

Jerry Boetje <jerryboetje(a)mac.com> writes:

> [...] "don't" => "Don'T". Any 4th-grader would know that the right
> capitalization is "Don't".

> The world is NOT just english [...]

So why do you make your motivating example English-centric, then?

It is in general not possible to do string manipulations like
capitalization such that it is semantically correct without knowing the
language the string is in. Strings do not hold information on which
language they are in, and string-capitalize does not take a language
argument. It is therefore not the function you are looking for, yet has
its place.

Björn Lindberg

From: Piotr Chamera on 3 Mar 2010 04:22

Jerry Boetje writes:

> The spec for STRING-CAPITALIZE is defined to break into words where:
> "a ``word'' is defined to be a consecutive subsequence consisting of
> alphanumeric characters". This gives interesting results such as
> "don't" => "Don'T". Any 4th-grader would know that the right
> capitalization is "Don't". In CLforJava, we use the Unicode
> definitions for breaking, and we get "Don't". Any thoughts about
> changing this weirdness? Please, no "but, but it's the specification"
> comments. I get the spec. This gets more into a transition from the
> 1980's definition of characters and strings and into the Unicode
> world. I'd rather talk about the world of today and what we can do
> about it.

I think that Unicode is too big (and too fast-changing) an issue to define
within a language standard. What CL needs is a standard library like the
one IBM's ICU provides for C, C++ and Java (http://site.icu-project.org/).

No general-purpose language, modern or not, has all of Unicode in its
standard. They have the basic necessary structures (characters), and
nothing in the CL standard breaks here, but all other Unicode, language-
and locale-dependent features belong in libraries.

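The word definition quoted above can be made concrete with a small sketch. SPEC-WORDS is a hypothetical helper, written here only for illustration; it splits a string into maximal runs of alphanumeric characters, which is the word definition the spec gives for STRING-CAPITALIZE. It is not the Unicode word-breaking procedure that CLforJava is said to use.

```lisp
;; Hypothetical helper (not part of CL or any library): split STRING into
;; maximal runs of alphanumeric characters, i.e. "words" in the sense of
;; the STRING-CAPITALIZE specification.
(defun spec-words (string)
  (let ((words '())
        (start nil))                    ; index where the current word began
    (dotimes (i (length string))
      (if (alphanumericp (char string i))
          (unless start (setf start i))
          (when start
            ;; a non-alphanumeric character ends the current word
            (push (subseq string start i) words)
            (setf start nil))))
    ;; a word may also be ended by the end of the string
    (when start (push (subseq string start) words))
    (nreverse words)))

(spec-words "don't")          ; => ("don" "t")
(string-capitalize "don't")   ; => "Don'T"
```

Because the apostrophe is not alphanumeric, "don't" is two words under this definition, and each one gets its own capital letter.
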
From: Tim Bradshaw on 3 Mar 2010 13:33

On 2010-03-02 22:55:00 +0000, Barry Margolin said:

> What's the chance that any application is actually depending on the
> current behavior? More likely, applications that might feed a word like
> "don't" to a capitalization function avoid using the built-in function
> because it doesn't work correctly. So actual use is mostly limited to
> cases where a change in the definition would be compatible, and the
> change would then make the function usable in a wider variety of cases.

I don't know, and I don't want to find out the hard way. What I'm trying
to suggest is that changes should be made in an upward-compatible manner.
Leave things so that CL:STRING-CAPITALIZE does what it does now, but have
some new version which behaves more flexibly. So old programs continue to
work, while changes can happen to allow new behaviour. One could do this
either by defining new packages
(ORG.LISP.<whatever-the-standard-is-to-be-called>.CL or something) or by
sorting out some kind of Genera-style package-universe system.

> But since there's no current plans to revise CL, what's the point of
> this discussion? Even if we all agree on what STRING-CAPITALIZE
> *should* do, it's not going to change anything.

By leaving the symbols exported from CL with their current semantics one
should be able to allow much lighter-weight experimentation and change.

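The new-package route described above might look roughly like the following. This is a minimal sketch under stated assumptions: the package name ORG.EXAMPLE.TEXT and the apostrophe rule are invented for illustration and were not proposed in the thread. The point is only the mechanism: the new package shadows STRING-CAPITALIZE and exports its own version, while CL:STRING-CAPITALIZE keeps its standard behaviour.

```lisp
(defpackage #:org.example.text          ; hypothetical package name
  (:use #:common-lisp)
  (:shadow #:string-capitalize)         ; our own symbol, not CL's
  (:export #:string-capitalize))

(in-package #:org.example.text)

;; Experimental variant (illustrative rule only): behaves like
;; CL:STRING-CAPITALIZE, except that a character directly after an
;; apostrophe that itself follows an alphanumeric character is downcased.
(defun string-capitalize (string &key (start 0) end)
  ;; COPY-SEQ because CL:STRING-CAPITALIZE may return its argument
  ;; unchanged when no conversion is needed.
  (let ((result (copy-seq (cl:string-capitalize string :start start :end end))))
    (loop for i from (max 2 start) below (or end (length result))
          when (and (char= (char result (1- i)) #\')
                    (alphanumericp (char result (- i 2))))
            do (setf (char result i) (char-downcase (char result i))))
    result))

(in-package #:cl-user)

(cl:string-capitalize "don't")                 ; => "Don'T"
(org.example.text:string-capitalize "don't")   ; => "Don't"
```

Old code that calls CL:STRING-CAPITALIZE is unaffected; only code that explicitly uses the new package sees the new behaviour.
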
From: Tim Bradshaw on 3 Mar 2010 13:51

On 2010-03-02 19:24:13 +0000, Jerry Boetje said:

> Lets see... How many major revisions of FORTRAN and COBOL - 2 of the
> original 3 survivors - have there been that are incompatible in at
> least small ways - revisions necessitated by the environment.

Well, I used to write FORTRAN. I wrote pretty much in F77 because that was
what was current then, but all the compilers supported modes which
supported both FORTRAN IV and FORTRAN 66, because we had a lot of
libraries that were in those older versions and which no one had any
intention of changing. And of course it worked to link those to your F77
code.

> Here we are over 20 years and we are
> clinging to a document that hasn't changed since its inception. As to
> making yet another module, hey - characters and strings are pretty
> basic. And they changed in the world but not in the specification.
> Yes, people make incompatible changes because the environment made
> incompatible changes. The world is NOT just english (actually American
> - it changed that much) any more.

I am not proposing clinging to any document: I want change, and indeed
I've spent a bunch of time producing infrastructure that allows me to make
this kind of change. I am suggesting a mechanism of supporting change
which allows old programs to run unaffected, by saying that symbols
exported from CL behave as specified by the old standard, and changes
should be made in new packages, or in some equivalent way to that.

I am also, of course, not arguing that strings should change (or not):
this is about what *functions that operate on strings* should do.

This kind of multiple-behaviours-in-one-image was supported a very long
time ago now - Genera had package universes which allowed the same image
to support, I think, three dialects of Lisp (ZetaLisp, CLtL1 CL and ANSI
CL). But perhaps something like this is just too hard for people to manage
now, I don't know.