anomalies in capitalization in String functions [Lisp]

Prev: tiny fix to asdf
Next: External command's output as stream

From: Tamas K Papp on 3 Mar 2010 01:48

On Tue, 02 Mar 2010 17:55:00 -0500, Barry Margolin wrote:

> In article <hmjmj6$uf3$1(a)news.eternal-september.org>,
> Tim Bradshaw <tfb(a)tfeb.org> wrote:
>
>> On 2010-03-02 17:01:16 +0000, Jerry Boetje said:
>>
>> > I'm
>> > pointing out that perhaps we should move CL into the current world -
>> > which means Unicode.
>>
>> However, as several people have pointed out to you *changing the way
>> STRING-CAPITALIZE works by default is an INCOMPATIBLE CHANGE*, and, you
>> know what, we don't like those, because they break our code. If you
>> want this, define a standard suite of functions which are not in the CL
>> package and which have the behaviour you want. How hard can this be to
>> understand?
>
> What's the chance that any application is actually depending on the
> current behavior? More likely, applications that might feed a word like
> "don't" to a capitalization function avoid using the built-in function
> because it doesn't work correctly. So actual use is mostly limited to
> cases where a change in the definition would be compatible, and the
> change would then make the function usable in a wider variety of cases.
>
> But since there are no current plans to revise CL, what's the point of
> this discussion? Even if we all agree on what STRING-CAPITALIZE
> *should* do, it's not going to change anything.

Even if revising CL was a feasible possibility in the short run,
having a well-established library that demonstrates how the new
approach would work would be a prerequisite. Changing the standard
without proper exploration would be silly.

So instead of complaining about the standard, the OP should just write
that library. If there is a revision of CL in the (far) future, it
might be incorporated if it is mature enough. If not, he will still
have the library and can use it :-)

Tamas

From: Björn Lindberg on 3 Mar 2010 03:52

Jerry Boetje <jerryboetje(a)mac.com> writes:

> [...] "don't" => "Don'T". Any 4th-grader would know that the right
> capitalization is "Don't".

> The world is NOT just english [...]

So why do you make your motivating example English-centric, then?

It is in general not possible to do string manipulations like
capitalization such that it is semantically correct without knowing the
language the string is in. Strings do not hold information on which
language they are in, and string-capitalize does not take a language
argument. It is therefore not the function you are looking for, yet has
its place.

Björn Lindberg
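
To illustrate the point about a missing language argument, here is a minimal sketch of a capitalization function that takes the language explicitly. The generic function CAPITALIZE-WORD and the :DUTCH keyword are hypothetical names invented for this example; they are not part of CL or of any library mentioned in the thread. The Dutch rule shown (a word-initial "ij" digraph is capitalized as "IJ") is only one case where the correct result depends on the language.

```lisp
;; Hypothetical API: CAPITALIZE-WORD is not a standard function.
(defgeneric capitalize-word (language word)
  (:documentation "Return WORD capitalized by the rules of LANGUAGE."))

;; Default rule: upcase the first character, downcase the rest.
(defmethod capitalize-word (language (word string))
  (declare (ignore language))
  (if (zerop (length word))
      ""
      (concatenate 'string
                   (string (char-upcase (char word 0)))
                   (string-downcase (subseq word 1)))))

;; Dutch: a leading "ij" digraph is capitalized as a unit.
(defmethod capitalize-word ((language (eql :dutch)) (word string))
  (if (and (>= (length word) 2)
           (string-equal "ij" word :end2 2))
      (concatenate 'string "IJ" (string-downcase (subseq word 2)))
      (call-next-method)))

;; (capitalize-word :english "don't")    ; => "Don't"
;; (capitalize-word :dutch "ijsselmeer") ; => "IJsselmeer"
;; (string-capitalize "ijsselmeer")      ; => "Ijsselmeer"
```

The caller has to supply the language because, as noted above, the string itself does not carry it.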

From: Piotr Chamera on 3 Mar 2010 04:22

Jerry Boetje writes:
> The spec for STRING-CAPITALIZE is defined to break into words where:
> "a ``word'' is defined to be a consecutive subsequence consisting of
> alphanumeric characters". This gives interesting results such as
> "don't" => "Don'T". Any 4th-grader would know that the right
> capitalization is "Don't". In CLforJava, we use the Unicode
> definitions for breaking, and we get "Don't". Any thoughts about
> changing this weirdness? Please, no "but, but it's the specification"
> comments. I get the spec. This gets more into a transition from the
> 1980's definition of characters and strings and into the Unicode
> world. I'd rather talk about the world of today and what we can do
> about it.

I think that Unicode is too big (and too fast-changing) an issue to be
defined in a language standard. What CL needs is a standard library like
IBM's ICU is for C, C++ and Java (http://site.icu-project.org/). No
general-purpose language (modern or not) has all of Unicode in its
standard. They have the basic necessary structures (characters), and
nothing in the CL standard breaks here, but all other Unicode-, language-
and locale-dependent features belong in libraries.
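
As a rough sketch of what the library route could look like, the code below reproduces the standard behaviour and then defines a separate function whose notion of a word is pluggable. CAPITALIZE-WORDS and WORD-CONSTITUENT-P are hypothetical names made up for this example, and the rule used (alphanumerics plus the ASCII apostrophe) is a deliberately crude stand-in, not the Unicode word-breaking algorithm that CLforJava or ICU implement.

```lisp
;; Standard behaviour: the apostrophe ends a word, so #\t starts a new one.
;; (string-capitalize "don't") ; => "Don'T"

(defun word-constituent-p (char)
  "Hypothetical predicate: alphanumerics plus the ASCII apostrophe."
  (or (alphanumericp char) (char= char #\')))

(defun capitalize-words (string &key (constituentp #'word-constituent-p))
  "Return a fresh copy of STRING with each word capitalized.
CONSTITUENTP decides which characters belong to a word."
  (let ((result (copy-seq (string string)))
        (in-word nil))
    (dotimes (i (length result) result)
      (let ((char (char result i)))
        (cond ((not (funcall constituentp char))
               (setf in-word nil))          ; separator: the next word starts fresh
              (in-word
               (setf (char result i) (char-downcase char)))
              (t
               (setf (char result i) (char-upcase char)
                     in-word t)))))))

;; (capitalize-words "don't stop") ; => "Don't Stop"
```

One known limitation of this sketch: a leading quote mark counts as the first character of a word, so "'hello'" comes back unchanged. A real library would need a better rule than this one.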

From: Tim Bradshaw on 3 Mar 2010 13:33

On 2010-03-02 22:55:00 +0000, Barry Margolin said:

> What's the chance that any application is actually depending on the
> current behavior? More likely, applications that might feed a word like
> "don't" to a capitalization function avoid using the built-in function
> because it doesn't work correctly. So actual use is mostly limited to
> cases where a change in the definition would be compatible, and the
> change would then make the function usable in a wider variety of cases.

I don't know, and I don't want to find out the hard way. What I'm
trying to suggest is that changes should be made in an
upward-compatible manner. Leave things so that CL:STRING-CAPITALIZE
does what it does now, but have some new version which behaves more
flexibly. So old programs continue to work, while changes can happen
to allow new behaviour.

One could do this either by defining new packages
(ORG.LISP.<whatever-the-standard-is-to-be-called>.CL or something) or
by sorting out some kind of Genera-style package-universe system.
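
A minimal sketch of the new-package approach, under stated assumptions: the package name ORG.EXAMPLE.TEXT is hypothetical, and the replacement function only handles the apostrophe case discussed in this thread. The point is that CL:STRING-CAPITALIZE keeps its current semantics while code that opts in to the new package gets the new behaviour.

```lisp
(defpackage :org.example.text           ; hypothetical package name
  (:use :common-lisp)
  (:shadow #:string-capitalize)         ; our symbol, not CL's
  (:export #:string-capitalize))

(in-package :org.example.text)

(defun string-capitalize (string)
  "Like CL:STRING-CAPITALIZE, but an apostrophe inside a word does not
start a new word."
  ;; Copy first: the standard allows CL:STRING-CAPITALIZE to return its
  ;; argument unchanged when no conversion is needed.
  (let ((result (copy-seq (cl:string-capitalize string))))
    (loop for i from 2 below (length result)
          when (and (char= (char result (1- i)) #\')
                    (alphanumericp (char result (- i 2))))
            do (setf (char result i) (char-downcase (char result i))))
    result))

(in-package :cl-user)

;; Old programs are unaffected:
;; (cl:string-capitalize "don't")               ; => "Don'T"
;; New code opts in explicitly:
;; (org.example.text:string-capitalize "don't") ; => "Don't"
```

A program that wants the new behaviour everywhere could use ORG.EXAMPLE.TEXT in its own package and shadowing-import the symbol, without anything in the CL package changing.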

>
> But since there are no current plans to revise CL, what's the point of
> this discussion? Even if we all agree on what STRING-CAPITALIZE
> *should* do, it's not going to change anything.

By leaving the symbols exported from CL with their current semantics
one should be able to allow much lighter-weight experimentation and
change.

From: Tim Bradshaw on 3 Mar 2010 13:51

On 2010-03-02 19:24:13 +0000, Jerry Boetje said:

> Let's see... How many major revisions of FORTRAN and COBOL - 2 of the
> original 3 survivors - have there been that are incompatible in at
> least small ways - revisions necessitated by the environment.

Well, I used to write FORTRAN. I wrote pretty much in F77 because that
was what was current then, but all the compilers supported modes which
supported both FORTRAN IV and FORTRAN 66, because we had a lot of
libraries that were in those older versions and which no one had any
intention of changing. And of course it worked to link those to your
F77 code.

> Here we are over 20 years and we are
> clinging to a document that hasn't changed since its inception. As to
> making yet another module, hey - characters and strings are pretty
> basic. And they changed in the world but not in the specification.
> Yes, people make incompatible changes because the environment made
> incompatible changes. The world is NOT just english (actually American
> - it changed that much) any more.

I am not proposing clinging to any document: I want change, and indeed
I've spent a bunch of time producing infrastructure that allows me to
make this kind of change. I am suggesting a mechanism of supporting
change which allows old programs to run unaffected, by saying that
symbols exported from CL behave as specified by the old standard, and
changes should be made in new packages, or in some equivalent way to
that.

I am also, of course, not arguing that strings should change (or not):
this is about what *functions that operate on strings* should do.

This kind of multiple-behaviours-in-one-image was supported a very long
time ago now - Genera had package universes which allowed the same
image to support, I think, three dialects of Lisp (ZetaLisp, CLtL1 CL
and ANSI CL). But perhaps something like this is just too hard for
people to manage now, I don't know.
