From: James Kanze on 14 Dec 2006 17:45

Niklas Matthies wrote:
> On 2006-12-13 23:48, Andrei Alexandrescu (See Website For Email) wrote:
> > Niklas Matthies wrote:
> >> Well, it depends what one considers "basic". It's possible in Java to
> >> have the statement
> >>     System.out.println("Hello, world!");
> >> output "Surprise!" (or any other arbitrary string), by appropriate
> >> preceding code.

    [...]
> > I didn't know that! How is it possible?

> Because string objects initialized from string literals are just
> regular instances of the java.lang.String class, which is implemented
> in plain Java (with the exception of its intern() method).

> > Got code?

> Here you go:

You missed the best part:

> class Test
> {
>     public static void main(String[] args) throws Exception
>     {
>         java.util.HashSet set = new java.util.HashSet();
>         set.add("Hello, World!");
>
>         doEvil();
>
>         set.add("Hello, World!");
>         System.out.println(set);  // prints "[Surprise!, Surprise!]"
          System.out.println( "Hello, World!" );  // Also prints "Surprise!"
>     }
    [...]

It's nice to know that string literals aren't constants. (Sort
of reminds me of Fortran IV, where constants passed to a
function could be modified by the function, so a different
constant would be passed the next time.) If you look at Niklas'
code, you'll also see how you can get things like:
    String s = "Hello, World!" ;
    s.lastIndexOf( 'H' )
throwing an ArrayIndexOutOfBoundsException.

Of course, this was also the case in the original C. Maybe Java
got its ideas about how a string literal should behave from
there. Thank goodness we've made some progress in this respect
in C++ (and in C90---even the C standards committee thought that
modifying constants was taking empowerment of the programmer a
bit too far).
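The body of doEvil() is elided in the quoted post, and the following does not try to reconstruct it. It is only a rough sketch of the general mechanism Niklas describes: reflection reaching into a private final field of an object that was designed to be immutable. It uses a hypothetical Greeting class instead of java.lang.String, because recent JDKs restrict reflective access to JDK internals by default, so the original trick may not run unchanged today. All class and method names here are invented for the illustration.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for an "immutable" value class; NOT java.lang.String.
final class Greeting {
    private final char[] text;

    Greeting(String s) {
        this.text = s.toCharArray();
    }

    @Override
    public String toString() {
        return new String(text);
    }
}

class ReflectionSketch {
    // Swaps the private final field behind the object's back.
    static void doEvil(Greeting g) throws ReflectiveOperationException {
        Field f = Greeting.class.getDeclaredField("text");
        f.setAccessible(true); // the step a restrictive security policy could refuse
        f.set(g, "Surprise!".toCharArray());
    }

    // Builds a greeting, tampers with it, and returns what it now reports.
    static String demo() {
        Greeting g = new Greeting("Hello, World!");
        try {
            doEvil(g);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return g.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Surprise!"
    }
}
```

The code that constructed the Greeting never sees an assignment to it, which is what makes this kind of modification so hard to spot when reading the source.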
--
James Kanze (GABI Software) email:james.kanze(a)gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
From: Al on 15 Dec 2006 00:05

James Kanze wrote:
<snip>
>
> It's nice to know that string literals aren't constants. (Sort
> of reminds me of Fortran IV, where constants passed to a
> function could be modified by the function, so a different
> constant would be passed the next time.) If you look at Niklas'
> code, you'll also see how you can get things like:
>     String s = "Hello, World!" ;
>     s.lastIndexOf( 'H' )
> throwing an ArrayIndexOutOfBoundsException.
>
> Of course, this was also the case in the original C. Maybe Java
> got its ideas about how a string literal should behave from
> there. Thank goodness we've made some progress in this respect
> in C++ (and in C90---even the C standards committee thought that
> modifying constants was taking empowerment of the programmer a
> bit too far).

Well, there are two issues, which are distinct:

A) (String) Literals being unique (single instance).
B) (String) Literals being constant (immutable).

If I understand correctly, A is done to minimize redundant memory
consumption. I agree that /if/ A is true (in any given language), then B
/should/ be true. However, if A is false, then B is not necessary.

In my opinion, A is Premature Optimization™ that puts unfortunate
constraints on the language. How many identical string literals does a
program have, on average? I would say very few, if the code is
well-written. If the program is dynamically localizable (as is often the
case), probably /none/.

Furthermore, if I understand correctly:

In C++, A is true* and B is true**.
In Java, A is true*** and B is true****.

*    Or at least, probably, since the compiler will likely optimize it.
**   Except char pointers decay to non-const.
***  At least those created at compile-time.
**** Except that reflection can be used to bypass it.

So I would conclude that ideally, a modern language should make string
literals:

A) Per-instance (or CoW).
B) Mutable.

If this is not possible, then at least:

A) Unique.
B) Const.

The worst possible case is:

A) Unique.
B) Mutable.

Depending on how you interpret the caveats, I would argue that both Java
/and/ C++ are in the third category, which is not good.

Cheers,
-Al.
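Property A and footnote *** are easy to observe in Java. The sketch below (the class name InternSketch and its method names are made up for the illustration) compares references with ==, which tests object identity rather than value: two occurrences of the same literal share one object, a string created at run time does not, and intern() maps a run-time string back to the shared instance.

```java
class InternSketch {
    // Two occurrences of the same literal refer to one String object.
    static boolean literalsShared() {
        String a = "Hello, World!";
        String b = "Hello, World!";
        return a == b;
    }

    // A string created at run time is a distinct object with an equal value.
    static boolean runtimeCopyShared() {
        String a = "Hello, World!";
        String b = new String(a);
        return a == b;
    }

    // intern() returns the canonical instance, which is the literal's object.
    static boolean internedCopyShared() {
        String a = "Hello, World!";
        String b = new String(a).intern();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());     // true
        System.out.println(runtimeCopyShared());  // false
        System.out.println(internedCopyShared()); // true
    }
}
```

Because every occurrence of the literal is the same object, anything that manages to change that object changes what every occurrence of the literal appears to say, which is why A without B is the uncomfortable combination.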
From: peter koch larsen on 15 Dec 2006 08:28

Andrei Alexandrescu (See Website For Email) wrote:
> Niklas Matthies wrote:
> > Well, it depends what one considers "basic". It's possible in Java to
> > have the statement
> >
> >     System.out.println("Hello, world!");
> >
> > output "Surprise!" (or any other arbitrary string), by appropriate
> > preceding code.

[snip]
> I didn't know that! How is it possible? Got code? Heck, it's not
> possible in many C and C++ implementations - they put constant strings
> in read-only pages.

cheers!

Andrei, I accidentally stumbled on an article called something like

    "hi there".equals("cheers !") == true

and skimming it shows that this is exactly the article you
requested. It is referenced on Kevlin Henney's(?) website (Curbralan?),
and I believe it was an article from Artima. Anyway, a quick google
should get you home no sweat.

hi there
Peter
From: James Kanze on 15 Dec 2006 08:23

Al wrote:
> James Kanze wrote:
> <snip>
> > It's nice to know that string literals aren't constants. (Sort
> > of reminds me of Fortran IV, where constants passed to a
> > function could be modified by the function, so a different
> > constant would be passed the next time.) If you look at Niklas'
> > code, you'll also see how you can get things like:
> >     String s = "Hello, World!" ;
> >     s.lastIndexOf( 'H' )
> > throwing an ArrayIndexOutOfBoundsException.

> > Of course, this was also the case in the original C. Maybe Java
> > got its ideas about how a string literal should behave from
> > there. Thank goodness we've made some progress in this respect
> > in C++ (and in C90---even the C standards committee thought that
> > modifying constants was taking empowerment of the programmer a
> > bit too far).

> Well, there are two issues, which are distinct:

> A) (String) Literals being unique (single instance).
> B) (String) Literals being constant (immutable).

Formally, yes. Practically, strings are values, so identity
isn't important, which means that if the strings are constant,
whether identical strings are a single instance or not is
irrelevant. (There are exceptions to this, of course. When
optimizing, it is sometimes useful to require a single instance
for all identical strings, in order to just compare pointers,
rather than comparing all of the characters.)

> If I understand correctly, A is done to minimize redundant memory
> consumption.

Not only. Depending on how and where it is done, it can be used
to reduce total memory consumption, reduce dynamic allocation
(which can be expensive in terms of run-time) or to simplify
comparisons---if you know that two strings with the same value
must be at the same address, you can just compare pointers.

> I agree that /if/ A is true (in any given language), then B
> /should/ be true.

By definition, B should be true. A literal is a compile-time constant.
The only exceptions I'm aware of were early versions of Fortran
and C---and now Java. Both Fortran and C corrected this defect
very early in their existence. Java seems to have added it; it
wasn't present in the earliest implementations (which didn't
have reflection).

> However, if A is false, then B is not necessary.

I disagree. If I see a numeric constant 42 in the source code,
I should be able to count on its value being 42. And if I see a
string literal "abc", I should be able to count on its value
being "abc". Constants should not be variables, and vice versa.

> In my opinion, A is
> Premature Optimization™ that puts unfortunate constraints on the
> language.

It has nothing to do with optimization. It's a question of
readability. How would you like it if the expression "i += 1"
added 2 to i? And how is that any different from the expression
`System.out.println( "Hello" )' printing "Good bye"?

> How many identical string literals does a program have, on
> average? I would say very few, if the code is well-written. If
> the program is dynamically localizable (as is often the case),
> probably /none/.

I don't know. "WHERE" tends to occur a lot in SQL requests
(with what precedes and follows variable). And I would strongly
recommend NOT replacing "WHERE" with "OÙ" or "WO", just because
you are in a French or German locale. An HTTP client will
doubtlessly want to use "GET" (but that use is more likely to be
localized in one place in the program). And the logging macros
are full of __FILE__, which expands to the same string literal
throughout the file.

Not that that's relevant to anything. (Except maybe the
expansion of __FILE__, which could increase the size of the
executable noticeably if the identical instances aren't merged.)

> Furthermore, if I understand correctly:

> In C++, A is true* and B is true**.

> * Or at least, probably, since the compiler will likely optimize it.
> ** Except char pointers decay to non-const.

A is unspecified.
B is formally true, in that any attempt to modify
a string literal is undefined behavior. Because early C
guaranteed that string literals could be modified, and that each
instance was a separate object, many C++ compilers still support
this (often only with certain compiler options).

Note that the fact that the pointer can be implicitly converted
to non-const, at least in some very frequent cases, does not
authorize modification. It's an intentional hack to support
previously existing practice.

> In Java, A is true*** and B true****.

> *** At least those created at compile-time.
> **** Except that reflection can be used to bypass it.

If it isn't created at compile-time, it isn't a string literal,
either in Java or C++. And if there's anything in the language
which allows you to modify a literal, that's a serious defect.

In the case of Java, the problem concerning literals may be the
most shocking, externally, but the fact that you can modify a
String after having passed it to another subsystem is far more
serious, since it undermines many of Java's security measures.

> So I would conclude that ideally, a modern language should make string
> literals:

> A) Per-instance (or CoW).
> B) Mutable.

A literal should never be mutable. Modifying a literal is on
the same level as other self-modifying code.

> If this is not possible, then at least:

> A) Unique.
> B) Const.

> The worst possible case is:

> A) Unique.
> B) Mutable.

> Depending on how you interpret the caveats, I would argue that
> both Java /and/ C++ are in the third category, which is not
> good.

The modification of literals is a fun exercise, to demonstrate
the problem. (G++ puts string literals in write protected
memory, so they can't be modified. Period. Sun CC will do so
too, with the right options.) But it's only one aspect of the
problem; the real problem is modifying something that the author
of the code thinks cannot be modified.
In C++, this is most
often a result of unintentional aliasing---just because you have
a std::string const& doesn't mean that the string value will not
change. In C++, however, this is so frequently a problem that
it is pretty well understood; most C++ programmers know that if
you need to be sure that something doesn't change, you make a
deep copy of it---you use pass by value.

Java has similar problems, in that you don't always know when
objects are shared, and when they aren't. This is normally only
a problem with objects which have value semantics---if identity
is relevant to the object's semantics, then obviously, you know
which objects are shared, and which aren't, by design. The
normal solution to this is to make value objects immutable.
(For a good example of what happens when you don't, consider the
return value of javax.swing.JComponent.getPreferredSize(), which
returns a mutable value object. What happens if you modify it?
Depending on the code you've previously executed, and the layout
manager installed, you may or may not modify the preferred size
of the component; it's anybody's guess.)

And of course, the problem here is that we have a means of
modifying an object which has been carefully designed to be
immutable, and which must be immutable, for security reasons.
In practice, you can probably force uniqueness by something
like:

    StringBuffer tmp = new StringBuffer( " " ) ;
    tmp.append( s ) ;
    s = tmp.substring( 1 ) ;

but 1) I don't think it's formally guaranteed, and 2) I've never
seen the necessity of this sort of hack documented. And I
repeat, the possibility of modifying a string *after* having
passed it to a library function is a serious security hole. I'm
very surprised that Java lets this one through.
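The sharing problem with mutable value objects can be shown without Swing. The sketch below uses two hypothetical classes, Size (loosely modelled on java.awt.Dimension) and Widget; neither is part of any real library, and nothing here claims to show how Swing itself behaves. One accessor hands out the internal object, the other hands out a defensive copy, which is the Java counterpart of the pass-by-value habit described above for C++.

```java
// Hypothetical mutable value type, loosely modelled on java.awt.Dimension.
final class Size {
    int width;
    int height;

    Size(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

// Hypothetical component that stores a preferred size.
final class Widget {
    private final Size preferred = new Size(100, 50);

    // Leaks the internal object: callers can change the widget's state.
    Size preferredSizeShared() {
        return preferred;
    }

    // Defensive copy: callers get their own object to modify.
    Size preferredSizeCopy() {
        return new Size(preferred.width, preferred.height);
    }
}

class AliasingSketch {
    // Modifies the shared object, then asks the widget again.
    static int widthAfterMutatingShared() {
        Widget w = new Widget();
        w.preferredSizeShared().width = 999;
        return w.preferredSizeShared().width;
    }

    // Modifies a copy, then asks the widget again.
    static int widthAfterMutatingCopy() {
        Widget w = new Widget();
        w.preferredSizeCopy().width = 999;
        return w.preferredSizeCopy().width;
    }

    public static void main(String[] args) {
        System.out.println(widthAfterMutatingShared()); // 999
        System.out.println(widthAfterMutatingCopy());   // 100
    }
}
```

A caller cannot tell from the signature alone which of the two accessors it has been given, which is the "anybody's guess" situation; making Size immutable would remove the question entirely.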
From: Niklas Matthies on 15 Dec 2006 16:17
On 2006-12-15 13:23, James Kanze wrote:
:
> In the case of Java, the problem concerning literals may be the
> most shocking, externally, but the fact that you can modify a
> String after having passed it to another subsystem is far more
> serious, since it undermines many of Java's security measures.

No, it doesn't, because a security-conscious application will run
under a SecurityManager that will prevent such accesses (the
setAccessible() call will fail).

The motivation for enabling such accesses is of course for use by a
debugger without causing the debugging-enabled Java implementation
to become non-conformant.

-- Niklas Matthies