From: Tom Anderson on
On Sun, 18 Jul 2010, Lew wrote:

> Lew wrote:
>>> Yes. Don't omit what tom said about /happens-before/:
>>>
>>>> In fact, it's worse than that. Thread A could finish the method and
>>>> update both calculated and code, but because there is no
>>>> happens-before relationship between thread A and thread B, it's
>>>> possible that B could come along later, and see the updated
>>>> calculated but *not* the updated code. So even without an unlucky
>>>> timeslice end, there is no guarantee of safety here.
>
> Andreas Leitgeb wrote:
>>
>> There is still a misunderstanding - I'm just not sure if it's on my or
>> your side. Thread 1 assigns two plain word-sized fields: a and then b.
>> can Thread 2 happen to see b's new value, and (after that) a's old
>> value?
>
> Yes.
>
>> Tom's explanation was (as far as I understood it) based on the
>> code-sample where the flag was set before the code, and he rightly
>> pointed out that this gap may be even much longer than expected. Can
>> it reverse, as well?
>
> Yes.
>
> When thread A changes values for shared data 'a' and 'b' without
> establishment of /happens-before/, at some later time (even
> chronologically after both values were changed), a different thread B
> can examine those data and find any of four states: the old value for
> both 'a' and 'b', the old value for 'a' and new for 'b', the new value
> for 'a' and old for 'b', or the new value for both.

The situation that i imagine that could give rise to this is based on
caches. Imagine a two-processor system where the processors have
write-back caches and a miserable lack of cache coherency. Let's say that
cache lines are 16 bytes long, aligned on 16-byte boundaries. Let's say
there is an object containing two integer fields which starts at address
4: the two-word header occupies bytes 4-11, the first field is 12-15, and
the second is 16-20. The two words are thus on different cache lines. A
thread running on one processor updates both words of this object: these
updates go to the cache, but not yet to memory. Later, for some reason, it
evicts the cache line covering bytes 16-31, and writes it back to main
memory. The second processor now runs a thread which loads both fields,
neither being in its cache at that time; it will load the recently-written
version of the second line, but will get the version of the first line
that existed before the first processor did its write, which has not yet
been written back to memory.

I don't think there are any modern architectures where this can happen
(they all have cache coherency protocols to prevent it), but it's
admissible under the Java memory model.

tom

--
YOUR MIND IS A NIGHTMARE THAT HAS BEEN EATING YOU: NOW EAT YOUR MIND. --
Kathy Acker, Empire of the Senseless
From: Arved Sandstrom on
Andreas Leitgeb wrote:
> Lew <noone(a)lewscanon.com> wrote:
>> Andreas Leitgeb wrote:
>>> Thread 1 assigns two plain word-sized fields: a and then b.
>>> can Thread 2 happen to see b's new value, and (after that) a's old value?
>> Yes.
>
>>> Tom's explanation was (as far as I understood it) based on the code-sample
>>> where the flag was set before the code, and he rightly pointed out that this
>>> gap may be even much longer than expected. Can it reverse, as well?
>> Yes.
>
>> When thread A changes values for shared data 'a' and 'b' without establishment
>> of /happens-before/, at some later time (even chronologically after both
>> values were changed), a different thread B can examine those data and find any
>> of four states: the old value for both 'a' and 'b', the old value for 'a' and
>> new for 'b', the new value for 'a' and old for 'b', or the new value for both.
>
> Thanks for ultimate clarification!
>
>> Andreas Leitgeb wrote:
>>>>> Has anyone found e.g. an english [sic] dictionary-word with hashCode 0, yet?
>>>>> Or perhaps the name of some commonly used class in Java standard library
>>>>> or some other String likely occurring in innocent code?
>> Lew wrote:
>>>> ""
>> Andreas Leitgeb wrote:
>>> Damn, I shouldn't have trimmed the "non-trivial" that I had already written.
>>> At least, the hashCode for "" is quite efficiently obtained.
>> It's not really trivial. "" is an extremely common 'String' value and
>> frequently can be the key to a hashed structure, so its hash value is relevant.
>
> The empty String *is* trivial, and triviality doesn't imply uncommonness or
> irrelevance. The hashcode of "" doesn't take long to compute, but if e.g. a
> String like "java.lang.String" had an hashCode of 0, then the necessity to
> recalc it each time might be more noticeable/relevant than for "".
>
"Trivial" has a whole bunch of meanings. Not being formally trained in
science, math, or computer science, most developers I know use it to
mean "trifling", "of no importance". In the sense that you mean -
"simplest possible case" - an empty String is certainly trivial. :-)

AHS
--
The most remarkable thing about my mother is that for thirty years she
served the family nothing but leftovers. The original meal has never
been found.
-- Calvin Trillin
From: Mike Schilling on


"Peter Duniho" <NpOeStPeAdM(a)NnOwSlPiAnMk.com> wrote in message
news:5sidnabjJ7U0H97RnZ2dnUVZ_qCdnZ2d(a)posted.palinacquisition...
> Andreas Leitgeb wrote:
>> Patricia Shanahan <pats(a)acm.org> wrote:
>>> Andreas Leitgeb wrote:
>>>> Thread 1 assigns two plain word-sized fields: a and then b.
>>>> can Thread 2 happen to see b's new value, and (after that) a's old
>>>> value?
>>> Depends on the memory order rules in the hardware. I'm most familiar
>>> with the rules for SPARC v9, but other processors have their own rules.
>>> In Total Store Order they cannot reverse. In Relaxed Memory Order and
>>> Partial Store Order, they can reverse, from the point of view of other
>>> processors, unless some action such as a membar #StoreStore forces
>>> ordering.
>>> See
>>> http://en.wikipedia.org/wiki/Memory_ordering#In_SMP_microprocessor_systems
>>
>> Thanks for the processor-specific infos.
>> It means, that an attempt to "produce" such a reversion (in the sense as
>> one can "produce" a deadlock) may be doomed if on the "wrong" platform.
>
> Keep in mind that it's not just hardware implementation that affects this.
> Compiler optimizations can change the order of reads and writes as well.
>
> The bottom line is that if the order of reads and writes is important,
> there needs to be some kind of synchronization in the code to accomplish
> that. It may be sufficient to specify "volatile", or a more costly
> synchronization technique may be needed (such as acquiring the monitor for
> an object).
>
> This is true even if running the code on a single-core CPU.
>
> Presumably, the choice of the existing hashCode() implementation for
> String was made based on the assumption that the overhead of even
> "volatile" was more important to avoid than the remote possibility that
> multiple threads may wind up calculating the hash code for the same string
> instance.
>
> And, also presumably, this is the same thinking that led to the code in
> which those unfortunate strings with a hash code of 0 always wind up
> getting recalculated no matter what (because otherwise, one would need a
> flag, separate from the hash code field, which could wind up out-of-sync
> with the hash code field absent synchronization, such as "volatile"�as
> discussed previously in this thread).

Or simply giving them an arbitrary different value for hashCode() (e.g. 1),
which is vanishingly unlikely to create any problems. (Though it does break
the explicit contract of String.hashCode(), which would need to be changed.)

From: Lew on
Peter Duniho wrote
>> And, also presumably, this is the same thinking that led to the code
>> in which those unfortunate strings with a hash code of 0 always wind
>> up getting recalculated no matter what (because otherwise, one would
>> need a flag, separate from the hash code field, which could wind up
>> out-of-sync with the hash code field absent synchronization, such as
>> "volatile"…as discussed previously in this thread).

Mike Schilling wrote:
> Or simply giving them an arbitrary different value for hashCode() (e.g.
> 1), which is vanishingly unlikely to create any problems. (Though it
> does break the explicit contract of String.hashCode(), which would need
> to be changed.)

I'm still not clear on what makes 1 or 42 or 17 or any other value
significantly "better" than 0. One might as well use 0, since likely the only
String with a hash to be recalculated each time in practice will be "", and as
others have pointed out, that one is very quick to recalculate.

"" is probably the single most common String to have a degenerate hash code.
The check for 0 cannot be eliminated, so all you'd save with the "1" hack
would be the very fast re-calculation of 0. Yes, for the rare cases others
than "" you might have a truly anomalous result, but code complexity and a
magic number for that must just not have seemed worth the benefit.

OTOH, you could keep the value of 0 for "" by picking a sentinel value
different from 0, say 1 or 17 or 42 or 23 or -1, initializing 'hash' to that
value and using it in the conditional within 'hashCode()'.

None of these things really matter. The only reason that the Implementors
even bothered to lazy-load the hash for String was the knowledge that Strings
would be incessantly allocated and only sometimes used as keys or other hash
sources. In the latter case, the overhead of the lazy calculation is quickly
masked, and incessant tweaking for the remaining 0.001% possible speedup is by
the wayside. The rest of us should not lazy-load hashes for immutable
instances anyway.

--
Lew
From: Mike Schilling on


"Lew" <noone(a)lewscanon.com> wrote in message
news:i209tj$90n$1(a)news.albasani.net...
> Peter Duniho wrote
>>> And, also presumably, this is the same thinking that led to the code
>>> in which those unfortunate strings with a hash code of 0 always wind
>>> up getting recalculated no matter what (because otherwise, one would
>>> need a flag, separate from the hash code field, which could wind up
>>> out-of-sync with the hash code field absent synchronization, such as
>>> "volatile"…as discussed previously in this thread).
>
> Mike Schilling wrote:
>> Or simply giving them an arbitrary different value for hashCode() (e.g.
>> 1), which is vanishingly unlikely to create any problems. (Though it
>> does break the explicit contract of String.hashCode(), which would need
>> to be changed.)
>
> I'm still not clear on what makes 1 or 42 or 17 or any other value
> significantly "better" than 0. One might as well use 0, since likely the
> only String with a hash to be recalculated each time in practice will be
> "", and as others have pointed out, that one is very quick to recalculate.

If 0 means "I haven't calculated it yet", and a calculated value of 0 is
stored as 1, you never need to recalculate. I don't mind if you change it to
17182891 to mean "I haven't calculated it yet", and store a calculated value
of 17182891 as 23023492348. It's the same thing -- you never need to
recalculate,