Prev: Hibernate in Java Question
Next: Generics: instantiating an object from a class name inconfiguration
From: Daniel Pitts on 9 Jul 2010 16:06 On 7/9/2010 12:50 PM, Patricia Shanahan wrote: > On 7/9/2010 12:45 PM, Tom Anderson wrote: >> On Thu, 8 Jul 2010, Eric Sosman wrote: >> >>> Or, you could have BigList implement List but "lie" in its .size() >>> method, in somewhat the same way TreeSet "lies" about the Set contract. >> >> How does TreeSet lie about the Set contract? > > The case I'm aware of involves a TreeSet with a Comparator, that is not > consistent with the .equals methods of the TreeSet elements. The TreeSet > always goes by the Comparator results. That means the TreeSet could > contain elements a and b such that a.equals(b). > > Patricia It is my opinion that (given perfect hindsight), the Collections API should have included several interfaces for comparing items, not just one. interface Ordering<T> { enum Order { LESSER, EQUAL, GREATER } Order compare(T left, T right); } interface Hasher<T> { long hash(T value); } interface Equivalence<T> { boolean equal(T left, T right); } Then, all the appropriate Collection code could use those interfaces. There should also be the obvious default implementations. Not to mention that iterators should have separate methods for advancing and retrieving, and options for non-fail-fast for certain kinds of collections (linked lists shouldn't invalidate any iterator unless the item itself is removed). -- Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
From: Eric Sosman on 9 Jul 2010 16:54 On 7/9/2010 4:06 PM, Daniel Pitts wrote: > > It is my opinion that (given perfect hindsight), the Collections API > should have included several interfaces for comparing items, not just one. > > interface Ordering<T> { > enum Order { > LESSER, EQUAL, GREATER > } > > Order compare(T left, T right); > } The only difference I see between this and Comparator<T> is the use of the enum instead of an int. The advantages are not immediately obvious. Meanwhile, instead of `comp(o1,o2) <= 0' one would write `comp(o1,o2) != Order.GREATER', which reads even less smoothly than `! (comp(o1,o2) > 0'. > interface Hasher<T> { > long hash(T value); > } A 64-bit hashCode() would be of little use until you got to more than 2^32 hash buckets. Just saying. > interface Equivalence<T> { > boolean equal(T left, T right); > } I don't get it: Why not just use equals()? I guess a class could choose not to implement Equivalence at all (and thus make itself unusable in whatever framework relies on Equivalence), but is that an advantage? Also, you could get a compile-time error instead of a run-time `false' for trying to call equal() on references of dissimilar classes; again, where's the benefit? > Then, all the appropriate Collection code could use those interfaces. > There should also be the obvious default implementations. It might be helpful to give some examples of the "appropriate" uses, and of the "obvious" defaults. For example, how does a HashMap make use of a key that implements Hasher? Does it reflect on each key its given and make a run-time choice between using hash() and hashCode()? I don't get it ... > Not to mention that iterators should have separate methods for advancing > and retrieving, I guess this would allow `iter.advance(666666)', with a possible savings if the Iterator were backed by a disk file or something, where you could just skip over two thirds of a million items without looking at them. But, but, but!!! this only makes sense for a sequential data structure like a List -- and for List you've already *got* access by index without using an Iterator at all! I'm beginning to sound like a broken record[*], I know, but what's the benefit? > and options for non-fail-fast for certain kinds of > collections (linked lists shouldn't invalidate any iterator unless the > item itself is removed). Not so sure about that. The notion of "what's the next item?" can certainly change if things are added or removed "near" the point where an Iterator is active. Or, in the `iter.advance(666666)' case, you may find yourself at an unexpected place if somebody else has inserted 555555 items between where you are and where you're going ... See also ListIterator. [*] How many programmers nowadays have ever heard a broken record? Or dialed a telephone, or seen a core? Gad, I'm feeling as old as Methuselah.[**] [**] Before your time, laddie, before your time. -- Eric Sosman esosman(a)ieee-dot-org.invalid
From: ClassCastException on 10 Jul 2010 01:32
On Fri, 09 Jul 2010 16:54:53 -0400, Eric Sosman wrote: > On 7/9/2010 4:06 PM, Daniel Pitts wrote: >> interface Hasher<T> { >> long hash(T value); >> } > > A 64-bit hashCode() would be of little use until you got to > more than 2^32 hash buckets. Just saying. Gets us back to the original topic. :-) >> interface Equivalence<T> { >> boolean equal(T left, T right); >> } > > I don't get it: Why not just use equals()? I guess a class > could choose not to implement Equivalence at all (and thus make itself > unusable in whatever framework relies on Equivalence), but is that an > advantage? Also, you could get a compile-time error instead of a > run-time `false' for trying to call equal() on references of dissimilar > classes; again, where's the benefit? > >> Then, all the appropriate Collection code could use those interfaces. >> There should also be the obvious default implementations. > > It might be helpful to give some examples of the "appropriate" > uses, and of the "obvious" defaults. For example, how does a HashMap > make use of a key that implements Hasher? Does it reflect on each key > its given and make a run-time choice between using hash() and > hashCode()? I don't get it ... Note that those interfaces specify methods with an "extra" parameter each. They're like Comparator versus compareTo/Comparable. The purpose is clear: so a HashMap could be given, optionally, a Hasher<K> to use in place of the keys' own hashCode methods and an Equivalence<K> to use in place of the keys' own equals methods. One obvious benefit is that you get rid of IdentityHashMap by folding that functionality into plain HashMap. Instead of a separate class, you'd get an identity hash map with new HashMap<K,V>(new Hasher<K>() { public long hash (K x) { return System.identityHashCode(x); } }, new Equivalence<K>() { public boolean equal (K x, K y) { return x == y; } }; or with canned instances of IdentityHasher and IdentityEquivalence provided by the library. With this, you would also be able to get identity WeakHashMaps and the like; by separating the "how strong is the reference" aspect into one class and the "how is identity decided" aspect into another, you avoid a combinatorial explosion and possible lacunae of capability (right now we have no WeakIdentityHashMap, in particular). You'd also be able to reduce some of the clumsier uses of HashMap to HashSet. Picture a class Record { public final int id; public final String name; public final String address; } with the obvious equality semantics (all fields equal) and constructor added. Now throw in an Equivalence and a Hasher that use only the record's id field. So maybe you keep a change log for an individual person as a List<Record>, chronological: id 0001 name Jane Herman address 1600 Pennsylvania Avenue id 0001 name Jane Herman address 18 Wisteria Lane id 0001 name Jane Thurston address 18 Wisteria Lane OK, so she got voted out of office, then got married, or something like that. Of course you might want to throw a jumble of records in a Set and have different ones of the above count as different. But you might also want a record of the current state of affairs. Given a HashSet implementation that can use a supplied Hasher and Equivalence the way TreeSet can use an externally-supplied Comparator, and that also has the semantics that adding an element that equals an already-present element replaces that element with the new one, you can update the 0001 record simply by putting a more recent one into this set -- if it already has a 0001 record, the provided Hasher and Equivalence will lead to the new one replacing that one. So in some contexts you can treat records identically only if they're actually identical; in others if they have the same id; all without monkeying with an explicit id-to-record HashMap or suchlike. Another way to achieve this last, though, is to have a KeyExtractor<T> interface that you implement in this case to return the id field of a Record and a HashSet implementation that uses the object itself as the key in its internal HashMap if no KeyExtractor is specified during construction, and uses the supplied KeyExtractor otherwise. This is actually closer to the conceptual truth of what you're doing in a case like this: keying on the id field in a particular HashSet. The implementation would be something like public class HashSet<T> { private HashMap<Object,T> data = new HashMap<Object,T>(); private KeyExtractor<T> ke = new KeyExtractor<T>() { public Object getKey (T val) { return val; } } ... public T put (T newElem) { Object key = ke.getKey(newElem); T oldElem = data.get(key); data.put(key, newElem); return oldElem; } } whereas the Hasher/Equivalence version would just pass the Hasher and Equivalence to the HashMap constructor when initializing Data and not have the key local in put, just newElem. The really interesting thing is that we don't really need to wait for any hypothetical future Sun (Oracle?) update to do some of this; KeyExtractor and the above variation of HashSet can be implemented now, perhaps calling the latter RecordMap instead as it acts as a map from key fields of records of some sort to whole records, in the typical case, and in fact you probably do also want to do lookups of whole records by just the keys. And you might sometimes want to hold the records via weak or soft references, e.g. to make it a cache. In that case you want to allow specifying two more things, a type of reference to use (enum ReferenceType {STRONG; SOFT; WEAK;} with default STRONG) and an optional ExpensiveGetter that defaults to return null but can be replaced with one whose expensiveGet() does something like, say, retrieve disk records. Then get() calls expensiveGet() on not-found and if expensiveGet() doesn't throw or return null, does a put() before returning the result. You can throw in another type parameter, too: public class RecordMap <K,V,E> { private ExpensiveGetter<K,V,E> eg = new ExpensiveGetter<K,V,E>() { public V expensiveGet (K key) throws E { return null; } } private HashMap<K,Object> data = new HashMap<K,Object>(); public enum ReferenceType { STRONG { public Object wrap (Object o) { return o; } }; SOFT { public Object wrap (Object o) { return new SoftReference(o); } }; WEAK { public Object wrap (Object o) { return new WeakReference(o); } }; public abstract Object wrap (Object o); } private ReferenceType referenceType = ReferenceType.STRONG; ... @SuppressWarnings("unchecked") public V get (K key) throws E { Object result = data.get(key); if (result instanceof Reference) result = result.get(); if (result != null) return (V)result; result = eg.expensiveGet(key); if (result == null) return null; put(key, result); return (V)result; } public void put (K key, V val) { data.put(key, referenceType.wrap(val); } } This is a bit messy but it's just a quick draft. It doesn't actually implement Map because it doesn't quite fit the Map contract in a few places (and making it do so would be difficult, particularly since get seems to have to be able to throw exceptions). You might want to change ExpensiveGet to a more general BackingSource that provides both get and put methods; puts write through to the real backing store whenever performed as well as writing to the RecordMap in memory, making a RecordMap with a non-default BackingSource a cache backed by something in a two-way fashion. I may be a bit rusty on the syntax of giving enum constants behavior, too. Clearly in this case that's the right thing to do, from an OO perspective, rather than having a switch clause in the put method that could get out of synch if someone decided to add PHANTOM to the thing for whatever reason or a future JDK added more Reference types that influenced GC policy in as-yet-unforeseen ways. |