From: Andreas Leitgeb on 4 Jul 2010 09:06 Tom Anderson <twic(a)urchin.earth.li> wrote: > I want to store some things, each filed under a string key. I then want to > look those things up with other strings, finding the thing whose key is > the longest available prefix of the lookup string. > So, if i store: > /products -> foo > /products/fruit -> bar > Then lookups go like: > /products/furniture/chairs -> foo > /products/fruit/bananas -> bar My approach would be, to partition the keys by length, first, and for each length maintain a separate HashMap. The structure to hold the key-thing pairs: ArrayList<HashMap<String,Thing>> To add a pair: put it into the hashmap indexed by key.length() in the ArrayList (extending the ArrayList and creating a new HashMap as necessary) To lookup a string: reverse-iterate the ArrayList from .size()-1 downto 1: for each index i that has a non-empty HashMap stored in the ArrayList, obtain lookupString.substring(0,i) and look it up in that non-empty HashMap stored at i. If the key lengths of your keys are very sparsely distributed, replace the ArrayList<?> with TreeMap<Integer,HashMap<String,Thing>> PS: I think, there is no such Collection ready made in Java standard library. I have no idea whether it exists in apache's or other class libraries. I don't know Patricia Trees, either, so cannot answer those direct questions.
From: Eric Sosman on 4 Jul 2010 13:45 On 7/4/2010 12:59 PM, Tom Anderson wrote: > On Fri, 2 Jul 2010, Eric Sosman wrote: > >> On 7/1/2010 11:42 PM, Tom Anderson wrote: >> >>> I want to store some things, each filed under a string key. I then want >>> to look those things up with other strings, finding the thing whose key >>> is the longest available prefix of the lookup string. >>> >>> So, if i store: >>> >>> /products -> foo >>> /products/fruit -> bar >>> >>> Then lookups go like: >>> >>> /products/furniture/chairs -> foo >>> /products/fruit/bananas -> bar >> >> If you add "/products/fuzz -> baz" to the collection, what >> should a lookup on "/products/f" return? > > foo. Neither of the other keys is a prefix of the search term. > >>> If i write this myself, is there a data structure better than a Patricia >>> tree for it? >> >> The scale of the problem depends on whether a key's "words" must match >> in their entirety or only partially. If it's a "word-by-word" >> discipline (so /products/f yields foo), I think an ordinary multi-way >> tree would be fine; no need for anything as fancy as Patricia. If >> /products/f matches both /products/fruit and /products/fuzz, something >> trickier may be needed (at the least, you've got to decide what to do >> about the ambiguity). > > I evidently haven't explained this clearly. A lookup with /products/f > would never match either /products/fruit or /products/fuzz as entries, > because neither is a prefix of the lookup term. > > What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i > set up a mapping for /user/editProfile/*, then i want to catch requests > for /user/editProfile/jim, /user/editProfile/bob, etc. Okay. I don't know what's already out there and ready-written, but if I were going to write such a thing from scratch I'd probably come up with something like (just typed in; not compiled or tested) class Node { /** Mapping if we match this far and no further */ private final Target target; /** Links to deeper Nodes. */ private final Map<String,Node> links; Node(Target target) { this.target = target; this.links = new HashMap<String,Node>(); } void setLink(String key, Node dest) { links.put(key, dest); } Node getLink(String key) { return links.get(key); } Target match(Iterator<String> keys) { if (! keys.hasNext()) return null; Node next = getLink(keys.next()); Target t = (next == null) ? target : next.match(keys); return (t == null) ? target : t; } } The internal Map might be something lighter-weight if the "degrees" of the Nodes are expected to be small, but that's the idea. It's a lot like traversing a directory tree, with the added fillip of a default value for incomplete matches. -- Eric Sosman esosman(a)ieee-dot-org.invalid
From: Andreas Leitgeb on 4 Jul 2010 16:04 Tom Anderson <twic(a)urchin.earth.li> wrote: >> To lookup a string: >> reverse-iterate the ArrayList from .size()-1 downto 1: for each >> index i that has a non-empty HashMap stored in the ArrayList, >> obtain lookupString.substring(0,i) and look it up in that >> non-empty HashMap stored at i. > > Interesting, thanks. That would certainly work. At first glance, it seems > a bit clunky, but you only need to to do lookups for the lengths that have > keys, and each one is a hashtable lookup, so it should be quite quick. If your set of keys is going to be small enough, you can get by with a more lightweight variant: maintain an int[longestKey.length()] to count each length's key-count and keep all the key-value pairs together in one HashMap. Only obtain lookupString.substring(0,i) for each i whose array-element is non-zero. For those, however, you'll search the complete HashMap, which may be slightly less performant than the original.
From: Robert Klemme on 4 Jul 2010 16:34 On 04.07.2010 22:04, Andreas Leitgeb wrote: > Tom Anderson<twic(a)urchin.earth.li> wrote: >>> To lookup a string: >>> reverse-iterate the ArrayList from .size()-1 downto 1: for each >>> index i that has a non-empty HashMap stored in the ArrayList, >>> obtain lookupString.substring(0,i) and look it up in that >>> non-empty HashMap stored at i. >> >> Interesting, thanks. That would certainly work. At first glance, it seems >> a bit clunky, but you only need to to do lookups for the lengths that have >> keys, and each one is a hashtable lookup, so it should be quite quick. > > If your set of keys is going to be small enough, you can get by with a > more lightweight variant: Hmm, if the data set is small enough you can get by with a wide range of solutions - even linear search in a list or array may be fast enough. This problem however is naturally implemented with some Trie variant. That's the data structure that fits the problem best IMHO. Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/
First
|
Prev
|
Pages: 1 2 Prev: Nimbus JTabbedPane shuffles tabs Next: Loopholes in java Encapsulation ? |