From: Andreas Leitgeb on
Tom Anderson <twic(a)urchin.earth.li> wrote:
> I want to store some things, each filed under a string key. I then want to
> look those things up with other strings, finding the thing whose key is
> the longest available prefix of the lookup string.
> So, if i store:
> /products -> foo
> /products/fruit -> bar
> Then lookups go like:
> /products/furniture/chairs -> foo
> /products/fruit/bananas -> bar

My approach would be, to partition the keys by length, first,
and for each length maintain a separate HashMap.

The structure to hold the key-thing pairs:
ArrayList<HashMap<String,Thing>>

To add a pair: put it into the hashmap indexed by key.length()
in the ArrayList (extending the ArrayList and creating a new
HashMap as necessary)

To lookup a string:
reverse-iterate the ArrayList from .size()-1 downto 1: for each
index i that has a non-empty HashMap stored in the ArrayList,
obtain lookupString.substring(0,i) and look it up in that
non-empty HashMap stored at i.

If the key lengths of your keys are very sparsely distributed,
replace the ArrayList<?> with TreeMap<Integer,HashMap<String,Thing>>

PS: I think, there is no such Collection ready made in Java standard
library. I have no idea whether it exists in apache's or other class
libraries. I don't know Patricia Trees, either, so cannot answer
those direct questions.

From: Eric Sosman on
On 7/4/2010 12:59 PM, Tom Anderson wrote:
> On Fri, 2 Jul 2010, Eric Sosman wrote:
>
>> On 7/1/2010 11:42 PM, Tom Anderson wrote:
>>
>>> I want to store some things, each filed under a string key. I then want
>>> to look those things up with other strings, finding the thing whose key
>>> is the longest available prefix of the lookup string.
>>>
>>> So, if i store:
>>>
>>> /products -> foo
>>> /products/fruit -> bar
>>>
>>> Then lookups go like:
>>>
>>> /products/furniture/chairs -> foo
>>> /products/fruit/bananas -> bar
>>
>> If you add "/products/fuzz -> baz" to the collection, what
>> should a lookup on "/products/f" return?
>
> foo. Neither of the other keys is a prefix of the search term.
>
>>> If i write this myself, is there a data structure better than a Patricia
>>> tree for it?
>>
>> The scale of the problem depends on whether a key's "words" must match
>> in their entirety or only partially. If it's a "word-by-word"
>> discipline (so /products/f yields foo), I think an ordinary multi-way
>> tree would be fine; no need for anything as fancy as Patricia. If
>> /products/f matches both /products/fruit and /products/fuzz, something
>> trickier may be needed (at the least, you've got to decide what to do
>> about the ambiguity).
>
> I evidently haven't explained this clearly. A lookup with /products/f
> would never match either /products/fruit or /products/fuzz as entries,
> because neither is a prefix of the lookup term.
>
> What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i
> set up a mapping for /user/editProfile/*, then i want to catch requests
> for /user/editProfile/jim, /user/editProfile/bob, etc.

Okay. I don't know what's already out there and ready-written,
but if I were going to write such a thing from scratch I'd probably
come up with something like (just typed in; not compiled or tested)

class Node {

/** Mapping if we match this far and no further */
private final Target target;

/** Links to deeper Nodes. */
private final Map<String,Node> links;

Node(Target target) {
this.target = target;
this.links = new HashMap<String,Node>();
}

void setLink(String key, Node dest) {
links.put(key, dest);
}

Node getLink(String key) {
return links.get(key);
}

Target match(Iterator<String> keys) {
if (! keys.hasNext())
return null;
Node next = getLink(keys.next());
Target t = (next == null) ? target : next.match(keys);
return (t == null) ? target : t;
}
}

The internal Map might be something lighter-weight if the "degrees"
of the Nodes are expected to be small, but that's the idea. It's a
lot like traversing a directory tree, with the added fillip of a
default value for incomplete matches.

--
Eric Sosman
esosman(a)ieee-dot-org.invalid
From: Andreas Leitgeb on
Tom Anderson <twic(a)urchin.earth.li> wrote:
>> To lookup a string:
>> reverse-iterate the ArrayList from .size()-1 downto 1: for each
>> index i that has a non-empty HashMap stored in the ArrayList,
>> obtain lookupString.substring(0,i) and look it up in that
>> non-empty HashMap stored at i.
>
> Interesting, thanks. That would certainly work. At first glance, it seems
> a bit clunky, but you only need to to do lookups for the lengths that have
> keys, and each one is a hashtable lookup, so it should be quite quick.

If your set of keys is going to be small enough, you can get by with a
more lightweight variant:
maintain an int[longestKey.length()] to count each length's key-count
and keep all the key-value pairs together in one HashMap.

Only obtain lookupString.substring(0,i) for each i whose array-element
is non-zero. For those, however, you'll search the complete HashMap,
which may be slightly less performant than the original.

From: Robert Klemme on
On 04.07.2010 22:04, Andreas Leitgeb wrote:
> Tom Anderson<twic(a)urchin.earth.li> wrote:
>>> To lookup a string:
>>> reverse-iterate the ArrayList from .size()-1 downto 1: for each
>>> index i that has a non-empty HashMap stored in the ArrayList,
>>> obtain lookupString.substring(0,i) and look it up in that
>>> non-empty HashMap stored at i.
>>
>> Interesting, thanks. That would certainly work. At first glance, it seems
>> a bit clunky, but you only need to to do lookups for the lengths that have
>> keys, and each one is a hashtable lookup, so it should be quite quick.
>
> If your set of keys is going to be small enough, you can get by with a
> more lightweight variant:

Hmm, if the data set is small enough you can get by with a wide range of
solutions - even linear search in a list or array may be fast enough.
This problem however is naturally implemented with some Trie variant.
That's the data structure that fits the problem best IMHO.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/