From: Fencer on 29 Jul 2010 13:52 Hello, I have a large set of strings and I need to find all combinations of two of those string and perform an algorithm on them (the algorithm is actually an XQuery that runs over a large dataset so I won't post that here). The combinations foo-bar and bar-foo are considered to be equal so only one should be generated. I wrote a static class method that takes an array of Strings and returns all combinations in a Vector. The Vector stores a class Pair which I've written myself too. The complete code is posted below with output from a sample run. It seems to work, but how can it be improved and get rid of the Pair class? Thanks! package utility; import java.util.HashSet; import java.util.Set; import java.util.Vector; public class MiscUtils { public static void main(String[] args) { Vector<Pair> combos = findCombinations(new String[]{"chebi", "kegg.compound"}); System.out.println(combos.size()); System.out.println(combos); } public static Vector<Pair> findCombinations(String[] strings) { Vector<Pair> combinations = new Vector<Pair>(); Set<String> alreadyPaired = new HashSet<String>(); for (int i = 0; i < strings.length; ++i) { for (int j = 0; j < strings.length; ++j) { if (i != j && !alreadyPaired.contains(strings[j])) { combinations.add(new Pair(strings[i], strings[j])); } } alreadyPaired.add(strings[i]); } return combinations; } } class Pair { public Pair(String s1, String s2) { this.s1 = s1; this.s2 = s2; } public String getFirst() { return s1; } public String getSecond() { return s2; } public String toString() { return s1 + ":" + s2; } private String s1 = null; private String s2 = null; } Sample run: 1 [chebi:kegg.compound] - Fencer
From: Peter Duniho on 29 Jul 2010 14:07 Fencer wrote: > Hello, I have a large set of strings and I need to find all combinations > of two of those string and perform an algorithm on them (the algorithm > is actually an XQuery that runs over a large dataset so I won't post > that here). The combinations foo-bar and bar-foo are considered to be > equal so only one should be generated. Do you mean you perform a single operation on the entire set of string pairs? Or that you iterate through the set of string pairs, performing some operation on each pair? If the latter, then one of the biggest improvements you might make is to not keep the pairs in a collection at all. Just feed them one pair at a time to the algorithm as you generate them, discarding the pair once it's been processed. > I wrote a static class method that takes an array of Strings and returns > all combinations in a Vector. The Vector stores a class Pair which I've > written myself too. The complete code is posted below with output from a > sample run. It seems to work, but how can it be improved and get rid of > the Pair class? Thanks! It seems to me that another significant improvement you could easily make is to get rid of the Set<String> for excluding pairs you already have. Instead, just don't generate the reversed pairs in the first place: for (int i = 0; i < strings.length; i++) { for (int j = i + 1; j < strings.length; j++) { combinations.add(new Pair(strings[i], strings[j])); } } The above does not account for the possibility of duplicate strings in the input. In that case, you would still get a duplicated pair. That can be resolved by removing the duplicates in the input before generating the pairs. Of course, if you have huge amounts of RAM, then you may find that reducing the memory overhead is less important than reducing processing overhead. In that case, you could deal with duplicates using a Set<Pair> just as you had been before, but still at least avoiding generating reversed pairs as above. If you're guaranteed the input won't have duplicate strings, then of course dealing with duplicates is not necessary. There may be more "clever" improvements you could make. But I think it's a good idea to deal with the more basic inefficiencies in the algorithm first. Pete
From: Fencer on 29 Jul 2010 14:19 On 2010-07-29 20:07, Peter Duniho wrote: > > Do you mean you perform a single operation on the entire set of string > pairs? Or that you iterate through the set of string pairs, performing > some operation on each pair? Thanks for your reply my Duniho. I should have been clearer: I iterate through all my combinations and runs the all algorithm on each combination. The algorithm is a complicated XQuery that runs over a large data set. The number of combinations I have is not that large actually, I just didn't want to do the pairing by hand. It's true that I don't really need to keep the combinations around but I would actually like to do that. The input data could actually contain duplicates but that would be an error on my part. I mostly want to get rid of the Pair class. :) > > If the latter, then one of the biggest improvements you might make is to > not keep the pairs in a collection at all. Just feed them one pair at a > time to the algorithm as you generate them, discarding the pair once > it's been processed. > >> I wrote a static class method that takes an array of Strings and >> returns all combinations in a Vector. The Vector stores a class Pair >> which I've written myself too. The complete code is posted below with >> output from a sample run. It seems to work, but how can it be improved >> and get rid of the Pair class? Thanks! > > It seems to me that another significant improvement you could easily > make is to get rid of the Set<String> for excluding pairs you already > have. Instead, just don't generate the reversed pairs in the first place: > > for (int i = 0; i < strings.length; i++) > { > for (int j = i + 1; j < strings.length; j++) > { > combinations.add(new Pair(strings[i], strings[j])); > } > } > > The above does not account for the possibility of duplicate strings in > the input. In that case, you would still get a duplicated pair. That can > be resolved by removing the duplicates in the input before generating > the pairs. > > Of course, if you have huge amounts of RAM, then you may find that > reducing the memory overhead is less important than reducing processing > overhead. In that case, you could deal with duplicates using a Set<Pair> > just as you had been before, but still at least avoiding generating > reversed pairs as above. > > If you're guaranteed the input won't have duplicate strings, then of > course dealing with duplicates is not necessary. > > There may be more "clever" improvements you could make. But I think it's > a good idea to deal with the more basic inefficiencies in the algorithm > first. > > Pete
From: Fencer on 29 Jul 2010 14:33 I rewrote it like this: package utility; import java.util.Arrays; import java.util.HashSet; import java.util.LinkedHashMap; import java.util.Map; import java.util.Set; public class MiscUtils { public static void main(String[] args) { Map<String, String> combos = findCombinations(new String[]{"chebi", "kegg.compound", "chebi"}); System.out.println(combos.size()); System.out.println(combos); } public static Map<String, String> findCombinations(String[] strings) { Set<String> set = new HashSet<String>(Arrays.asList(strings)); if (set.size() < strings.length) { System.err.println("Input array contained duplicates."); strings = (String[])set.toArray(); } Map<String, String> combos = new LinkedHashMap<String, String>(); for (int i = 0; i < strings.length; i++) { for (int j = i + 1; j < strings.length; j++) { combos.put(strings[i], strings[j]); } } return combos; } } but apparently I can't handle duplicates the way I try handle them, because I get following runtime error: Input array contained duplicates. Exception in thread "main" java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.lang.String; at utility.MiscUtils.findCombinations(MiscUtils.java:23) at utility.MiscUtils.main(MiscUtils.java:13) This line strings = (String[])set.toArray(); is what the runtime system is unhappy with. - Fencer
From: Screamin Lord Byron on 29 Jul 2010 14:41 On 29.07.2010 19:52, Fencer wrote: > Hello, I have a large set of strings and I need to find all combinations > of two of those string and perform an algorithm on them (the algorithm > is actually an XQuery that runs over a large dataset so I won't post > that here). The combinations foo-bar and bar-foo are considered to be > equal so only one should be generated. > > I wrote a static class method that takes an array of Strings and returns > all combinations in a Vector. The Vector stores a class Pair which I've > written myself too. The complete code is posted below with output from a > sample run. It seems to work, but how can it be improved and get rid of > the Pair class? Thanks! If you need a concept of pair, then you need some class that represents it. Why would you want to get rid of it? What is the reason you are using a Vector (synchronization)? Why not implement your equals() and hashcode() methods of the Pair class, and simply use the HashSet for storing pairs? Then you wouldn't have to worry about duplicates at all. > for (int i = 0; i < strings.length; ++i) { > for (int j = 0; j < strings.length; ++j) { for (int j = i+1;.....
|
Next
|
Last
Pages: 1 2 Prev: ScheduledExecutorService very inaccurate? Next: Threads reading a file at the same time |