From: Jens Auer on 22 Jun 2010 08:39 On 22 Jun., 01:34, Brad <byte8b...(a)gmail.com> wrote: > I think there are better (more efficient) ways to do this, but I've > only ever needed this simple approach: > > void compare(const std::vector<std::string>& strings) > { > // Normally, this is randomly generated. It is a C style > string. > char the_string[4] = "abc"; > > std::vector<std::string>::const_iterator send(strings.end()); > > for (std::vector<std::string>::const_iterator sit = strings.begin(); > sit != send; ++sit) > { > if (the_string == *sit) > std::clog << "MATCH:\t" << *sit << std::endl; > } > > } > > Basically, I'm comparing a randomly generated string to a vector of > strings by iterating through the vector (as seen above). Here is the > problem: The vector may have one or millions of strings and execution > time slows considerably when I go above a few thousand strings. > > Is there a more efficient way to approach this? My code works fine on > smaller vectors, I just never expected to do millions and it may not > be possible to do that many as quickly as a few, but I wanted to ask. > Perhaps there is something in algorithm that I could use. Just point > me to it. I prefer standard C++ wherever possible. You could use std::binary_search if you sort the vector before or std::set to find string in logarithmic time (ignoring the cost for the non-constant string comparison). If you use the new standard, you could also employ a hash table which may be faster or slower. Further, if we leave standard c++ and the string vector is a fixed set of strings you could use a suffix tree to search for a string of length m in O(m) time. -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Ulrich Eckhardt on 22 Jun 2010 08:32 Brad wrote: > std::vector<std::string>::const_iterator send(strings.end()); > for (std::vector<std::string>::const_iterator sit = strings.begin(); > sit != send; ++sit) [...] Just for the record, you can declare multiple objects in the header of a for loop: for(my_vector::const_iterator it=strings.begin(), end=strings.end(); it!=end; ++it) > Basically, I'm comparing a randomly generated string to a vector of > strings by iterating through the vector (as seen above). Here is the > problem: The vector may have one or millions of strings and execution > time slows considerably when I go above a few thousand strings. > > Is there a more efficient way to approach this? My code works fine on > smaller vectors, I just never expected to do millions and it may not > be possible to do that many as quickly as a few, but I wanted to ask. Not enough info. If you don't care for the order of the strings, you could use a multiset<string> instead of a vector, which would give you better lookup times. You might be able to exploit other properties of the vector and its content, too. In any case, what you should do first is to create tests that tell you how fast your code runs, so you can compare different optimisations and also use a profiler to find out where you code spends most of its time. Uli -- Sator Laser GmbH Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932 [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: bf on 22 Jun 2010 08:33 On Jun 22, 1:34 am, Brad <byte8b...(a)gmail.com> wrote: > Basically, I'm comparing a randomly generated string to a vector of > strings by iterating through the vector (as seen above). Here is the > problem: The vector may have one or millions of strings and execution > time slows considerably when I go above a few thousand strings. Given the constraints mentioned, I don't think you can do much better. You can shorten the code slightly as: bool has_string(const std::vector<std::string> &haystack, const std::string& needle) { return std::find(haystack.begin(), haystack.end(), needle) != haystack.end(); } > Is there a more efficient way to approach this? My code works fine on > smaller vectors, I just never expected to do millions and it may not > be possible to do that many as quickly as a few, but I wanted to ask. > Perhaps there is something in algorithm that I could use. Just point > me to it. I prefer standard C++ wherever possible. If you can change your vector of strings into a std::set (or unordered_set) you'll be much better off (O(log2(n)) for set, and ideally(O(1) for unordered_set, although its worst case is O(n)) compared to O(n) for your current implementation. If you can't do that, but you can control how the vector is filled, to make it always sorted, you can use std::binary_search() (using it the same way as std::find() above. Also O(log2(n))). Departing from the standard, a radix tree is likely to serve you well, but then you'd need to write a lot of non-trivial code, so I'd avoid that unless deemed necessary (or as an exercise.) _ /Bjorn -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Mathias Gaunard on 22 Jun 2010 08:56 On Jun 22, 12:34 am, Brad <byte8b...(a)gmail.com> wrote: > [...] > Basically, I'm comparing a randomly generated string to a vector of > strings by iterating through the vector (as seen above). Here is the > problem: The vector may have one or millions of strings and execution > time slows considerably when I go above a few thousand strings. > > Is there a more efficient way to approach this? Yes. Searching for an entry in a list (or even better, set) can be done in logarithmic time by sorting the list and doing a binary search, or in average constant-time with hashing techniques. You may want to take a basic algorithmics class. Note this is not a problem tied to C++. C++ does, however, provide everything you might need already, so you don't have to reimplement anything. -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Mathias Gaunard on 22 Jun 2010 09:09 On Jun 22, 1:28 pm, George Neuner <gneun...(a)comcast.net> wrote: > RegEx? A dedicated string search algorithm such as Boyer-Moore or > Knuth-Morris-Pratt or Rabin-Karp? Those are methods to find a substring within a string. What he's looking for is a particular string among a collection of strings. -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
First
|
Prev
|
Next
|
Last
Pages: 1 2 3 4 Prev: New types defined in a return type Next: Generating a derived class from a base class |