From: Andrew on 26 Feb 2010 05:29

I am working with some legacy code that is in the process of changing to use std::vector<char> instead of a C-style char array. The C-style char array is currently allocated using new char [n]. This array is passed to various C string functions such as strstr, strncmp etc. I need to do the same work but with a std::vector.

I googled around for a bit to see if I could find anyone who had already done this work, but my search revealed nothing. I wonder if some kind person could point me in the right direction. Now, I realise I could code it all myself, but surely there must be something out there where this has already been done. I would rather build on the work of others than re-invent the wheel. And for performance-critical apps such as the one I am working on, it is common advice to use std::vector<char> instead of std::string or C-style char arrays. In the past I have often seen this advice given out (it's even in Effective STL), but without the utility functions to back it up I can see people ignoring this advice.

FWIW, the app is reading in sections of a *huge* XML file. A buffer is used to hold a fragment, which is then parsed using the Xerces SAX parser (thus it avoids creating a DOM object). I want the buffer to be a std::vector<char> that sometimes expands to reach a new watermark. I think I've got that bit working, but the string compares fail because they go off the end of the vector.

Regards,
Andrew Marlow

-- 
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
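The compares fail because strstr and strncmp keep scanning until they hit a '\0', which a std::vector<char> does not necessarily contain. A minimal sketch of bounds-checked stand-ins, assuming the valid data is the first `len` bytes of the vector; the names `vec_strstr` and `vec_strncmp` are hypothetical and not from any library. The search uses std::search with an explicit end iterator; the comparison is a hand-written loop because, like strncmp, it must also stop at a '\0'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical strstr stand-in: looks for needle in the first `len` bytes of
// buf and returns its offset, or -1 if absent. It never reads beyond
// buf.begin() + len, so the vector needs no '\0' terminator.
inline long vec_strstr(const std::vector<char>& buf, std::size_t len, const char* needle)
{
    assert(needle != 0);
    len = std::min(len, buf.size());                 // never trust len beyond size()
    const std::size_t n = std::strlen(needle);       // the needle is an ordinary C string
    std::vector<char>::const_iterator first = buf.begin();
    std::vector<char>::const_iterator last = first + static_cast<std::ptrdiff_t>(len);
    std::vector<char>::const_iterator hit = std::search(first, last, needle, needle + n);
    if (hit == last && n != 0)
        return -1;                                   // not found
    return static_cast<long>(hit - first);           // an empty needle matches at 0, like strstr
}

// Hypothetical strncmp stand-in: compares at most n chars of buf, starting at
// offset pos, with the C string s. Running off the valid data is treated like
// reaching a '\0', so the loop never indexes past buf[len - 1].
inline int vec_strncmp(const std::vector<char>& buf, std::size_t len, std::size_t pos,
                       const char* s, std::size_t n)
{
    assert(s != 0);
    len = std::min(len, buf.size());
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char a = pos + i < len ? static_cast<unsigned char>(buf[pos + i]) : 0;
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if (a != b)
            return a < b ? -1 : 1;                   // compared as unsigned char, like strncmp
        if (a == 0)
            return 0;                                // both ended at the same point
    }
    return 0;
}
```

Both helpers clamp `len` to `buf.size()`, so a stale fragment length cannot push them past the end of the vector.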
From: Lance Diduck on 27 Feb 2010 02:16

Premature optimization is the root of all evil.

> Now, I realise I could code it all myself, but surely there must be
> something out there where this has already been done.

Alexandrescu wrote and published "FlexString" back in 2001:
https://devel.nuclex.org/external/svn/loki/trunk/native/include/loki/flex/

> it is
> common advice to use std::vector<char> instead of std::string or
> C-style char arrays. In the past I have often seen this advice given out
> (it's even in Effective STL), but without the utility functions to
> back it up I can see people ignoring this advice.

#include <algorithm> has most (if not all) of the functions you are looking for: find_first_of, find_first_not_of, replace, replace_if, reverse, etc. are all there.

#include <algorithm>
#include <cassert>
#include <cstring>
#include <iterator>
#include <vector>

int main()
{
    char phrase_raw[] = "C++ is my favorite language";
    std::vector<char> phrase_v(phrase_raw, phrase_raw + sizeof(phrase_raw)); // includes trailing 0
    assert(std::equal(phrase_v.begin(), phrase_v.end(), phrase_raw));       // strcmp == 0
    assert(phrase_v.size() - 1 == std::strlen(phrase_raw));                 // strlen
    assert(std::strcmp(&*phrase_v.begin(), phrase_raw) == 0);               // the data is contiguous
    // strchr: offset of the first 'g'
    assert(std::distance(phrase_v.begin(), std::find(phrase_v.begin(), phrase_v.end(), 'g'))
           == std::strchr(phrase_raw, 'g') - phrase_raw);
    // strcspn: length of the leading run containing none of the chars in srchphrase
    // (end() - 1 keeps the trailing 0 out of the searched range)
    char srchphrase[] = "C++";
    assert(static_cast<std::size_t>(std::distance(phrase_v.begin(),
               std::find_first_of(phrase_v.begin(), phrase_v.end() - 1,
                                  srchphrase, srchphrase + sizeof(srchphrase) - 1)))
           == std::strcspn(phrase_raw, srchphrase));
}

So it is indeed possible, but extremely tedious.

> FWIW, the app is reading in sections of a *huge* XML file. A buffer is
> used to hold a fragment, which is then parsed using the Xerces SAX
> parser (thus it avoids creating a DOM object).

Apache SAX is not really a speed demon. There are a number of vendors that have C++/XML code generators that are far faster. Here is one open-source version: http://www.codesynthesis.com/products/xsd/

Virtually all the cases of "slow strings" come from code compiled multithreaded (MT) with Sun WorkShop.
The Sun string implementation (purchased from Rogue Wave and modified) used two heap allocations: one for the guts and one for the actual string data. This maximized binary compatibility when upgrading (which was Sun's intent), but the implementation used one global lock for both the heap allocations AND the string copies (this was a copy-on-write, or COW, implementation). This made some sense in the pre-multiprocessor days, but was a disaster once SMP arrived. In the financial community this was made worse by the lack of a small string optimization (SSO), since financial data is swamped with little strings. Compared to implementations such as STLport, which did have SSO, it looks slow beyond comprehension.

I would profile to make sure that vector<char> really is faster than std::string. Just yesterday I advised a workmate to consider replacing vector<unsigned> with basic_string<unsigned>, because strings already assume their character types have trivial ctors/dtors, and so do not go through the uninitialized fill, loop to call all the dtors that don't exist, etc. This gives the optimizer much less code to sift through and judge unnecessary. In fact, a profile I ran the other day showed that repeatedly resizing a vector to the same size was a hotspot, simply because the vector implementation had to go three levels deep to figure out that it didn't need to do anything at all, and the optimizer could not inline all of that.

Lance
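Lance's advice to measure before switching containers can be followed with a small timing harness. This is a rough sketch, assuming a workload of refilling a reused buffer one byte at a time; the names `fill_repeatedly` and `time_fill_ms` are hypothetical, and a micro-benchmark like this is no substitute for profiling the real parser.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical workload: refill a reused buffer with `len` bytes, `rounds`
// times, much as a fragment buffer is reused between parses.
template <typename Buffer>
std::size_t fill_repeatedly(Buffer& buf, std::size_t len, int rounds)
{
    assert(len > 0);
    std::size_t checksum = 0;
    for (int r = 0; r < rounds; ++r) {
        buf.clear();  // drops the contents; in practice the capacity is kept
        for (std::size_t i = 0; i < len; ++i)
            buf.push_back(static_cast<char>('a' + (i + r) % 26));
        // Read the data back so the optimizer cannot discard the work.
        checksum += static_cast<unsigned char>(buf[len / 2]);
    }
    return checksum;
}

// Times fill_repeatedly for one container type; returns elapsed milliseconds
// and stores the checksum so results from different containers can be compared.
template <typename Buffer>
double time_fill_ms(std::size_t len, int rounds, std::size_t& checksum)
{
    Buffer buf;
    const auto start = std::chrono::steady_clock::now();
    checksum = fill_repeatedly(buf, len, rounds);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Call `time_fill_ms<std::vector<char> >` and `time_fill_ms<std::string>` with fragment sizes typical of the real input, compile with the production optimization flags, and compare the returned milliseconds; the checksums should match, which confirms both runs did the same work.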
From: Mathias Gaunard on 27 Feb 2010 02:15
On 26 Feb, 22:29, Andrew <marlow.and...(a)googlemail.com> wrote:

> I am working with some legacy code that is in the process of changing
> to use std::vector<char> instead of a C-style char array. The C-style
> char array is currently allocated using new char [n]. This array is
> passed to various C string functions such as strstr, strncmp etc. I
> need to do the same work but with a std::vector.

You can keep using those functions as long as your vector is null-terminated, since std::vector is contiguous.
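Mathias's suggestion fits Andrew's grow-only buffer if one extra byte is kept for the terminator. A minimal sketch, assuming fragments contain no embedded '\0' bytes; `FragmentBuffer` and its members are hypothetical names. The vector only grows, so its size acts as the high-water mark, and a '\0' is written right after each fragment so strstr and friends neither run off the end nor see stale bytes left over from a longer earlier fragment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical grow-only fragment buffer. buf_.size() is the high-water mark:
// it grows when a fragment needs more room and never shrinks. A '\0' is always
// stored at buf_[len_], right after the current fragment.
class FragmentBuffer {
public:
    FragmentBuffer() : buf_(1, '\0'), len_(0) {}

    // Copies n bytes into the buffer, growing it first if n + 1 bytes
    // (fragment plus terminator) do not fit.
    void load(const char* data, std::size_t n)
    {
        if (n + 1 > buf_.size())
            buf_.resize(n + 1);                  // new watermark
        if (n != 0)
            std::memcpy(&buf_[0], data, n);
        buf_[n] = '\0';                          // hides stale bytes from longer fragments
        len_ = n;
    }

    // Null-terminated view of the current fragment, usable with strstr,
    // strncmp, etc. Invalidated when a later load() grows the buffer.
    const char* c_str() const
    {
        assert(len_ < buf_.size() && buf_[len_] == '\0');
        return &buf_[0];
    }

    std::size_t length() const { return len_; }

    // strstr over the current fragment: offset of needle, or -1 if absent.
    long find(const char* needle) const
    {
        const char* hit = std::strstr(c_str(), needle);
        return hit != 0 ? static_cast<long>(hit - c_str()) : -1;
    }

private:
    std::vector<char> buf_;
    std::size_t len_;
};
```

Because `c_str()` points into the vector, any pointer obtained from it must be re-fetched after a `load()` that grows the buffer.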