Prev: How to reuse an existing stream with a different streambuf ?
Next: A quick question about Initializer-List in C++0x
From: Peter Olcott on 7 Jun 2010 18:54 I am aiming to produce the best balance of speed and space efficiency against reliability and maintainability placing a higher weight on the latter criteria. http://www.ocr4screen.com/UTF8.h If there are any improvements that can be made within the criteria provided, or other improvements that do not detract from the above criteria input would be welcomed. This support file is required for compiling. Please do not critique the support file it is not in final form. http://www.ocr4screen.com/Array2D.h -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Nick Hounsome on 8 Jun 2010 02:44 On 8 June, 10:54, Peter Olcott <NoS...(a)OCR4Screen.com> wrote: > I am aiming to produce the best balance of speed and space efficiency > against reliability and maintainability placing a higher weight on the > latter criteria. > http://www.ocr4screen.com/UTF8.h > > If there are any improvements that can be made within the criteria > provided, or other improvements that do not detract from the above > criteria input would be welcomed. 1) uint8_t and uint32_t are already standard types so why are you typedefing your own? 2) Precalculate the state table - It is logically a static const State[][]. 3) Eliminate all the global stuff - It's not comp.lang.c here. 4) Your class is logically stateless hence methods (as well as the state array) should be probably be static 5) It is highly unlikely that these methods would be inlined by the compiler (inline is a hint not an order) and if they were it could be a space wasting disaster. Make them non-inline - The call overhead will be dwarfed by the processing in almost all cases. 6) Use vector::reserve so that push_back wont do multiple reallocations/copies 7) Template the methods and use iterators rather than collections - it's more flexible (you can keep the current signatures as overloads) 8) Why are you using uint8_t as a State instead of an enum??????? NextState = 0 !!!???? Same goes for ActionCode. -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Goran on 8 Jun 2010 02:43 On Jun 8, 11:54 am, Peter Olcott <NoS...(a)OCR4Screen.com> wrote: > I am aiming to produce the best balance of speed and space efficiency > against reliability and maintainability placing a higher weight on the > latter criteria. > http://www.ocr4screen.com/UTF8.h A better solution to your problem seems to exist since some time. http://utfcpp.sourceforge.net/ Goran. -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Peter Olcott on 8 Jun 2010 06:03 On 6/8/2010 12:44 PM, Nick Hounsome wrote: > On 8 June, 10:54, Peter Olcott<NoS...(a)OCR4Screen.com> wrote: >> I am aiming to produce the best balance of speed and space efficiency >> against reliability and maintainability placing a higher weight on the >> latter criteria. >> http://www.ocr4screen.com/UTF8.h >> >> If there are any improvements that can be made within the criteria >> provided, or other improvements that do not detract from the above >> criteria input would be welcomed. > > 1) uint8_t and uint32_t are already standard types so why are you > typedefing your own? Because they are not defined on my platform. MS VS 2008. > 2) Precalculate the state table - It is logically a static const > State[][]. I did it this way so that it would be small enough to fit on the stack, and also so that it could be included as a header file. > 3) Eliminate all the global stuff - It's not comp.lang.c here. I don't know what you mean, everything is a member of a class. > 4) Your class is logically stateless hence methods (as well as the > state array) should be probably be static Maybe, what is the advantage? > 5) It is highly unlikely that these methods would be inlined by the > compiler (inline is a hint not an order) and if they were it could be > a space wasting disaster. Make them non-inline - The call overhead > will be dwarfed by the processing in almost all cases. The compiler requires inlining if the methods are included in the header. I hate creating the typically required pair of files it seems so redundant. > 6) Use vector::reserve so that push_back wont do multiple > reallocations/copies Yes. I had initially allocated the worst case memory, then I thought that this wasted memory. When I benchmarked it on 100 instances of the entire Unicode set, it only took about 30% more time, but, 50% less memory. It might be a good idea to at least reserve() the best case memory requirement. > 7) Template the methods and use iterators rather than collections - > it's more flexible (you can keep the current signatures as overloads) > 8) Why are you using uint8_t as a State instead of an enum??????? > NextState = 0 !!!???? Same goes for ActionCode. > > Yes this is the best suggestion. -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Kenneth 'Bessarion' Boyd on 8 Jun 2010 06:04
On Jun 8, 4:54 am, Peter Olcott <NoS...(a)OCR4Screen.com> wrote: > I am aiming to produce the best balance of speed and space efficiency > against reliability and maintainability placing a higher weight on the > latter criteria. > http://www.ocr4screen.com/UTF8.h > > If there are any improvements that can be made within the criteria > provided, or other improvements that do not detract from the above > criteria input would be welcomed. Ignoring the existence of other libraries: 1) Include-guard #ifndef, please? Especially since this is intended to be a header-only implementation? (There is no other naively obvious reason for not using a *.cpp file to avoid needing inline function specifiers to get past the linker.) 2) Encoding should use NextByteMask as well. 3) While the action codes should be stored as uint8_t (more generally unsigned char) for memory efficiency reasons, it would be better not to gamble on the compiler's optimizer: this should be an enumeration. 4) uint8_t and uint32_t collide with #include <stdint.h> (and possibly #include <cstdint>) if that particular C++0X support is present. If you have stdint.h, I think uint_least8_t and uint_least32_t are better for portability to exotic hardware. In any case, have some sort of project configuration check for stdint.h and use it if present. 4a) As only analytically non-conformant hardware would force CHAR_BIT<8 , I'd consider just using unsigned char in place of uint8_t/ uint_least8_t. 4b) Preprocessor detection (using limits.h) of a suitable integer type for uint_least32_t is painful; my attempts to hand-roll this have been unconvincing. 5) The bootstrapping of State is a speed/maintainability trade off. Speed would expect this to be a static const uint8_t State[9][256], but * avoiding linking errors from initializing this requires either templating, or a *.cpp file to hold the initialization. Assuming header-only implementation leaves templating. * Debug-mode build would have to fully check (e.g., assert macros) that the table is correctly initialized anyway. * the proofreading behind both the static initialization, and the debug-mode checking, is awful and would strongly bias towards automatic code generation -- which simply replicates the existing constructor code. 5a) I'm not sold on a state table for this, either. The comment describing the table is perfectly good pseudocode for a static member function, and I have a hard time seeing that function checking in at more than 1152 (9*128, easiest simple compressed format viable) bytes object size. 5b) As maintainability is important, forget about memset/std::memset as a replacement for the loops. If you actually *have* a speed problem you should reimplement State as one of static data, or static member function. -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |