From: Peter Olcott on 1 Jun 2010 10:56 On 6/1/2010 9:24 AM, Joseph M. Newcomer wrote: > See below... > On Tue, 01 Jun 2010 08:38:24 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote: > >> On 5/31/2010 9:10 PM, Joseph M. Newcomer wrote: >>> See below... >>> On Mon, 31 May 2010 12:17:00 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote: >>> >>>> I am not that concerned with performance. The number one priority is >>>> readability and maintainability. Performance is secondary to this. >>> **** >>> But in all your previous posts, you were declaring that the real value of a DFA was that >>> it was the fastest possible implementation. Suddenly, performance no longer matters. >> >> No you merely screwed up yet again and misread what I said. You screw up >> in this way quite often. This is your biggest single mistake. Your >> second biggest mistake is gross over exaggeration. >> >> As one example: You said that nothing can be known about the efficiency >> of a method without testing. I have empirically proven this statement >> completely false. I accurately predicted exactly how much faster another UTF8 >> decoder would be than my own code using the simple heuristic of >> estimating the number of machine instructions in the execution path. 50% >> more instructions did indeed result in 50% more time. > **** > Funny, I don't recall seeing the experimental results that demonstrated that. Results > computed using QueryPerformanceCounter, showing the mean and standard deviation of a large > number of experiments. > > Oh, and I don't recall you showing the code generated by that example, or giving potential > instruction counts across a distribution of 1, 2, 3 and 4-byte encodings. Or even giving > us data to indicate how many >1-byte encodings you were handling. 
You are free to prove > me wrong (you did it once, already, and I accept the data you collected), but without > evidence, I am entitled to offer my opinion about the code, based on my considerable > experience in writing, debugging, measuring, and generating machine code. > > Note: I have been an expert witness. When I am asked my "expert opinion" I am expected to > give an opinion based upon my expertise. Such an opinion cannot be construed as libelous, > because libel implies that I know the statement to be untrue. Look at case law on libel. > But then again, I do have a certificate on Forensic Science and the Law from a local very > important law school, and we had to study tort law as part of that. This means we had to > understand what is meant by "libel" and what is meant by "expert". It would only be > libel if I lied about what my experience had taught me, and said a statement inconsistent > with my experience. My experience tells me that reserve() is a Really Bad Idea if your > purpose is raw performance. That was not my originally stated purpose and you stupidly misconstrued this by not bothering to actually read all of the words that I said. All of your other egregious mistakes logically follow from this one. > My experience tells me that doing error reporting directly in > a utility subroutine is poor design. Therefore, I am offering expert opinion based on my > experience, and consistent with my experience. > **** >> >>> >>> And I still think the design is abysmal from a viewpoint of performance, usability, >>> flexibility, and error reporting. Anyone who chooses to, in a general-purpose subroutine, >>> pop up a MessageBox or do a printf, is simply clueless about how to design good software. >>> **** >> >> I already warned you that such statements are libelous and that you >> should cease and desist such statements. No one else here had indicated >> that the design is less than good. 
> **** > Well, that might be because no one else spent 15 years doing performance measurement of > programs. I look for little details, like the use of reserve() and push_back(). > > I was not aware that others had to agree with me to make a statement about code quality > "non-libelous". In fact, you have no proof that using reserve() is zero cost, whereas I > know how storage allocators work and know that its cost is always nonzero, and potentially > very large (depending on how malloc is implemented), so making a statement to that effect > is factual, not libelous. > **** >> >>>>> From the bits viewpoint, it is probably correct (I didn't work out all the shifts and >>>>> masks, figuring you probably got those right) but from a design and architecture >>>>> viewpoint, it is truly awful code given the stated goals. Besides limiting it to console >>>> >>>> That statement could be libelous. Cease and desist any such commentary. >>> **** >>> What statement? That it is awful code given the stated goals? That can't be libelous. >> >> Do you really want to risk it? > **** > But the facts indicate otherwise. For my statement to be untrue, you would have to > demonstrate that reserve() had zero cost, always. That push_back was faster than > incrementing a target index and storing through it, under all conditions. Have you > demonstrated this? > **** >> >>>> It is by no means truly awful code, anyone that knows code well would >>>> know this. Prospective future employers might not know code well enough, >>>> and believe what you said. >>> **** >>> Sorry, I would not hire someone who was dumb enough to put a printf in a piece of code >>> that could be used in a Windows environment, or even in a console environment. >> >> Since no one else here thinks that my code is anything like abysmal that >> shows that there is something else going on besides an objective >> assessment of the quality of my code. 
> **** > OK, show the set of performance measurements, using a high-resolution timer, and showing > how many experiments you ran, and the mean and standard deviation. Preferably for a > variety of input length strings, say 100 characters, 1000 characters and 10,000 > characters. This is how science is done. Telling me that your implementation is the > fastest possible implementation when, in fact, it is potentially abysmal (do you know the > potential cost of that reserve()? What if it takes a single page fault while searching > the heap for a block of the right size?), is not "evidence". > > If you said "Here's an example of code" that's one thing. But promising us for weeks that > you were going to implement something that was the fastest possible realization of an > algorithm and then not actually showing that is inconsistent. Defending your "design" by > a less-than-fastest-possible implementation is not "evidence". > joe > **** > Joseph M. Newcomer [MVP] > email: newcomer(a)flounder.com > Web: http://www.flounder.com > MVP Tips: http://www.flounder.com/mvp_tips.htm
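The kind of measurement requested above (many runs, a high-resolution timer, mean and standard deviation) could be sketched roughly as follows. This is only an illustration, not code from the thread: it uses the portable std::chrono::steady_clock rather than QueryPerformanceCounter, and the names Stats, summarize and time_runs are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of a set of timing samples.
struct Stats {
    double mean;    // arithmetic mean of the samples
    double stddev;  // sample standard deviation (n - 1 denominator)
};

// Computes mean and sample standard deviation; requires at least one sample.
inline Stats summarize(const std::vector<double>& samples) {
    assert(!samples.empty());
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / static_cast<double>(samples.size());
    double sq = 0.0;
    for (double s : samples) sq += (s - mean) * (s - mean);
    const double var =
        samples.size() > 1 ? sq / static_cast<double>(samples.size() - 1) : 0.0;
    return {mean, std::sqrt(var)};
}

// Calls fn() `runs` times and returns each call's duration in microseconds.
template <typename Fn>
std::vector<double> time_runs(Fn&& fn, std::size_t runs) {
    std::vector<double> us;
    us.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        us.push_back(std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    return us;
}
```

One would pass the decoder call as fn, repeat for inputs of 100, 1000 and 10,000 characters, and report summarize(time_runs(...)) for each length.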
From: Peter Olcott on 1 Jun 2010 11:34 On 6/1/2010 9:34 AM, Joseph M. Newcomer wrote: > See below... > On Mon, 31 May 2010 13:49:07 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote: > >> On 5/31/2010 1:16 PM, Giovanni Dicanio wrote: >>> "Joseph M. Newcomer"<newcomer(a)flounder.com> wrote: >>> >>>>> UTF8.reserve(UTF32.size() * 4); // worst case >>>> **** >>>> Note that this will call malloc(), which will involve setting a lock, >>>> then searching for a >>>> block to allocate, then releasing the lock. Since you have been a >>>> fanatic about >>>> performance, why is it you put a very expensive operation like >>>> 'reserve' in your code? >>>> >>>> While it is perfectly reasonable, it seems inconsistent with your >>>> previously-stated goals. >>> >>> Joe: I'm not sure if you are ironic or something :) ... but I believe >>> that std::vector::reserve() with a proper capacity value, followed by >>> several push_back()s, is very efficient. >>> Sure, not as efficient as a static stack-allocated array, but very >>> efficient. >> >> He needed to find some excuse to denigrate my code. He has had a >> personal grudge against me for several months. I don't really know what >> I said to offend him, but, it must have occurred sometime after he sung >> very high praises about my patent a few months ago. > *** > I do not have a "personal grudge against you"; what I dislike are people who are > pretentious, who make statements they can't back up, and present code that is inconsistent > with their loudly-touted goals and try to make claims that it is the best possible code > when it is not. > > I defended you against what I thought was an *unfair* accusation, that of being a Patent > Troll. If there are unjust accusations, I will object. But when you batter us to > insensibility about how critical performance is, and talk about presenting the "fastest > possible design", then I am equally offended; designs cannot be executed and therefore > cannot have speed. 
Code has measurable performance. And the code presented was bad code, Sure it can and indeed it does. Many designs are inherently substantially faster than specific alternatives. Your black and white all or none thinking indicates a perspective that is out of balance. A design based on the query of a specific customer using customer number within a very large database using a linear search is obviously very much slower than a design based on using a B+ tree index. There are countless other examples. > for all the reasons I stated. It has nothing to do with a personal grudge; it has > entirely to do with the fact that you state one thing, then present as evidence of your > correctness something which contradicts your own statement. This is not consistent. > Therefore, it is a target of opportunity to point out that you are not making sense. I > also have to judge code for its correctness not just in the core algorithm, but in the > overall implementation; utility code which uses printf or which even interacts with the > user is not correct code, because it either will not work at all or will produce > meaningless output to the user, and neither of these represent an acceptable design. > > If you make sense, I will defend you. If you prove me wrong with actual numbers, I will > accept your numbers and agree that you are actually right. I did once before. But if you > offer opinions on the performance of artifacts that are measurable (code, not designs), > without the data to back them, then you are not making sense, and you need to be told > this. > joe If you measure my code against the incorrect standard that it is specifically encoded to be the fastest possible encoding, even then it is not abysmal. All of the performance improvements that you suggested don't result in as much as a doubling in speed. 
http://www.ocr4screen.com/UTF8.cpp From benchmarking my code against the code that Hector posted a link to http://bjoern.hoehrmann.de:80/utf-8/decoder/dfa/ This other code was only 37% faster. The specific test was to generate 100 instances of every codepoint (skipping the 0xD800-0xDFFF range) and then decode these 100 instances. The instances were generated with the code posted in this thread. All memory was allocated in advance so that only the decode speed would be measured. You are certainly smart and educated enough to be able to estimate these results in advance. To call code abysmal merely because it takes 50% more time is certainly not an objective assessment of the actual code quality. If the code took 50-fold more time and the design goal was maximum performance, then this would surely be abysmal. Since the design goal was not to produce the fastest possible encoding and the speed difference is only 50%, an "abysmal" assessment of code quality is clearly dishonest. > . > **** >> >>>> No, the CORRECT way to write such code is to either throw an exception >>>> (if you are in C++, >>>> which you clearly are) or return a value indicating the error (for >>>> example, in C, an >>> >> >> The "correct" way to handle an error when testing code for the first >> time is to use a printf() statement, or other easy to use debugging >> construct. When the code moves to production, then either of the other >> two suggestions may be appropriate. >> >>> In this case, I'm for exception. >>> Thanks to exception, you could use the precious function return value to >>> actually return the resulting buffer (UTF8 string), instead of passing >>> it as a reference to the function: >>> >>> // Updated prototype: >>> // - use 'const' correctness for utf32 >>> // - return resulting utf8 >>> // - may throw on error >>> std::vector<uint8_t> toUTF8(const std::vector<uint32_t> & utf32); >> >> For most compilers this requires making an extra copy. 
>> >>> >>> Note that thanks to the move semantics (i.e. the new "&&" thing of >>> C++0x, available in VC10 a.k.a. VS2010), you don't pay for extra useless >>> copies in returning potentially big objects. >>> >>> Giovanni >>> >>> >>> >> Counting on this results in code that does not have the same performance >> characteristics across multiple platforms. > Joseph M. Newcomer [MVP] > email: newcomer(a)flounder.com > Web: http://www.flounder.com > MVP Tips: http://www.flounder.com/mvp_tips.htm
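The prototype proposed above (return the buffer by value, throw on error) might be filled in along these lines. This is a minimal sketch under assumptions, not the UTF8.cpp code discussed in the thread: the choice of std::invalid_argument and the rejection of surrogates and values above 0x10FFFF are illustrative decisions.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Encodes UTF-32 code points as UTF-8 and returns the result by value.
// Throws std::invalid_argument (an assumed choice) on surrogates or on
// values above U+10FFFF instead of printing from inside the utility.
std::vector<std::uint8_t> toUTF8(const std::vector<std::uint32_t>& utf32) {
    std::vector<std::uint8_t> utf8;
    utf8.reserve(utf32.size() * 4);  // worst case: four bytes per code point
    for (std::uint32_t cp : utf32) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("toUTF8: not a Unicode scalar value");
        if (cp < 0x80) {
            utf8.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {
            utf8.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            utf8.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            utf8.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            utf8.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            utf8.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {
            utf8.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            utf8.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            utf8.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            utf8.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return utf8;  // eligible for copy elision (NRVO), or a move in C++11
}
```

Whether the return by value costs an extra copy depends on the compiler, as the posters note; the caller simply writes something like auto bytes = toUTF8(input); inside a try block.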
From: Joseph M. Newcomer on 1 Jun 2010 13:35 see below... On Mon, 31 May 2010 14:54:48 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote: >On 5/31/2010 2:35 PM, Giovanni Dicanio wrote: >> "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote: >> >>> He needed to find some excuse to denigrate my code. He has had a >>> personal grudge against me for several months. I don't really know >>> what I said to offend him, but, it must have occurred sometime after >>> he sung very high praises about my patent a few months ago. >> >> I don't think so. >> >> Joe helps lots of people here (and is a nice guy in person!). >> >> You must have misunderstood. > >No I really don't think so. He helps lots of people, and other than his >disdain for me may be a really nice guy. He is certainly not speaking >accurately about the quality of my developmental code. The degree of >this inaccuracy indicates a strong negative bias against me. **** You did not present it as "developmental code" nor state that it was other than a finished product. Instead, you have repeatedly insisted that any alternative I suggested could not possibly be as fast as the code you write, then present an example of code that is certainly not as fast as it could be. I find the contrast bizarre. I have a strong negative bias to people who claim X, and when they deliver < X, claim it is actually X they are delivering. In other contexts, we call them "politicians" and refer to "campaign promises". I have little respect for people who claim they are perfect, and deliver as proof of their perfection code which is less than perfect by most reasonable metrics, such as performance, utility, flexibility, etc. I guess the new form of logic is "I am perfect. The proof that I am perfect is that anyone who questions my perfection will be threatened with a lawsuit". In other contexts we refer to this using terms like "religion" and "heresy" and "inquisition". 
**** > >Other people here have picked apart several aspects of his negative >assessment and thus sided against this negative assessment. **** No, Giovanni pointed out that with the use of reserve(), push_back() is faster than if you don't use reserve(), which is absolutely true and self-evident. But I was objecting to the promises of "fastest possible" embodied by code that is *not* fastest-possible. And while push_back() *is* faster with reserve(), it is not as fast as ++dest to increment a destination pointer, therefore it is not as fast as possible. So the objections to my observation did not change the validity of the observation. Only the magnitude. But you had made promises of absolutes. And the code is inconsistent with that. And you still talk about "fast designs" as if this made sense. Only running code can be measured, and only running code which has been scientifically measured is a valid confirmation of performance. **** > >When viewed within the specific context that the sole purpose of this >code is to validate the correctness of the algorithm the code is >objectively at the very least very good quality. **** I was interpreting "correctness" to mean correctness of the code. If I were to use that code in a Windows app it would not be correct. If your goal was to produce the fastest possible code with maximum utility, that code is not it. But why is it OK for you to claim your code is always the fastest possible, then produce an example that violates your own claim, and why is it libelous of me to point this out? Or are we back to proof-by-intimidation? (Maybe I missed that particular logical methodology in my several courses in mathematical logic) joe **** > >> >> >>>> std::vector<uint8_t> toUTF8(const std::vector<uint32_t> & utf32); >>> >>> For most compilers this requires making an extra copy. >> >> Before move semantics, I think several C++ compilers implemented the RVO. >> >> Giovanni >> Joseph M. 
Newcomer [MVP] email: newcomer(a)flounder.com Web: http://www.flounder.com MVP Tips: http://www.flounder.com/mvp_tips.htm
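The two output styles being compared in this post (reserve() followed by push_back(), versus sizing the buffer once and storing through an incrementing destination pointer) look roughly like this. The function names are hypothetical and the loop body is a plain byte copy, chosen only to isolate the store pattern; which variant is faster on a given compiler is exactly what would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Variant A: one allocation via reserve(), then push_back() per byte.
// Each push_back() still checks capacity and updates the vector's size.
std::vector<std::uint8_t> copy_push_back(const std::vector<std::uint8_t>& src) {
    std::vector<std::uint8_t> out;
    out.reserve(src.size());
    for (std::uint8_t b : src) out.push_back(b);
    return out;
}

// Variant B: size the vector up front, then store through a raw pointer
// that is incremented after each write (the "++dest" style).
// Note that constructing out(src.size()) zero-fills the buffer first.
std::vector<std::uint8_t> copy_pointer(const std::vector<std::uint8_t>& src) {
    std::vector<std::uint8_t> out(src.size());
    std::uint8_t* dest = out.data();
    for (std::uint8_t b : src) *dest++ = b;
    return out;
}
```

Both variants allocate exactly once, so the allocation cost raised earlier in the thread is the same for each; only the per-element store differs.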
From: Joseph M. Newcomer on 1 Jun 2010 13:50 On Mon, 31 May 2010 15:08:12 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote: >On 5/31/2010 2:41 PM, Daniel T. wrote: >> Peter Olcott<NoSpam(a)OCR4Screen.com> wrote: >>> On 5/31/2010 1:24 PM, Daniel T. wrote: >>>> Peter Olcott<NoSpam(a)OCR4Screen.com> wrote: >>>>> On 5/31/2010 11:35 AM, Daniel T. wrote: >>>> >>>>>> The codes 10FFFE and 10FFFF are guaranteed not to be unicode >>>>>> characters... >>>>> >>>>> So then Wikipedia is wrong? >>>>> http://en.wikipedia.org/wiki/Unicode **** Wikipedia means whatever the author of the article thought it meant. There is no guarantee that the author actually interpreted information correctly. There are primary sources of information (e.g., standards) and secondary sources (discussions about the standards), and then there are tertiary sources which are someone's interpretation of what the standard might have meant. I'm not sure of the reliability of any but the published standard, and note that we are on revision 5.0 of that standard, meaning the last four had something wrong with them. ***** >>>>> 16 100000-10FFFF Supplementary Private Use Area-B >>>> >>>> According to unicode.org, apparently yes. You'd know that if you >>>> hadn't been lazy and only consulted a secondary source. >>> >>> I simply don't have the time to read all of the Unicode stuff to find >>> the two or three paragraphs that I really need to know. I already know >>> about High and Low surrogates. Why is the range that you specified not >>> valid codepoints? >> >> It took me less than two minutes searching unicode.org to answer the >> question you are asking me. I suggest you make the attempt at least. > > http://unicode.org/charts/PDF/U100000.pdf > >The Private Use Area does not contain any character assignments, >consequently no character code charts or >namelists are provided for this area. However, the two code locations at >the end of each plane are designated >non-characters. 
> >It is almost as if the biblical story of the tower of babble was >literally true, and human language (including Unicode) was deliberately >made much more complex than necessary. **** Actually, the tower was called "Babel", and the word "babble" is probably derived from that name. Actually, it was not called Babel, but I cannot read Hebrew, and therefore cannot read the name in its original language. "Babel" is actually the name used in the English translation of the Bible. An English translation of the Bible is at best a secondary source. Eight years of studying the Bible (1959-1967) and courses in Bible history (that is, the history of the document including the derivation of modern print translations in a variety of languages) seem to have had some effect. It is truly amazing what odd knowledge I find I still retain after all these years. One thing is certain: all modern translations have varying degrees of departure from the original texts. Modern translations are anything other than the original Hebrew for the Old Testament, and other than the original Aramaic or Greek for the New Testament. I read none of these languages, but studied under scholars who could, and who debated fine points of translation with others of their skill. And told us about these debates in class. joe **** > >> >>>>> So it looks otherwise correct? >>>> >>>> Does it pass all your tests? You do have tests don't you? >>> >>> I am using the results of this function to mutually exhaustively test >>> the results of another function that does the conversion in the other >>> direction. These tests pass. >> >> Then why are you asking us if the algorithm is correct? Joseph M. Newcomer [MVP] email: newcomer(a)flounder.com Web: http://www.flounder.com MVP Tips: http://www.flounder.com/mvp_tips.htm
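The rule quoted from the unicode.org chart (the last two code locations of each plane are noncharacters) can be expressed as a small predicate. A minimal sketch follows; the function name is hypothetical, and the additional U+FDD0..U+FDEF block comes from the Unicode Standard's noncharacter definition rather than from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if cp is one of the 66 Unicode noncharacters:
// U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF at the end of each of the
// 17 planes (so 10FFFE and 10FFFF, as stated in the thread).
bool is_noncharacter(std::uint32_t cp) {
    if (cp > 0x10FFFF) return false;                // outside the code space
    if (cp >= 0xFDD0 && cp <= 0xFDEF) return true;  // contiguous block of 32
    return (cp & 0xFFFE) == 0xFFFE;                 // last two of each plane
}
```

A converter that wants to reject these values could call the predicate alongside its surrogate check; whether to reject or pass them through is a policy choice the thread does not settle.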
From: Joseph M. Newcomer on 1 Jun 2010 14:04
See below... On Tue, 01 Jun 2010 10:34:40 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote: >On 6/1/2010 9:34 AM, Joseph M. Newcomer wrote: >> See below... >> On Mon, 31 May 2010 13:49:07 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote: >> >>> On 5/31/2010 1:16 PM, Giovanni Dicanio wrote: >>>> "Joseph M. Newcomer"<newcomer(a)flounder.com> wrote: >>>> >>>>>> UTF8.reserve(UTF32.size() * 4); // worst case >>>>> **** >>>>> Note that this will call malloc(), which will involve setting a lock, >>>>> then searching for a >>>>> block to allocate, then releasing the lock. Since you have been a >>>>> fanatic about >>>>> performance, why is it you put a very expensive operation like >>>>> 'reserve' in your code? >>>>> >>>>> While it is perfectly reasonable, it seems inconsistent with your >>>>> previously-stated goals. >>>> >>>> Joe: I'm not sure if you are ironic or something :) ... but I believe >>>> that std::vector::reserve() with a proper capacity value, followed by >>>> several push_back()s, is very efficient. >>>> Sure, not as efficient as a static stack-allocated array, but very >>>> efficient. >>> >>> He needed to find some excuse to denigrate my code. He has had a >>> personal grudge against me for several months. I don't really know what >>> I said to offend him, but, it must have occurred sometime after he sung >>> very high praises about my patent a few months ago. >> *** >> I do not have a "personal grudge against you"; what I dislike are people who are >> pretentious, who make statements they can't back up, and present code that is inconsistent >> with their loudly-touted goals and try to make claims that it is the best possible code >> when it is not. >> >> I defended you against what I thought was an *unfair* accusation, that of being a Patent >> Troll. If there are unjust accusations, I will object. 
But when you batter us to >> insensibility about how critical performance is, and talk about presenting the "fastest >> possible design", then I am equally offended; designs cannot be executed and therefore >> cannot have speed. Code has measurable performance. And the code presented was bad code, > >Sure it can and indeed it does. Many designs are inherently >substantially faster than specific alternatives. Your black and white >all or none thinking indicates a perspective that is out of balance. > >A design based on the query of a specific customer using customer number >within a very large database using a linear search is obviously very >much slower than a design based on using a B+ tree index. There are >countless other examples. *** No, a "design" is not executable. A choice to use a B-tree or a linear search can only be measured in terms of actual code. A design would state that there was a way to map a customer number to a record. An implementation decides whether or not a B-tree is used. **** > >> for all the reasons I stated. It has nothing to do with a personal grudge; it has >> entirely to do with the fact that you state one thing, then present as evidence of your >> correctness something which contradicts your own statement. This is not consistent. >> Therefore, it is a target of opportunity to point out that you are not making sense. I >> also have to judge code for its correctness not just in the core algorithm, but in the >> overall implementation; utility code which uses printf or which even interacts with the >> user is not correct code, because it either will not work at all or will produce >> meaningless output to the user, and neither of these represent an acceptable design. >> >> If you make sense, I will defend you. If you prove me wrong with actual numbers, I will >> accept your numbers and agree that you are actually right. I did once before. 
But if you >> offer opinions on the performance of artifacts that are measurable (code, not designs), >> without the data to back them, then you are not making sense, and you need to be told >> this. >> joe > >If you measure my code against the incorrect standard that it is >specifically encoded to be the fastest possible encoding, even then it >is not abysmal. All of the performance improvements that you suggested >don't result in as much as a doubling in speed. > http://www.ocr4screen.com/UTF8.cpp > > From benchmarking my code against the code that Hector posted a link to > http://bjoern.hoehrmann.de:80/utf-8/decoder/dfa/ >This other code was only 37% faster. ***** "Only" 37% faster? Actually 37% is a pretty big number in terms of performance! Most attempts to "improve" performance are lucky if they get single-digit percentage improvement. As someone who spent a nontrivial amount of his life worrying about these issues, I can say that 37% is a SUBSTANTIAL performance improvement! And if it were 1% faster, it would still prove your code was not the fastest possible. But 37%? You aren't even in the running in this contest! joe **** > >The specific test was to generate 100 instances of every codepoint >(skipping the 0xD800-0xDFFF range) and then decode these 100 instances. >The instances were generated with the code posted in this thread. All >memory was allocated in advance so that only the decode speed would be >measured. > >You are certainly smart and educated enough to be able to estimate these >results in advance. To call code abysmal merely because it takes 50% >more time is certainly not an objective assessment of the actual code >quality. **** 50% more time? Wow! I'd call that "abysmal". Now if it were only 3% slower, I would have been guilty of overexaggeration. By objective measure, 50% more time is REALLY BAD! **** > >If the code took 50-fold more time and the design goal was maximum >performance, then this would surely be abysmal. 
Since the design goal >was not to produce the fastest possible encoding and the speed >difference is only 50%, an "abysmal" assessment of code quality is >clearly dishonest. **** No, if it was more than an order of magnitude slower, it would be laughably slower. I guess we are discussing the meaning of "abysmal". By my standards of code performance, 50% more time is "abysmal". That is not dishonest. We used to think a 10% improvement was substantial. But then, we were all highly experienced programmers (more than half of the project team had PhDs), so we knew what to expect. joe **** > > >> . >> **** >>> >>>>> No, the CORRECT way to write such code is to either throw an exception >>>>> (if you are in C++, >>>>> which you clearly are) or return a value indicating the error (for >>>>> example, in C, an >>>> >>> >>> The "correct" way to handle an error when testing code for the first >>> time is to use a printf() statement, or other easy to use debugging >>> construct. When the code moves to production, then either of the other >>> two suggestions may be appropriate. >>> >>>> In this case, I'm for exception. >>>> Thanks to exception, you could use the precious function return value to >>>> actually return the resulting buffer (UTF8 string), instead of passing >>>> it as a reference to the function: >>>> >>>> // Updated prototype: >>>> // - use 'const' correctness for utf32 >>>> // - return resulting utf8 >>>> // - may throw on error >>>> std::vector<uint8_t> toUTF8(const std::vector<uint32_t> & utf32); >>> >>> For most compilers this requires making an extra copy. >>> >>>> >>>> Note that thanks to the move semantics (i.e. the new "&&" thing of >>>> C++0x, available in VC10 a.k.a. VS2010), you don't pay for extra useless >>>> copies in returning potentially big objects. >>>> >>>> Giovanni >>>> >>>> >>>> >>> Counting on this results in code that does not have the same performance >>> characteristics across multiple platforms. >> Joseph M. 
Newcomer [MVP] >> email: newcomer(a)flounder.com >> Web: http://www.flounder.com >> MVP Tips: http://www.flounder.com/mvp_tips.htm Joseph M. Newcomer [MVP] email: newcomer(a)flounder.com Web: http://www.flounder.com MVP Tips: http://www.flounder.com/mvp_tips.htm