Prev: MAKE UPTO $5000 MONTHLY! $2000 INYOUR FIRST 30 DAYS!
Next: help: enum cause failure in multi thread run
From: Lew on 14 Jul 2010 19:44 Sanny, you should attribute quotes. Patricia Shanahan wrote: >> Have you considered a lookup-table? It would have 65536 elements, but >> most of the accesses would probably be concentrated in a few areas, so >> it would be quite cache-friendly. You could map to e.g. 1 for "-" and 0 >> for anything else, then do arithmetic on the lookup result. Sanny wrote: > Problem Solved. I tried a lookup table and now all if conditions have > been removed. > > Now its just a piece of addition. Which quite fast 3-4 times faster > than the if conditions. I'm curious, Sanny, why you have not answered the several people who asked how you obtained these speed conclusions. I, for one, was on tenterhooks for the answers, and only forbore from echoing the question before now because I felt so sure that after four or five posts of the same question you would actually deign to answer it. Alas to be so disappointed. And we're reasonably clued in enough to guess that you ran a benchmark. The answer needs to be rather white-box: how do you distinguish the time in "if" evaluation from that in other areas of code? What were the hard numbers for the loop and unrolled version? What Hotspot warmup protocol did you use? Did you exclude JVM startup time? 3-4 times faster than what time, exactly? Microbenchmarks are notoriously unreliable estimators of real-world performance, especially for Java. Things like running benchmarks under different load conditions or different threading scenarios over different numbers of CPUs from a production version are infamous legends in the world of performance engineering. I for one am deeply dubious of your benchmark claims, vague as they are and unaccompanied by a detailed description of your testing rigors, much less actual code. Please provide the requisite fill for the gaps in our knowledge. -- Lew
From: Patricia Shanahan on 14 Jul 2010 21:32 Daniel Pitts wrote: > On 7/14/2010 2:11 PM, Sanny wrote: >> >>> Have you considered a lookup-table? It would have 65536 elements, but >>> most of the accesses would probably be concentrated in a few areas, so >>> it would be quite cache-friendly. You could map to e.g. 1 for "-" and 0 >>> for anything else, then do arithmetic on the lookup result. >> >> Problem Solved. I tried a lookup table and now all if conditions have >> been removed. >> >> Now its just a piece of addition. Which quite fast 3-4 times faster >> than the if conditions. > I doubt it. > A lookup table requires a dereference and a multiplication. > > I *really* think you need to make sure you're not complicating a simple > piece of code for little actual gain. Measure twice, cut once. Obviously, I thought the lookup table had a good enough chance of improving performance to be worth measuring, so I am not particularly surprised. Modern processors have such good pipelining, superscalar, and out of order execution capabilities that arithmetic operations are surprisingly cheap. There are two things that really take time, cache misses and branch mis-predictions. The branches on the charAt results were likely to be particularly expensive because there is no detectable pattern to them, unless the data contains a repeating pattern. The best the branch predictor can do is probably to predict not '-', and take a pipeline flush for each '-'. I am surprised there was much gain from the loop unrolling - the conditional branch in a for-loop is something branch predictors are designed to handle well. The many calls to charAt might interfere with in-lining. On the other hand, the array access is likely to hit in cache. Unicode characters come in blocks that tend to be used together. The first 128 elements of the array are the only ones used for ASCII data. I don't believe in microbenchmarking for this sort of thing, as distinct from running the real workload for each implementation, because cache and branch predictor behavior is affected by the rest of the job. Patricia
From: Sanny on 15 Jul 2010 08:44 > You have revealed nothing of your benchmark. The snippets you show in your > response do not do what your original problem statement specified. You don't > show your measurement techniques. You don't even explain your testing rigor, > much less share the benchmark code. > > What exactly should I try? Just try the above for loops and you will see how much time spent in both cases. You have to put extra efforts to create a timer. I tested these an year back. and found if conditions are taking a lot of time than other arithmetic problems. Bye Sanny
From: Lew on 15 Jul 2010 09:07 On Jul 15, 8:44 am, Sanny <softtank...(a)hotmail.com> wrote: > > You have revealed nothing of your benchmark. The snippets you show in your > > response do not do what your original problem statement specified. You don't > > show your measurement techniques. You don't even explain your testing rigor, > > much less share the benchmark code. > > > What exactly should I try? > > Just try the above for loops and you will see how much time spent in > both cases. > > You have to put extra efforts to create a timer. > > I tested these an year back. and found if conditions are taking a lot > of time than other arithmetic problems. > You really have a gift for avoiding the question. Why should I put in the extra "efforts" that you won't deliver? I ask again, where are your timing loops? Where is your code? What is your rigor? I'm not going to waste my time performing a useless microbenchmark to answer a question you refuse to answer, knowing that my way of doing it will be more correct and produce different results from yours. Of course the new example you post will run quicker without 'if' evaluations, but that says less than nothing about the very different scenario that you first presented, where some form of conditional logic was needed, nor does it reveal the actual numbers from your test runs, nor does it reveal how much the 13 ms you saved with all your complicated micro-optimization counts against the 30 minutes your program waits for input. I suppose I can safely conclude that you will continue to refuse to answer. I shall return the favor. -- Lew
From: RedGrittyBrick on 15 Jul 2010 12:00
On 15/07/2010 14:46, Patricia Shanahan wrote: > it is helpful to make your code as ordinary as possible. That's always been my rule of thumb. Assume compiler writers and chip designers are smarter than I am and write the sort of code I would expect them to expect. -- RGB - slower than a bullet, less powerful than an express train ... |