From: Lew on
Sanny, you should attribute quotes.

Patricia Shanahan wrote:
>> Have you considered a lookup-table? It would have 65536 elements, but
>> most of the accesses would probably be concentrated in a few areas, so
>> it would be quite cache-friendly. You could map to e.g. 1 for "-" and 0
>> for anything else, then do arithmetic on the lookup result.

Sanny wrote:
> Problem Solved. I tried a lookup table and now all if conditions have
> been removed.
>
> Now its just a piece of addition. Which quite fast 3-4 times faster
> than the if conditions.

I'm curious, Sanny, why you have not answered the several people who asked how
you obtained these speed conclusions. I, for one, was on tenterhooks for the
answers, and only forbore from echoing the question before now because I felt
so sure that after four or five posts of the same question you would actually
deign to answer it. Alas to be so disappointed.

And we're clued in enough to guess that you ran a benchmark. The
answer needs to be rather white-box: how do you distinguish the time spent in
"if" evaluation from the time spent elsewhere in the code? What were the hard
numbers for the loop and the unrolled version? What HotSpot warmup protocol did
you use? Did you exclude JVM startup time? 3-4 times faster than what time, exactly?
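For the record, a warmup-aware measurement of the kind asked about above looks roughly like the following sketch. The work() payload is an invented stand-in, since the actual code under test was never posted; the harness only illustrates warming up HotSpot, timing with System.nanoTime(), and keeping JVM startup out of the measurement.

```java
// Hypothetical harness, not Sanny's code: work() is an invented
// stand-in for whatever was actually measured.
public class Bench {
    // Invented stand-in workload.
    static int work(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '-') n++;
        }
        return n;
    }

    // Returns the best (lowest) elapsed time over reps runs.
    static long timeRuns(String input, int reps) {
        long best = Long.MAX_VALUE;
        int sink = 0;
        for (int r = 0; r < reps; r++) {
            long t0 = System.nanoTime();
            sink += work(input);
            long elapsed = System.nanoTime() - t0;
            if (elapsed < best) best = elapsed;
        }
        if (sink == -1) System.out.print(sink); // keep the JIT from eliding work()
        return best;
    }

    public static void main(String[] args) {
        String input = "a-b--c".repeat(100_000);
        timeRuns(input, 1_000);             // warmup: let HotSpot compile work()
        long best = timeRuns(input, 1_000); // measure only after warmup
        System.out.println("best ns per pass: " + best);
    }
}
```

Even this is imperfect (a serious effort would use something like JMH), but it at least answers the what-did-you-time question.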

Microbenchmarks are notoriously unreliable estimators of real-world
performance, especially for Java. Benchmarks run under load conditions,
threading scenarios, or CPU counts that differ from production are infamous
in performance-engineering lore for producing misleading numbers. I for one
am deeply dubious of your benchmark claims, vague as they are and
unaccompanied by a detailed description of your testing rigor, much less
actual code.

Please provide the requisite fill for the gaps in our knowledge.

--
Lew
From: Patricia Shanahan on
Daniel Pitts wrote:
> On 7/14/2010 2:11 PM, Sanny wrote:
>>
>>> Have you considered a lookup-table? It would have 65536 elements, but
>>> most of the accesses would probably be concentrated in a few areas, so
>>> it would be quite cache-friendly. You could map to e.g. 1 for "-" and 0
>>> for anything else, then do arithmetic on the lookup result.
>>
>> Problem Solved. I tried a lookup table and now all if conditions have
>> been removed.
>>
>> Now its just a piece of addition. Which quite fast 3-4 times faster
>> than the if conditions.
> I doubt it.
> A lookup table requires a dereference and a multiplication.
>
> I *really* think you need to make sure you're not complicating a simple
> piece of code for little actual gain. Measure twice, cut once.

Obviously, I thought the lookup table had a good enough chance of
improving performance to be worth measuring, so I am not particularly
surprised.

Modern processors have such good pipelining, superscalar dispatch, and
out-of-order execution capabilities that arithmetic operations are
surprisingly cheap. Two things really take time: cache misses and
branch mispredictions.

The branches on the charAt results were likely to be particularly
expensive because there is no detectable pattern to them, unless the
data contains a repeating pattern. The best the branch predictor can do
is probably to predict not '-', and take a pipeline flush for each '-'.
I am surprised there was much gain from the loop unrolling - the
conditional branch in a for-loop is something branch predictors are
designed to handle well. The many calls to charAt might interfere with
in-lining.

On the other hand, the array access is likely to hit in cache. Unicode
characters come in blocks that tend to be used together. The first 128
elements of the array are the only ones used for ASCII data.
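In Java the approach reads something like the sketch below. The class and method names, and the dash-counting payload, are my own illustrative guesses (the thread never shows the original code); the point is just that the per-character 'if' becomes a branch-free array load plus an add.

```java
// Hypothetical sketch of the lookup-table approach: a 65536-entry table
// mapping '-' to 1 and everything else to 0, so the data-dependent
// branch disappears from the inner loop.
public class DashCount {
    private static final int[] TABLE = new int[65536];
    static {
        TABLE['-'] = 1;
    }

    static int countDashes(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            count += TABLE[s.charAt(i)]; // array load + add, no branch
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDashes("a-b--c")); // prints 3
    }
}
```

For ASCII input only TABLE[0..127] is ever touched, which is why the accesses stay cache-friendly.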

I don't believe in microbenchmarking for this sort of thing, as distinct
from running the real workload for each implementation, because cache
and branch predictor behavior is affected by the rest of the job.

Patricia
From: Sanny on
> You have revealed nothing of your benchmark. The snippets you show in your
> response do not do what your original problem statement specified.  You don't
> show your measurement techniques.  You don't even explain your testing rigor,
> much less share the benchmark code.
>
> What exactly should I try?

Just try the above for loops and you will see how much time spent in
both cases.

You have to put extra efforts to create a timer.

I tested these an year back. and found if conditions are taking a lot
of time than other arithmetic problems.

Bye
Sanny


From: Lew on
On Jul 15, 8:44 am, Sanny <softtank...(a)hotmail.com> wrote:
> > You have revealed nothing of your benchmark. The snippets you show in your
> > response do not do what your original problem statement specified.  You don't
> > show your measurement techniques.  You don't even explain your testing rigor,
> > much less share the benchmark code.
>
> > What exactly should I try?
>
> Just try the above for loops and you will see how much time spent in
> both cases.
>
> You have to put extra efforts to create a timer.
>
> I tested these an year back. and found if conditions are taking a lot
> of time than other arithmetic problems.
>

You really have a gift for avoiding the question.

Why should I put in the extra "efforts" that you won't deliver? I ask
again, where are your timing loops? Where is your code? What is your
rigor?

I'm not going to waste my time performing a useless microbenchmark to
answer a question you refuse to answer, knowing that my way of doing
it will be more correct and produce different results from yours.

Of course the new example you post will run quicker without 'if'
evaluations, but that says less than nothing about the very different
scenario you first presented, where some form of conditional logic was
needed. Nor does it reveal the actual numbers from your test runs, or
how much the 13 ms you saved with all your complicated
micro-optimization counts against the 30 minutes your program waits
for input.

I suppose I can safely conclude that you will continue to refuse to
answer. I shall return the favor.

--
Lew

From: RedGrittyBrick on
On 15/07/2010 14:46, Patricia Shanahan wrote:
> it is helpful to make your code as ordinary as possible.

That's always been my rule of thumb. Assume compiler writers and chip
designers are smarter than I am and write the sort of code I would
expect them to expect.

--
RGB - slower than a bullet, less powerful than an express train ...