From: robin on 11 Sep 2009 21:30

"Tim Prince" <tprince(a)nospamcomputer.org> wrote in message news:7gsp7pF2qhnrlU1(a)mid.individual.net...
| Dan Nagle wrote:
| > Hello,
| >
| > On 2009-09-10 06:51:39 -0400, "robin" <robin_v(a)bigpond.com> said:
| >
| >> There's no requirement to do any conversion at compile time.
| >> x/2 can be done at execution time on certain computers
| >> not by division but by scaling (and at considerable saving in time).
| >
| > Masking out the exponent, shifting, subtracting one,
| > and merging the new exponent back into the original number
| > may well take longer than one multiplication
| > on modern hardware.
| Not to mention checking for over/underflow.

A test for underflow has to be performed regardless of whether actual division or simple halving is performed. That observation therefore is irrelevant.

| The compiler I learned on
| generated code for /2. which jumped over the subtraction from the
| exponent in the case of a 0. operand but didn't take care of all corner
| cases.

What machine was that?
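[A minimal sketch, not from any of the posters, of what the exponent-scaling being debated amounts to for an IEEE-754 double: halve a value by subtracting one from its biased exponent field. The program and names are mine, and it deliberately ignores the corner cases (zero, subnormals, infinities, NaN) that the posters point out a real implementation would have to test for.]

    program halve_by_exponent
      use, intrinsic :: iso_fortran_env, only: real64, int64
      implicit none
      real(real64)   :: x
      integer(int64) :: bits

      x = 6.5_real64
      bits = transfer(x, bits)          ! reinterpret the double as 64 raw bits
      bits = bits - ishft(1_int64, 52)  ! subtract 1 from the biased exponent field
      x = transfer(bits, x)             ! reinterpret back as a double
      print *, x                        ! prints 3.25, i.e. 6.5/2, with no divide issued
    end program halve_by_exponent

[For a positive normal x this is exact halving; it is only the missing corner-case tests that make it debatable.]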
From: robin on 11 Sep 2009 21:39

"glen herrmannsfeldt" <gah(a)ugcs.caltech.edu> wrote in message news:h8bcte$ld4$1(a)naig.caltech.edu...
| Dan Nagle <dannagle(a)verizon.net> wrote:
| (snip, robin wrote)
|
| <> There's no requirement to do any conversion at compile time.
| <> x/2 can be done at execution time on certain computers
| <> not by division but by scaling (and at considerable saving in time).
|
| < Masking out the exponent, shifting, subtracting one,
| < and merging the new exponent back into the original number
| < may well take longer than one multiplication
| < on modern hardware.
|
| S/360 and successors have a floating point HALVE instruction
| to divide by two. While there are no guarantees on the time,
| it should be faster than multiply or divide.

Those HALVE instructions were available on the IBM System/360, the RCA Spectra, Fujitsu machines, and any other 360 look-alikes. And it was about 20 times faster than division.

| <> 2.0 can be treated as 2 for operations like x*2 and x/2,
| <> and those operations (* or div) are done at run time of course
| <> (the * being performed as x+x, again with considerable increase
| <> in speed).
|
| < On modern hardware, multiply is often (at least almost)
| < as fast as addition.
|
| Especially on values with few one bits. Many compilers are
| good at finding the cases where constant division can be replaced
| by constant multiplication.

It's trivial to do that. The real question is whether it is as accurate.

| I don't believe that an integer 2 would
| do better than a real 2.0, though.

In most cases, performing x+x is quicker than x*2 (or x*2.0) because a constant does not have to be dragged from storage. Adding a register to itself does not require a memory reference.
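[On the accuracy question raised here: for division by 2 specifically, every rewrite under discussion is exact, because 0.5 and 2.0 are exactly representable in binary floating point, so the choice among them is purely a code-generation question. A small sketch of mine, assuming IEEE arithmetic and no overflow:]

    program forms_of_halving
      use, intrinsic :: iso_fortran_env, only: real64
      implicit none
      real(real64) :: x
      x = 1.2345678901234567_real64
      ! 0.5 and 2.0 are exactly representable, so these pairs are bit-identical;
      ! only their speed can differ, and that is the compiler's business.
      print *, (x / 2.0_real64) == (x * 0.5_real64)   ! T
      print *, (x + x) == (2.0_real64 * x)            ! T
    end program forms_of_halving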
From: Dan Nagle on 12 Sep 2009 06:17

Hello,

On 2009-09-11 21:27:10 -0400, "robin" <robin_v(a)bigpond.com> said:

> Multiplication? Who said anything about multiplication?
> The operation is DIVISION here.

Division by a constant, on modern compilers, is almost always replaced by multiplication by the reciprocal.

I thought you knew that.

> It may interest you to know that those operations take place
> for any arithmetic operation, whether it be +, -, * and divide.
> With halving, division requires no actual division so that is
> saved. That is why the operation is about 10 times faster than
> actual division.

Agreed, on older hardware. On modern pipelined chips, the repeated operations of masking, anding, and so on, all require awaiting the results of a previous operation in order to be inserted into a pipeline.

That takes a long time.

> And IF the operation were multiplication by 2, why that can
> be achieved by simple addition. Again, faster than multiplication.

Not on most modern chips.

> | > 2.0 can be treated as 2 for operations like x*2 and x/2,
> | > and those operations (* or div) are done at run time of course
> | > (the * being performed as x+x, again with considerable increase
> | > in speed).
> |
> | On modern hardware, multiply is often (at least almost)
> | as fast as addition.
>
> Often not.

Define the proportion you mean by "often".

--
Cheers! Dan Nagle
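[A caveat on the reciprocal replacement, as I understand it: the substitution only preserves the result when the reciprocal is exactly representable. For the divisor 10 from the original question it is not, which is why compilers normally reserve the rewrite for powers of two unless a relaxed floating-point option is requested. A sketch of mine showing the difference:]

    program reciprocal_caveat
      use, intrinsic :: iso_fortran_env, only: real64
      implicit none
      real(real64) :: x
      x = 3.0_real64
      ! 0.1 is not exactly representable in binary, so dividing by 10 and
      ! multiplying by the rounded reciprocal can differ in the last bit.
      print *, x / 10.0_real64
      print *, x * (1.0_real64 / 10.0_real64)
      print *, (x / 10.0_real64) == (x * (1.0_real64 / 10.0_real64))   ! F for this x
    end program reciprocal_caveat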
From: nmm1 on 12 Sep 2009 06:45

In article <h8fsfb$ier$1(a)news.eternal-september.org>, Dan Nagle <dannagle(a)verizon.net> wrote:

>On 2009-09-11 21:27:10 -0400, "robin" <robin_v(a)bigpond.com> said:
>
>> Multiplication? Who said anything about multiplication?
>> The operation is DIVISION here.
>
>Division by a constant, on modern compilers, is almost always
>replaced by multiplication by the reciprocal.
>
>I thought you knew that.

Well, only if you enable serious optimisation :-) If you don't, why are you worrying about performance?

>> It may interest you to know that those operations take place
>> for any arithmetic operation, whether it be +, -, * and divide.
>> With halving, division requires no actual division so that is
>> saved. That is why the operation is about 10 times faster than
>> actual division.
>
>Agreed, on older hardware. On modern pipelined chips,
>the repeated operations of masking, anding, and so on,
>all require awaiting the results of a previous operation
>in order to be inserted into a pipeline.
>
>That takes a long time.

Actually, not even on older hardware. Flipping between the integer and floating-point pipelines is slow, but often used to be done by storing to memory and reloading. That is one of the reasons that I think that the separation of the pipelines is a mistake - but see comp.arch for where I proposed a heretical solution :-)

>> And IF the operation were multiplication by 2, why that can
>> be achieved by simple addition. Again, faster than multiplication.
>
>Not on most modern chips.

It can be slower, too. Many modern chips have separate addition and multiplication floating-point units, or a fused multiply-add, or both, and the time for N consecutive additions, N consecutive multiplications or N consecutive alternations of multiplication and addition is about the same. Changing additions into multiplications is nearly as useful an optimisation as the converse nowadays, though not (in my view) for sane reasons.

In RARE cases, division can be faster than either multiplication or addition, for this reason, but it's rare enough to be ignorable.

Regards,
Nick Maclaren.
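[To put rough numbers on the speed claims, here is a measurement sketch of mine; the program and variable names are assumptions, not anyone's posted code. On current hardware all three array sweeps are likely limited by memory bandwidth rather than by the arithmetic itself, which is essentially the point being made about modern chips.]

    program halving_timing
      use, intrinsic :: iso_fortran_env, only: real64, int64
      implicit none
      integer, parameter :: n = 50000000
      real(real64), allocatable :: a(:), b(:)
      integer(int64) :: t0, t1, rate

      allocate(a(n), b(n))
      call random_number(a)
      call system_clock(count_rate=rate)

      call system_clock(t0)
      b = a / 2.0_real64              ! division by the constant 2
      call system_clock(t1)
      print *, 'a/2.0 :', real(t1 - t0) / real(rate), ' s, sum=', sum(b)

      call system_clock(t0)
      b = a * 0.5_real64              ! multiplication by the exact reciprocal
      call system_clock(t1)
      print *, 'a*0.5 :', real(t1 - t0) / real(rate), ' s, sum=', sum(b)

      call system_clock(t0)
      b = a + a                       ! the addition form advocated above
      call system_clock(t1)
      print *, 'a+a   :', real(t1 - t0) / real(rate), ' s, sum=', sum(b)
    end program halving_timing

[The sum(b) prints are there only to keep an optimising compiler from discarding the loops.]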
From: Eli Osherovich on 12 Sep 2009 07:20
On Sep 10, 4:36 am, "sumesh.pt" <sumesh...(a)gmail.com> wrote:
> In mixed mode operations, is it advisable to convert integers to
> double using the command dble() in the expressions or would it be
> better for the compiler to do this job? I am asking this NOT from the
> point of view of understanding the program easily, but in terms of
> speed and accuracy. ie
> real*8 :: x
> integer :: i=10
> x = x/i or x=x/dble(i) which is faster and accurate..
>
> I am doing some kind of recursion calculations with x and hence error
> gets accumulated and I am not able to decide which is best.
>
> Thanks,

Enough has been said about the conversion properties. Maybe it is time to revise your algorithm. Without knowing the algorithm I cannot say much. However, if you have an error that grows with the recursion iterations, a simple change, such as running the recursion in the backward order, can change the error behavior.

P.S. For the code snippet you gave, I do not expect the error to grow, since the error is divided by 10 at every iteration.
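[To close the loop on the original question: in Fortran mixed-mode arithmetic the integer operand is converted to the kind of the real operand before the division, so x/i and x/dble(i) denote the same operation and give bit-identical results; the dble() only helps the human reader. A minimal check of mine, reusing the OP's variable names and the nonstandard but common real*8 declaration:]

    program mixed_mode_check
      implicit none
      real*8  :: x = 1.0d0, y, z
      integer :: i = 10, k

      y = x
      z = x
      do k = 1, 20
        y = y / i          ! the compiler converts i to double precision here
        z = z / dble(i)    ! explicit conversion: the same generated divide
      end do
      print *, y == z      ! T: no difference in accuracy (or in speed)
    end program mixed_mode_check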