fast and accurate in mixed mode operations [Fortran]

From: robin on 12 Sep 2009 21:52

"Tim Prince" <tprince(a)nospamcomputer.org> wrote in message news:4AABA85B.6000703(a)nospamcomputer.org...
| robin wrote:
| > "Tim Prince" <tprince(a)nospamcomputer.org> wrote in message news:7gsp7pF2qhnrlU1(a)mid.individual.net...
| > | Dan Nagle wrote:
| > | > Hello,
| > | >
| > | > On 2009-09-10 06:51:39 -0400, "robin" <robin_v(a)bigpond.com> said:
| > | >
| > | >> There's no requirement to do any conversion at compile time.
| > | >> x/2 can be done at execution time on certain computers
| > | >> not by division but by scaling (and at considerable saving in time).
| > | >
| > | > Masking out the exponent, shifting, subtracting one,
| > | > and merging the new exponent back into the original number
| > | > may well take longer than one multiplication
| > | > on modern hardware.
| >
| > | Not to mention checking for over/underflow.
| >
| > A test for underflow has to be performed regardless of whether
| > actual division or simple halving is performed. That observation
| > therefore is irrelevant.

| A divide operation takes care of possible underflow.

and overflow.
That's because a test is included.

| Subtracting from
| the exponent field without checking against limits may wrap the result
| around past HUGE.

A check is required, of course, as it is for any arithmetic operation.

| Since formats such as infinities and sub-normals were
| introduced (over 25 years ago), more cases of failure for such shortcuts
| have been present.

Where?

| > | The compiler I learned on
| > | generated code for /2. which jumped over the subtraction from the
| > | exponent in the case of a 0. operand but didn't take care of all corner
| > | cases.
| >
| > What machine was that?

| On the Honeywell 6000 series, x/2. would not fail, but x/4. would
| produce HUGE(x) if x==TINY(x). Of course, there were no TINY or HUGE;
| they gave up after a partial implementation of f77.

You seem to have contradicted yourself.
Previously you said that x/2 didn't take care of corner cases.
Now you say that it did not fail.

If it was a hardware instruction for /4, it was the /4 instruction that was flawed.
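
The wrap-around described above for x/4 at x == TINY(x) can be reproduced on current hardware. The following is only a minimal sketch, not the Honeywell code: it assumes real32 is stored as IEEE binary32 (8-bit biased exponent in bits 23 to 30), and the names exponent_shortcut and naive_quarter are invented for illustration. On an IEEE format the wrapped exponent field for TINY(x) decodes as infinity rather than HUGE(x).

```fortran
module exponent_shortcut
  use, intrinsic :: iso_fortran_env, only: int32, real32
  implicit none
contains
  ! Divide by 4 by subtracting 2 from the biased exponent field,
  ! deliberately WITHOUT a range check (hypothetical helper).
  ! Assumes IEEE binary32: exponent field occupies bits 23..30.
  elemental function naive_quarter(x) result(y)
    real(real32), intent(in) :: x
    real(real32) :: y
    integer(int32) :: bits, efield
    bits   = transfer(x, 0_int32)               ! reinterpret as integer
    efield = ibits(bits, 23, 8)                 ! extract biased exponent
    efield = iand(efield - 2_int32, 255_int32)  ! 1 - 2 wraps to 255
    call mvbits(efield, 0, 8, bits, 23)         ! merge exponent back
    y = transfer(bits, 0.0_real32)
  end function naive_quarter
end module exponent_shortcut

program wrap_demo
  use exponent_shortcut
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  real(real32) :: x
  x = tiny(x)                  ! smallest normal number, exponent field = 1
  print *, 'x / 4.0      =', x / 4.0_real32
  print *, 'scale(x, -2) =', scale(x, -2)
  print *, 'naive        =', naive_quarter(x)
end program wrap_demo
```

With gradual underflow enabled, x / 4.0 and scale(x, -2) both give the subnormal 2**(-128), while naive_quarter(x) gives +Infinity. For an operand well inside the normal range, such as 8.0, the shortcut agrees with ordinary division.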

From: glen herrmannsfeldt on 13 Sep 2009 00:16

robin <robin_v(a)bigpond.com> wrote:
< "Tim Prince" <tprince(a)nospamcomputer.org> wrote in message news:4AABA85B.6000703(a)nospamcomputer.org...
(snip)

< | Subtracting from the exponent field without checking against
< | limits may wrap the result around past HUGE.

< A check is required, of course, as is required for any arith operation.

Fortran doesn't require it for fixed point. Most implementations
that I know of don't test for it. I don't believe it is required
for floating point, though many do trap for that.

It is nice for the processor to set to zero on underflow, and a
large value on overflow, but they don't all do that, either.

-- glen

From: Dan Nagle on 13 Sep 2009 11:05

Hello,

On 2009-09-12 21:52:18 -0400, "robin" <robin_v(a)bigpond.com> said:

> The operations required for halving are no different from ordinary
> division, except that no division is required. Since there is no
> division, that time is saved.

As near as I can estimate, for published (not by Intel)
timings of a typical x86 chip, the time to multiply
by the precomputed reciprocal of a constant is 6 clocks,
using the SSE floating multiply.

The time to make the shift-and-add of the exponent
is about 13 clocks.

Must we now read where robin claims that x86s
are not "often" used, or that 13 clocks is faster
than 6? :-(

--
Cheers!

Dan Nagle
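
Whether multiplying by a precomputed reciprocal gives the same answer as dividing depends on the constant. The sketch below assumes IEEE binary64 for real64 and no value-changing optimization flags; the names reciprocal_check and count_mismatches are hypothetical. Because 0.5 is exactly representable, x*0.5 and x/2.0 round identically, whereas 1/3 is not exact, so x*(1.0/3.0) may differ from x/3.0 in the last bit.

```fortran
module reciprocal_check
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Count the elements of x for which dividing by d and multiplying by
  ! the rounded reciprocal 1/d give different results.
  function count_mismatches(x, d) result(n)
    real(real64), intent(in) :: x(:), d
    integer :: n
    real(real64) :: r
    r = 1.0_real64 / d            ! the "precomputed reciprocal"
    n = count(x / d /= x * r)
  end function count_mismatches
end module reciprocal_check

program reciprocal_demo
  use reciprocal_check
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  real(real64) :: x(100000)
  call random_number(x)
  print *, 'mismatches for divisor 2:', count_mismatches(x, 2.0_real64)
  print *, 'mismatches for divisor 3:', count_mismatches(x, 3.0_real64)
end program reciprocal_demo
```

The count for divisor 2 is zero under the stated assumptions. The count for divisor 3 depends on the random data and on the compiler options in use.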

From: robin on 12 Oct 2009 10:28

"Dan Nagle" <dannagle(a)verizon.net> wrote in message news:h8j1n0$rje$1(a)news.eternal-september.org...
| Hello,
|
| On 2009-09-12 21:52:18 -0400, "robin" <robin_v(a)bigpond.com> said:
|
| > The operations required for halving are no different from ordinary
| > division, except that no division is required. Since there is no
| > division, that time is saved.
|
| As near as I can estimate, for published (not by Intel)
| timings of a typical x86 chip, the time to multiply
| by the precomputed reciprocal of a constant is 6 clocks,
| using the SSE floating multiply.
|
| The time to make the shift-and-add of the exponent
| is about 13 clocks.

Not using a scale instruction? You are using explicit instructions?
What don't you understand about the fact that division
is omitted when halving is carried out by a hardware
instruction?
What don't you comprehend about normalising?

On those machines where manufacturer's timings are provided for
the halve instruction and for the division instruction, halving is about
5 times quicker than division.

From: Dan Nagle on 12 Oct 2009 11:23

Hello,

On 2009-10-12 10:28:05 -0400, "robin" <robin_v(a)bigpond.com> said:

> Not using a scale instruction? You are using explicit instructions?

The timings for an x86 FSCALE instruction are about 17 clocks.
And the scale factor must be a floating point number. :-(

> What don't you understand about the fact that division
> is omitted when halving is carried out by a hardware
> instruction?

When the divisor is a compile-time constant,
the division will be replaced by a multiplication
by the reciprocal. Why is robin fascinated by division
when the divisor is a compile-time constant?

> What don't you comprehend about normalising?

robin's failure to read the Intel optimization guides'
warnings about using FSCALE with denormalized operands,
and other end cases, for starters. End cases count.

> On those machines where manufacturer's timings are provided for
> the halve instruction and for the division instruction, halving is about
> 5 times quicker than division.

Such as the x86? Who cares about S/360 in the 21st century?

--
Cheers!

Dan Nagle
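
The end cases mentioned in this exchange can be checked from Fortran itself with the SCALE intrinsic, without reference to any particular machine instruction. This is a sketch under assumptions: real32 is IEEE binary32, gradual underflow is enabled (no flush-to-zero option), and the names halving_end_cases and halves_agree are made up for illustration. It says nothing about timing.

```fortran
module halving_end_cases
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
contains
  ! True when SCALE, division and multiplication all produce the same half.
  elemental function halves_agree(x) result(same)
    real(real32), intent(in) :: x
    logical :: same
    real(real32) :: s, d, m
    s = scale(x, -1)          ! exponent adjustment via the intrinsic
    d = x / 2.0_real32        ! ordinary division
    m = x * 0.5_real32        ! multiplication by the exact reciprocal
    same = (s == d) .and. (d == m)
  end function halves_agree
end module halving_end_cases

program end_case_demo
  use halving_end_cases
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  real(real32) :: t, cases(4)
  t = tiny(t)
  ! zero, smallest normal, smallest subnormal (2**(-149)), largest finite
  cases = [0.0_real32, t, t * epsilon(t), huge(t)]
  print *, 'all three methods agree:', halves_agree(cases)
  print *, 'halved values:          ', scale(cases, -1)
end program end_case_demo
```

Halving TINY(x) yields a subnormal, and halving the smallest subnormal rounds to zero; a correct implementation of all three methods has to agree on both.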
