From: Mirko on
Hello,

I am diving into parallel processing in that other ancient language
(fortran), and I was running one of intel's sample OpenMP problems
which calculates the number of primes, 4n+1 and 4n+3 primes below
10^7. So, I thought, why not transcribe it into lisp, and see how
times compare.

This post is about the following:
- I was not able to eliminate one optimization warning
- Can my declarations be improved
- Comparison of fortran and cl run times.

Because of licensing issues, I will not post intel's fortran code.
But I tried to copy the algorithm and the code as verbatim as
possible.

I am running this on a recent sbcl on a 64-bit redhat linux. The
transcribed code is here: http://paste.lisp.org/display/111258.

With maximum optimization, I am getting warnings in the following code
(elided here)
(declare (fixnum index))
(floor (sqrt (coerce index 'float)))

Error message:
forced to do full call
unable to do inline float truncate (cost 5) because:
The result is a (VALUES UNSIGNED-BYTE &OPTIONAL), not a (VALUES
(SIGNED-
BYTE 32)
&REST
T).
unable to do inline float truncate (cost 5) because:
The result is a (VALUES UNSIGNED-BYTE &OPTIONAL), not a (VALUES

(UNSIGNED-BYTE
32)
&REST
T).
forced to do full call
unable to do inline float coercion (cost 5) because:
The first argument is a UNSIGNED-BYTE, not a (SIGNED-BYTE 32).
unable to do inline float coercion (cost 6) because:
The first argument is a UNSIGNED-BYTE, not a (UNSIGNED-BYTE 32).
doing float to pointer coercion (cost 13)

Is there some way to eliminate that?


Second question, can the declarations be improved since all the
integers are positive.

And finally here are the results. First ifort:

Range to check for Primes: 1 10000000
We are using 1 thread(s)
Number of primes found: 664579
Number of 4n+1 primes found: 332181
Number of 4n-1 primes found: 332398
Run time: 6.390028

Now sbcl:
Range to check for Primes: 1, 10000000
Number of primes found: 664579
Number of 4n+1 primes found:332181
Number of 4n-1 primes found:332398
Evaluation took:
7.940 seconds of real time
7.938793 seconds of total run time (7.926795 user, 0.011998 system)
[ Run times consist of 0.022 seconds GC time, and 7.917 seconds non-
GC time. ]
99.99% CPU
23,760,484,821 processor cycles
80,382,896 bytes consed

So we have a 25% difference. Not bad.

Thanks,

Mirko
From: Raymond Toy on
On 6/8/10 10:44 AM, Mirko wrote:
> Hello,
>
> I am diving into parallel processing in that other ancient language
> (fortran), and I was running one of intel's sample OpenMP problems
> which calculates the number of primes, 4n+1 and 4n+3 primes below
> 10^7. So, I thought, why not transcribe it into lisp, and see how
> times compare.
>
> This post is about the following:
> - I was not able to eliminate one optimization warning
> - Can my declarations be improved
> - Comparison of fortran and cl run times.
>
> Because of licensing issues, I will not post intel's fortran code.
> But I tried to copy the algorithm and the code as verbatim as
> possible.
>
> I am running this on a recent sbcl on a 64-bit redhat linux. The
> transcribed code is here: http://paste.lisp.org/display/111258.
>
> With maximum optimization, I am getting warnings in the following code
> (elided here)
> (declare (fixnum index))
> (floor (sqrt (coerce index 'float)))

Since this is from fortran, presumably index is actually positive and is
probably a 32 bit or 64 bit integer. Declare it as such:

(declare (type (unsigned-byte 32) index))

Note that converting a 32-bit integer to a float loses bits, but if the
fortran code did that, then perhaps it's ok. With this declaration, the
compiler should know that floor can only return a 32-bit (or 64-bit)
integer, so I think the warnings should be gone.


>
> Second question, can the declarations be improved since all the
> integers are positive.

The declaration above is an example of declaring a non-negative (and
bounded) integer.

Ray
From: Mirko on
On Jun 8, 2:48 pm, Raymond Toy <toy.raym...(a)gmail.com> wrote:
> On 6/8/10 10:44 AM, Mirko wrote:
>
>
>
> > Hello,
>
> > I am diving into parallel processing in that other ancient language
> > (fortran), and I was running one of intel's sample OpenMP problems
> > which calculates the number of primes, 4n+1 and 4n+3 primes below
> > 10^7.  So, I thought, why not transcribe it into lisp, and see how
> > times compare.
>
> > This post is about the following:
> > - I was not able to eliminate one optimization warning
> > - Can my declarations be improved
> > - Comparison of fortran and cl run times.
>
> > Because of licensing issues, I will not post intel's fortran code.
> > But I tried to copy the algorithm and the code as verbatim as
> > possible.
>
> > I am running this on a recent sbcl on a 64-bit redhat linux.  The
> > transcribed code is here:http://paste.lisp.org/display/111258.
>
> > With maximum optimization, I am getting warnings in the following code
> > (elided here)
> >  (declare (fixnum index))
> >  (floor (sqrt (coerce index 'float)))
>
> Since this is from fortran, presumably index is actually positive and is
> probably a 32 bit or 64 bit integer.  Declare it as such:
>
>   (declare (type (unsigned-byte 32) index))
>
> Note that converting a 32-bit integer to a float loses bits, but if the
> fortran code did that, then perhaps it's ok.  With this declaration, the
> compiler should know that floor can only return a 32-bit (or 64-bit)
> integer, so I think the warnings should be gone.
>
>
>
> > Second question, can the declarations be improved since all the
> > integers are positive.
>
> The declaration above is an example of declaring a non-negative (and
> bounded) integer.
>
> Ray

Thanks Ray.

The change did not remove the above-mentioned compiler warnings, but
it did bring down the execution time: It went from 7.94 to 6.62
seconds, which is about 3.5% slower than the ifort. (Today my brain
is just not up to diving into hyperspec and divining the meaning of
the warning and how to remove it).

BTW, the above numbers for the fortran executable are with the -fast
compilation option for ifort.

Profiling of the fortran code (using vtune) shows that most of the
time is spent in the `mod' function: (if (zerop (mod index
factor)) ...) which is the sieve for the primes. I did not profile
the lisp code.

Mirko
From: Raymond Toy on
On 6/8/10 4:09 PM, Mirko wrote:
> On Jun 8, 2:48 pm, Raymond Toy <toy.raym...(a)gmail.com> wrote:
>> On 6/8/10 10:44 AM, Mirko wrote:
> Thanks Ray.
>
> The change did not remove the above-mentioned compiler warnings, but

Bummer. How about just doing

(declare (type (unsigned-byte 32) x))
...
(floor (sqrt x))

That produces no warnings for me with cmucl.

Ray

From: Mirko on
On Jun 8, 5:48 pm, Raymond Toy <toy.raym...(a)gmail.com> wrote:
> On 6/8/10 4:09 PM, Mirko wrote:
>
> > On Jun 8, 2:48 pm, Raymond Toy <toy.raym...(a)gmail.com> wrote:
> >> On 6/8/10 10:44 AM, Mirko wrote:
> > Thanks Ray.
>
> > The change did not remove the above-mentioned compiler warnings, but
>
> Bummer.  How about just doing
>
>   (declare (type (unsigned-byte 32) x))
>   ...
>   (floor (sqrt x))
>
> That produces no warnings for me with cmucl.
>
> Ray

I still got warning. But I also discovered that I could remove
`coerce' from (floor (sqrt (coerce index 'float))). Declaring index
as (unsigned-byte 32), (floor (sqrt index)) works fine.

The run-time is reduced by about 0.07 seconds.

But with that change I get another warning. The (floor (sqrt index))
is embedded in a do loop:

(do ((index start (+ index 2))
...

The compiler warns me: doing unsigned word to integer conversion to
INDEX. This is probably due to the `(+ index 2).

Mirko