From: Norbert_Paul on
Peter Keller wrote:
>> The error occurs extremely seldom. I had to run it several times
>> to get such output.
>
> Hrm. Maybe you should print out the lists too, and once you get the actual
> numerical inputs which produced the problem, you can test it out by hand
> to see what happens.
>
> -pete

Here is a session. I changed the vector coordinates to multiples of 1d0
so that I could check the results in my head.
I also tried integers and single floats: no problems with integers, but
the same issue with single floats.
The wrong result is always too small. It looks as if the value was returned
before the computation had finished. Are there hidden running conditions?


CL-USER> (dotimes (i 100000)
           (let* ((v1 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (v2 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (dot1 (reduce #'+ (mapcar #'* v1 v2)))
                  (dot2 (reduce #'+ (mapcar #'* v1 v2))))
             (when (/= dot1 dot2)
               (print `(,dot1 ,dot2)))))

(15.0d0 50.0d0)
NIL
CL-USER> (dotimes (i 100000)
           (let* ((v1 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (v2 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (dot1 (reduce #'+ (mapcar #'* v1 v2)))
                  (dot2 (reduce #'+ (mapcar #'* v1 v2))))
             (when (/= dot1 dot2)
               (print `(v1 ,v1 v2 ,v2 dot1 ,dot1 dot2 ,dot2)))))
NIL
CL-USER> (dotimes (i 100000)
           (let* ((v1 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (v2 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (dot1 (reduce #'+ (mapcar #'* v1 v2)))
                  (dot2 (reduce #'+ (mapcar #'* v1 v2))))
             (when (/= dot1 dot2)
               (print `(v1 ,v1 v2 ,v2 dot1 ,dot1 dot2 ,dot2)))))

(V1 (6.0d0 3.0d0) V2 (6.0d0 6.0d0) DOT1 36.0d0 DOT2 54.0d0)
(V1 (0.0d0 2.0d0) V2 (5.0d0 3.0d0) DOT1 6.0d0 DOT2 0.0d0)
NIL
CL-USER> (dotimes (i 100000)
           (let* ((v1 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (v2 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (dot1 (reduce #'+ (mapcar #'* v1 v2)))
                  (dot2 (reduce #'+ (mapcar #'* v1 v2))))
             (when (/= dot1 dot2)
               (print `(v1 ,v1 v2 ,v2 dot1 ,dot1 dot2 ,dot2)))))
NIL
CL-USER> (dotimes (i 100000)
           (let* ((v1 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (v2 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                  (dot1 (reduce #'+ (mapcar #'* v1 v2)))
                  (dot2 (reduce #'+ (mapcar #'* v1 v2))))
             (when (/= dot1 dot2)
               (print `(v1 ,v1 v2 ,v2 dot1 ,dot1 dot2 ,dot2)))))

(V1 (1.0d0 1.0d0) V2 (4.0d0 6.0d0) DOT1 10.0d0 DOT2 0.0d0)
NIL
CL-USER>
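Following the suggestion to check the failing inputs by hand: because every coordinate is a small integer scaled by 1d0, the exact integer dot product tells you which of DOT1 and DOT2 is the bogus one. For the reports above, the dot product of (6 3) and (6 6) is 54 and that of (0 2) and (5 3) is 6, so the wrong value is DOT1 in the first case and DOT2 in the second, and in both it is the smaller one. The sketch below automates that comparison. DOT and FIND-BAD-DOTS are hypothetical helpers, not part of the original session; the code is portable Common Lisp, and it is untested on the affected CMUCL build, where calling through a function may not trigger the bug as readily as the inline forms do.

```lisp
;; Hypothetical helpers (not from the thread).  Inputs are small
;; integers scaled by 1d0, so the exact integer dot product is a
;; reliable reference for both double-float results.

(defun dot (v1 v2)
  "Dot product of two equal-length lists of numbers."
  (reduce #'+ (mapcar #'* v1 v2)))

(defun find-bad-dots (trials &key (range 10))
  "Run TRIALS random 2-D dot products.  Return a list of
(V1 V2 EXACT DOT1 DOT2) for every trial in which either double-float
result differs from the exact integer result."
  (let ((failures '()))
    (dotimes (i trials (nreverse failures))
      (declare (ignorable i))
      (let* ((iv1 (list (random range) (random range)))
             (iv2 (list (random range) (random range)))
             ;; The same coordinates as double-floats, as in the session.
             (v1 (mapcar (lambda (x) (* 1d0 x)) iv1))
             (v2 (mapcar (lambda (x) (* 1d0 x)) iv2))
             (exact (dot iv1 iv2))
             (dot1 (dot v1 v2))
             (dot2 (dot v1 v2)))
        ;; /= compares numerically, so 54.0d0 and 54 count as equal.
        (when (or (/= dot1 exact) (/= dot2 exact))
          (push (list v1 v2 exact dot1 dot2) failures))))))
```

On an implementation without the bug, (find-bad-dots 100000) should return NIL; any entry it returns shows the exact value next to both float results, so the wrong one is immediately visible.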
From: Norbert_Paul on
Norbert_Paul wrote:
> [...] Are there hidden running conditions?

I had a suspect: gc

Note that in the following session the output always has
i-gc-ed-u bound to T. So the error seems to occur after gc, and,
maybe, on bare cmucl this error is less probable (no slime in memory)
but still possible.
Or could one of the (before/after) gc-hooks of slime cause trouble?

CL-USER> (defun testfn ()
           (let* ((i-gc-ed-u nil)
                  (extensions:*gc-notify-after*
                    #'(lambda (a b c)
                        (declare (ignore a b c))
                        (setf i-gc-ed-u T))))
             (dotimes (i 100000)
               (setf i-gc-ed-u nil)
               (let* ((v1 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                      (v2 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
                      (dot1 (reduce #'+ (mapcar #'* v1 v2)))
                      (dot2 (reduce #'+ (mapcar #'* v1 v2))))
                 (when (/= dot1 dot2)
                   (print `(gc ,i-gc-ed-u v1 ,v1 v2 ,v2 dot1 ,dot1 dot2 ,dot2)))))
             (if i-gc-ed-u :gc-occured :no-gc-yet)))
TESTFN
CL-USER> (testfn)
:NO-GC-YET
CL-USER> (testfn)

(GC T V1 (1.0d0 3.0d0) V2 (3.0d0 2.0d0) DOT1 0.0d0 DOT2 9.0d0)
(GC T V1 (6.0d0 7.0d0) V2 (9.0d0 0.0d0) DOT1 0.0d0 DOT2 54.0d0)
(GC T V1 (0.0d0 5.0d0) V2 (7.0d0 5.0d0) DOT1 0.0d0 DOT2 25.0d0)
:NO-GC-YET
CL-USER> (testfn)
:NO-GC-YET
CL-USER> (testfn)

(GC T V1 (5.0d0 7.0d0) V2 (6.0d0 8.0d0) DOT1 86.0d0 DOT2 30.0d0)
(GC T V1 (8.0d0 5.0d0) V2 (2.0d0 5.0d0) DOT1 41.0d0 DOT2 25.0d0)
:NO-GC-YET
CL-USER> (testfn)
:NO-GC-YET
From: Helmut Eller on
* Norbert_Paul [2010-03-31 09:35+0200] writes:

> Norbert_Paul wrote:
>> [...] Are there hidden running conditions?
>
> I had a suspect: gc
>
> Note that in the following session the output always has
> i-gc-ed-u bound to T. So the error seems to occur after gc, and,
> maybe, on bare cmucl this error is less probable (no slime in memory)
> but still possible.
> Or could one of the (before/after) gc-hooks of slime cause trouble?

Not any more than other gc-hooks that use FP ops:

(defun testfn ()
  (let* ((i-gc-ed-u nil)
         (ext:*gc-notify-before* (lambda (a) (/ a 0.34d0))))
    (dotimes (i 100000)
      (setf i-gc-ed-u nil)
      (let* ((v1 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
             (v2 (list (* 1d0 (random 10)) (* 1d0 (random 10))))
             (dot1 (reduce #'+ (mapcar #'* v1 v2)))
             (dot2 (reduce #'+ (mapcar #'* v1 v2))))
        (when (/= dot1 dot2)
          (print `(gc ,i-gc-ed-u v1 ,v1 v2 ,v2 dot1 ,dot1 dot2 ,dot2)))))))

also prints mismatches in a tty session (no SLIME) with CMUCL Snapshot 2010-02.

Helmut
From: Raymond Toy on
On 3/31/10 3:34 AM, Norbert_Paul wrote:
> Norbert_Paul wrote:
>> [...] Are there hidden running conditions?
>
> I had a suspect: gc
>
> Note that in the following session the output always has
> i-gc-ed-u bound to T. So the error seems to occur after gc, and,
> maybe, on bare cmucl this error is less probable (no slime in memory)
> but still possible.
> Or could one of the (before/after) gc-hooks of slime cause trouble?

I can reproduce this. It's some issue with sse2 support. If you run it
using x87, there are no errors. It will take some time to figure this
out but having this simple test case will help a lot.


Ray
From: Raymond Toy on
On 3/31/10 9:02 AM, Raymond Toy wrote:
> On 3/31/10 3:34 AM, Norbert_Paul wrote:
>> Norbert_Paul wrote:
>>> [...] Are there hidden running conditions?
>>
>> I had a suspect: gc
>>
>> Note that in the following session the output always has
>> i-gc-ed-u bound to T. So the error seems to occur after gc, and,
>> maybe, on bare cmucl this error is less probable (no slime in memory)
>> but still possible.
>> Or could one of the (before/after) gc-hooks of slime cause trouble?
>
> I can reproduce this. It's some issue with sse2 support. If you run it
> using x87, there are no errors. It will take some time to figure this
> out but having this simple test case will help a lot.

I think I found the problem. CMUCL was saving the x87 state but wasn't
saving the sse2 state. I think this is fixed now. At least your test
function no longer causes bogus results.

The fix should be in the 2010-04 snapshot, which will be out
soon. Please try it when you get a chance.

Thanks,

Ray