From: Tom on 3 Dec 2009 17:26

Hi,

I have a code that is supposed to work as a parallel algorithm as well as
a single-CPU job. One routine collects data on a grid, stored in arrays of
the form a(-1:nx,-1:ny,-1:nz,nb) or v(4,-1:nx,-1:ny,-1:nz,nb), into one big
array of the form atot(ndim,0:nxtot-1,0:nytot-1,0:nztot-1,nbtot) in a
different routine on the root process for write-out. This seems to work
fine in the one-CPU version, where a and atot (or v and vtot) have the same
number of elements (depending on the value of ndim passed to the
subroutine), but in parallel runs the program crashes in a totally erratic
way, and I suspect that some memory corruption is going on. The only thing
left I can think of is that I pass arrays in a bad way, even though the
compiler doesn't complain about array mismatches. Here's what I do:

In the calling routine:

  real :: t(-1:nx,-1:ny,-1:nz,nb), v(4,-1:nx,-1:ny,-1:nz,nb)

  call f_bindump(t,nx,ny,nz,nb,1,-1)    ! ndim=1, nsel=-1
  call f_bindump(v,nx,ny,nz,nb,4,-1)    ! ndim=4, nsel=-1
  call f_bindump(v2,nx,ny,nz,nb,9,1)    ! ndim=9, nsel=1

In routine f_bindump (I know that ndim, nx, ny, nxtot, etc. are all correct):

  real, intent(in)  :: a(ndim,-1:nx,-1:ny,-1:nz,nb)
  real, allocatable :: atot(:,:,:,:,:)

  allocate(atot(ndim,0:nxtot-1,0:nytot-1,0:nztot-1,nbtot))

  if (nsel == -1) then
     ! pass the part of the array without the boundaries in the 2nd-4th dims
     npn = ndim*nx*ny*nz*nb
     call ggather(a(1:ndim,0:nx-1,0:ny-1,0:nz-1,1:nb),atot,npn)
  else
     ! pass slice nsel of the array without the boundaries in the 2nd-4th dims
     npn = nx*ny*nz*nb
     call ggather(reshape(a(nsel,0:nx-1,0:ny-1,0:nz-1,1:nb),(/nx,ny,nz,nb/)),atot,npn)
  end if

In routine ggather of the parallel version:

  real, intent(in)  :: buf
  real, intent(out) :: buftot

  call MPI_GATHER(buf,4*n,MPI_BYTE,buftot,4*n,MPI_BYTE,0,MPI_COMM_WORLD,ierr)

In routine ggather of the single-CPU version (a dummy copying routine, which works ok):

  real, intent(in)  :: buf(n)
  real, intent(out) :: buftot(n)

  buftot = buf

Can anybody see something here that may give rise to memory corruption? I
have been running the program with all kinds of debugging switches and with
different ways of passing the array a in the call to ggather, all to no
avail. It crashes on writing files, but completely unpredictably: sometimes
even in a different routine, sometimes not at all. It has always worked so
far, though, if I comment out the call to the first calling routine, which
makes me believe that the root of all evil lies in this set of routines.

Thanks,
Tom
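A condensed, self-contained sketch of the parallel gather step in the shape
described above (not the original source): the byte counts assume 4-byte
default reals to match the 4*n/MPI_BYTE usage, and the dummies are written
here as explicit-shape and assumed-size arrays where the post declares
scalars.

  ! Sketch only: each rank contributes the n reals of its local block
  ! (boundary cells already stripped by the caller); the root receives
  ! nprocs*n reals in buftot.  The 4*n count with MPI_BYTE assumes a
  ! default real occupies 4 bytes.
  subroutine ggather(buf, buftot, n)
    implicit none
    include 'mpif.h'
    integer, intent(in)  :: n
    real,    intent(in)  :: buf(n)       ! local data, boundaries stripped
    real,    intent(out) :: buftot(*)    ! gathered data, used on root only
    integer :: ierr

    call MPI_GATHER(buf, 4*n, MPI_BYTE, buftot, 4*n, MPI_BYTE, &
                    0, MPI_COMM_WORLD, ierr)
  end subroutine ggather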
From: glen herrmannsfeldt on 3 Dec 2009 20:42

Tom <flurboglarf(a)mailinator.com> wrote:

> I have a code that is supposed to work as a parallel algorithm as well
> as a single-CPU job, and one routine is for collecting data on a grid
> stored in arrays of the form a(-1:nx,-1:ny,-1:nz,nb) or
> v(4,-1:nx,-1:ny,-1:nz,nb) into one big array of the form
> atot(ndim,0:nxtot-1,0:nytot-1,0:nztot-1,nbtot) in a different routine
> on the root process for write-out. This seems to work fine in the
> one-CPU version, where a and atot or v and vtot have the same number
> of elements (depending on the value of ndim passed to the subroutine),
> but in parallel runs, the program crashes in a totally erratic way,
> and I suspect that some memory corruption is going on. The only thing
> left I can think of is that I pass arrays in a bad way, even though
> the compiler doesn't complain about array mismatches.

Without reading all the details (I did read the previous post): the thing
you have to watch out for is aliasing and copy-in/copy-out, especially in
the case of multiple processors running on the same data. If on the
parallel runs two different processors write to the same data without the
appropriate interlocks, you will get the wrong answer. Sometimes the wrong
data will cause the program to crash; otherwise it will just give random
results.

-- glen
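To make the copy-in/copy-out mechanism concrete, here is a minimal, purely
hypothetical example (names not from the code above): passing a
non-contiguous array section to an external routine with only an implicit
interface typically causes the compiler to build a contiguous temporary,
pass that, and copy it back after the call.

  ! Hypothetical illustration of copy-in/copy-out.  a(2,:) is a strided
  ! (non-contiguous) section; with an implicit interface the compiler
  ! typically copies it into a contiguous temporary before the call and
  ! copies the temporary back into a(2,:) on return.
  program copy_demo
    implicit none
    real :: a(4,10)
    a = 0.0
    call fill(a(2,:), 10)     ! temporary created and copied back here
    print *, a(2,:)
  end program copy_demo

  subroutine fill(buf, n)
    implicit none
    integer, intent(in)    :: n
    real,    intent(inout) :: buf(n)
    integer :: i
    do i = 1, n
       buf(i) = real(i)
    end do
  end subroutine fill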
From: robin on 3 Dec 2009 21:40

"Tom" <flurboglarf(a)mailinator.com> wrote in message
news:b7e0f10a-a779-425c-a5fd-c3f7af56310b(a)a21g2000yqc.googlegroups.com...

| Hi,
| I have a code that is supposed to work as a parallel algorithm as well
| as a single-CPU job, and one routine is for collecting data on a grid
| stored in arrays of the form a(-1:nx,-1:ny,-1:nz,nb) or
| v(4,-1:nx,-1:ny,-1:nz,nb) into one big array of the form
| atot(ndim,0:nxtot-1,0:nytot-1,0:nztot-1,nbtot) in a different routine
| on the root process for write-out. This seems to work fine in the
| one-CPU version, where a and atot or v and vtot have the same number
| of elements (depending on the value of ndim passed to the subroutine),
| but in parallel runs, the program crashes in a totally erratic way,
| and I suspect that some memory corruption is going on. The only thing
| left I can think of is that I pass arrays in a bad way, even though
| the compiler doesn't complain about array mismatches.

You need explicit interfaces for each of the subroutines.
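One conventional way to get explicit interfaces is to put the routines in a
module; a minimal sketch (hypothetical module name, showing only the
single-CPU copy version of ggather) is:

  ! Sketch: procedures contained in a module automatically have explicit
  ! interfaces, so every caller that USEs the module gets its argument
  ! lists checked against the declared dummies.
  module gather_mod                    ! hypothetical name
    implicit none
  contains
    subroutine ggather(buf, buftot, n)
      integer, intent(in)  :: n
      real,    intent(in)  :: buf(n)
      real,    intent(out) :: buftot(n)
      buftot = buf                     ! single-CPU dummy copy version
    end subroutine ggather
  end module gather_mod

Each caller would then contain a "use gather_mod" statement, and the
compiler checks the type, kind, and number of arguments at every call site.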
From: Tom on 3 Dec 2009 22:22

On Dec 3, 8:42 pm, glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote:

> The thing you have to watch out for is aliasing and copy-in/copy-out,
> especially in the case of multiple processors running on the same data.
> If on the parallel runs two different processors write to the same
> data without the appropriate interlocks, you will get the wrong answer.

Ok, but I don't see where that would happen here. The a arrays on the
individual nodes don't overlap, and I would expect that MPI_GATHER takes
care of the data not being written to the same address. The size of atot
is exactly an integer multiple of the size of the transferred a. Isn't it
the purpose of MPI_GATHER to avoid precisely the trap of writing to the
same data?

Tom
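For reference, a minimal self-contained MPI_GATHER example (hypothetical,
and using MPI_REAL with a count of n rather than the MPI_BYTE/4*n counting
of the original) in which each rank's block lands in its own disjoint slot
of the root's receive buffer:

  ! Hypothetical sketch of MPI_GATHER semantics: rank i sends n reals,
  ! which are placed in recvbuf(i*n+1 : (i+1)*n) on the root, so the
  ! contributions from different ranks never overlap.
  program gather_demo
    implicit none
    include 'mpif.h'
    integer, parameter :: n = 4
    integer :: rank, nprocs, ierr
    real :: sendbuf(n)
    real, allocatable :: recvbuf(:)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

    sendbuf = real(rank)                   ! each rank sends its own id
    allocate(recvbuf(n*nprocs))            ! only read on the root

    call MPI_GATHER(sendbuf, n, MPI_REAL, recvbuf, n, MPI_REAL, &
                    0, MPI_COMM_WORLD, ierr)

    if (rank == 0) print *, recvbuf        ! n zeros, then n ones, then n twos, ...
    call MPI_FINALIZE(ierr)
  end program gather_demo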
From: Tom on 3 Dec 2009 22:23
On Dec 3, 9:40 pm, "robin" <robi...(a)bigpond.com> wrote:

> You need explicit interfaces for each of the subroutines.

Why? I don't think so: I don't have optional arguments or anything like
that there, and, as I said, the subroutine works well in single-CPU mode.

Thomas
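For context, the dummy-argument features that require an explicit interface
at the call site go beyond optional arguments; a short hypothetical sketch:

  ! Hypothetical example: any one of these dummy declarations forces the
  ! caller to have an explicit interface for the subroutine.
  subroutine needs_explicit_interface(a, b, c, d)
    implicit none
    real, intent(inout)              :: a(:)   ! assumed-shape array
    real, intent(inout), allocatable :: b(:)   ! allocatable dummy
    real, pointer                    :: c(:)   ! pointer dummy
    real, intent(in),    optional    :: d      ! optional argument
  end subroutine needs_explicit_interface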