From: bingo on 15 Feb 2010 20:43

Dear all:

I'm not sure this is the right forum for this question, since it is more MPI-related, but I'm getting desperate. Any suggestion, or redirection to a more appropriate forum, would be greatly appreciated.

The problem: I have one master node (node = 0) used mainly for communication, and several slave processors that run calculations independently. After finishing a calculation, a slave processor sends the result (endInfo in the program) to the master node. Based on a global variable (injtProb), which is broadcast to all processors, the master node decides whether to send a new initial condition (config in the program) to start a new calculation. I have attached my main program, which is based on this idea. It appears to have a deadlock problem: the program halts without producing any further output, and I really can't figure out why. Any help would be greatly appreciated, and please let me know if I haven't explained the problem clearly.

Thanks,
Bin

=====================================
step = 0
do while (.true.)
   if (node .eq. 0) then
      do rank = 1, nprocs-1
         ! check whether we should start a new calculation on this node
         if (sum(injtProb(rank,:)) > 0.5) then
            ! choose initial condition
            call cf_choose_entry(rank)
            ! send initial condition
            call mpi_send(config, 4, mpidp, rank, confTag, mpi_comm_world, &
                          ierror)
            ! receive result
            call mpi_recv(endInfo, 7, mpidp, rank, endTag, mpi_comm_world, &
                          istatus, ierror)
            ! update injtProb based on the received result
            call cf_update_param()
         endif
      enddo
      step = step + 1
      ! exit if the calculation has converged
      if (cf_converg() < 1e-4) then
         exit
      endif
   else
      if (sum(injtProb(node,:)) > 0.5) then
         ! receive initial condition
         call mpi_recv(config, 4, mpidp, 0, confTag, mpi_comm_world, istatus, ierror)
         ! perform calculation
         call run_sim()
         ! send result
         call mpi_send(endInfo, 7, mpidp, 0, endTag, mpi_comm_world, ierror)
      endif
   endif
   call mpi_bcast(injtProb, numVC*numVC, mpidp, 0, mpi_comm_world, ierror)
enddo
From: Harald Anlauf on 16 Feb 2010 14:30

On Feb 16, 2:43 am, bingo <zhn...(a)gmail.com> wrote:
> if (node .eq. 0) then
[...]
> ! exit if the calculation has converged
> if (cf_converg() < 1e-4) then
>    exit
> endif
> else

Maybe I am missing something important, but the exit happens only on PE 0, not on the other PEs. So I suggest introducing a logical l_converg = (cf_converg() < 1e-4) on PE 0, broadcasting it to all PEs outside this IF construct, just before the final bcast shown above, and moving the exit to after that bcast, conditional on l_converg.

If this doesn't help, you could also try "classic debugging" by inserting print statements to see which PE is entering or returning from MPI communication, to get additional clues.
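Something along these lines, at the bottom of the outer loop (untested sketch only; l_converg would be a new logical variable, mpi_logical is the predefined MPI type for Fortran logicals, and the original exit inside the node == 0 branch would be removed):

   ! evaluate convergence on PE 0 only
   if (node == 0) then
      l_converg = (cf_converg() < 1e-4)
   endif
   call mpi_bcast(injtProb, numVC*numVC, mpidp, 0, mpi_comm_world, ierror)
   ! broadcast the convergence flag so that all PEs leave the loop together
   call mpi_bcast(l_converg, 1, mpi_logical, 0, mpi_comm_world, ierror)
   if (l_converg) exit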
From: bingo on 16 Feb 2010 15:50

Thank you very much for the reply. That was indeed one of the problems.

Now I have another question. With the loop shown below, I am actually imposing a dependency between the individual slave processors: rank=2 has to wait for the master processor to receive the result from rank=1. Even if the calculation on rank=2 finishes earlier, it won't receive further data until the calculation on rank=1 is finished. Even worse, the whole process won't proceed any further if the calculation on one processor gets stuck. I think this is a problem with my algorithm. Ideally, the slave processors should run independently and communicate only with the master node. Is there a way to improve on this?

Thanks again.
Bin

=====================================
do rank = 1, nprocs-1
   ! send initial condition
   call mpi_send(config, 4, mpidp, rank, confTag, mpi_comm_world, ierror)
   ! receive result
   call mpi_recv(endInfo, 7, mpidp, rank, endTag, mpi_comm_world, istatus, ierror)
enddo

On Feb 16, 11:30 am, Harald Anlauf <anlauf.2...(a)arcor.de> wrote:
> On Feb 16, 2:43 am, bingo <zhn...(a)gmail.com> wrote:
>
> > if (node .eq. 0) then
> [...]
> > ! exit if the calculation has converged
> > if (cf_converg() < 1e-4) then
> >    exit
> > endif
> > else
>
> Maybe I am missing something important, but the exit happens only
> on PE 0, not on the other PEs. So I suggest introducing a logical
> l_converg = (cf_converg() < 1e-4) on PE 0, broadcasting it to all PEs
> outside this IF construct, just before the final bcast shown above,
> and moving the exit to after that bcast, conditional on l_converg.
>
> If this doesn't help, you could also try "classic debugging" by
> inserting print statements to see which PE is entering or returning
> from MPI communication, to get additional clues.
From: Harald Anlauf on 16 Feb 2010 16:23

On Feb 16, 9:50 pm, bingo <zhn...(a)gmail.com> wrote:
> Now I have another question. With the loop shown below, I am actually
> imposing a dependency between the individual slave processors: rank=2
> has to wait for the master processor to receive the result from rank=1.
> Even if the calculation on rank=2 finishes earlier, it won't receive
> further data until the calculation on rank=1 is finished. Even worse,
> the whole process won't proceed any further if the calculation on one
> processor gets stuck. I think this is a problem with my algorithm.
> Ideally, the slave processors should run independently and communicate
> only with the master node. Is there a way to improve on this?
>
> Thanks again.
> Bin
>
> =====================================
> do rank = 1, nprocs-1
>    ! send initial condition
>    call mpi_send(config, 4, mpidp, rank, confTag, mpi_comm_world, ierror)
>    ! receive result
>    call mpi_recv(endInfo, 7, mpidp, rank, endTag, mpi_comm_world, istatus, ierror)
> enddo

Well, this is not really a Fortran-related question, but anyway, you have several options. First split the loop into one loop that distributes the initial data and a second loop that collects the results. In these loops:

- Either use mpi_scatter/mpi_gather to distribute the data to the individual PEs, or to receive the results. (Recommended)

- Or use explicit coding instead of the collective MPI operations, e.g.:

  do rank = 0, nprocs-1
     if (node == 0) then
        if (rank /= node) then
           call mpi_send (..., rank, ...)
        end if
     else ! node /= 0
        if (rank == node) then
           call mpi_recv (..., 0, ...)
        end if
     end if
  end do

  to distribute the data to the PEs, with corresponding code to gather the results.

(Warning: the above code has not been tested, but I think you get the idea. I use similar code on machines where mpi_scatter has poor performance, or when I need to debug.)
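For the collective variant, a rough and likewise untested sketch could look like the following; allConfig(4,nprocs) and allEnd(7,nprocs) are hypothetical buffers on PE 0 holding one initial condition and one result per PE (note that mpi_scatter/mpi_gather always include the root's own slot, so PE 0 also gets and contributes a chunk):

   ! PE 0 fills allConfig with one initial condition per PE beforehand
   call mpi_scatter(allConfig, 4, mpidp, config, 4, mpidp, 0, mpi_comm_world, ierror)
   if (node /= 0) then
      ! each slave works on its own config
      call run_sim()
   endif
   ! collect one endInfo result per PE into allEnd on PE 0
   call mpi_gather(endInfo, 7, mpidp, allEnd, 7, mpidp, 0, mpi_comm_world, ierror)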
From: rudra on 17 Feb 2010 06:31
I'm not really an MPI master, but you can try: http://groups.google.com/group/comp.parallel.mpi/topics