From: Jacek Dziedzic on 3 Aug 2008 14:19

Paul Pluzhnikov wrote:
> Jacek Dziedzic <jacek.dziedzic__no--spam__(a)gmail.com> writes:
>
>> Pretty weird, this fortran, huh? :)
>
> There is absolutely nothing special about FORTRAN; it's just
> another user-level program, and it obeys the rules of the game
> just like any other user-level program, whether written in C, C++,
> hand-coded assembly, or java.

The special thing about it is that the Fortran RTL (run-time library) tries to catch the signals that kill the program, while at least the C and C++ runtimes do not do this (or I've never seen them do it). Or so I believe.

cheers,
- J.
From: Jacek Dziedzic on 3 Aug 2008 14:30

fjblurt(a)yahoo.com wrote:
> I am curious about this, but I don't know any Fortran. Can you
> reproduce the problem with a minimal Fortran program and post it here,
> along with instructions on compiling, and the compiler/runtime
> versions you're using?

OK, I've written the shortest Fortran program that segfaults. Compiled with g77, it behaves as one would expect: it dies with a "Segmentation fault", which can be trapped in gdb.

When I compile it with ifort (Intel's Fortran compiler), it crashes with a SIGSEGV, but this is intercepted by the RTL. However, gdb still successfully traps the signal, so it can be debugged.

It must be what Paul Pluzhnikov suggested: when I run it in an MPI environment, there are child processes involved, and maybe that's why the debugger cannot catch the signal.

A transcript of a session (txt) is here: http://tiny.pl/2pjn

thanks,
- J.
From: Jacek Dziedzic on 3 Aug 2008 14:31

Ron Ford wrote:
> Is the fortran source something you can post?

OK, I've written the shortest Fortran program that segfaults. Compiled with g77, it behaves as one would expect: it dies with a "Segmentation fault", which can be trapped in gdb.

When I compile it with ifort (Intel's Fortran compiler), it crashes with a SIGSEGV, but this is intercepted by the RTL. However, gdb still successfully traps the signal, so it can be debugged.

It must be what Paul Pluzhnikov suggested: when I run it in an MPI environment, there are child processes involved, and maybe that's why the debugger cannot catch the signal.

A transcript of a session (txt) is here: http://tiny.pl/2pjn

thanks,
- J.
From: Jacek Dziedzic on 3 Aug 2008 14:31

OK, I've written the shortest Fortran program that segfaults. Compiled with g77, it behaves as one would expect: it dies with a "Segmentation fault", which can be trapped in gdb.

When I compile it with ifort (Intel's Fortran compiler), it crashes with a SIGSEGV, but this is intercepted by the RTL. However, gdb still successfully traps the signal, so it can be debugged.

It must be what you suggested: when I run it in an MPI environment, there are child processes involved, and maybe that's why the debugger cannot catch the signal.

A transcript of a session (txt) is here: http://tiny.pl/2pjn

thanks,
- J.
From: Paul Pluzhnikov on 3 Aug 2008 16:52

Jacek Dziedzic <jacek.dziedzic__no--spam__(a)gmail.com> writes:

> Must be what Paul Pluzhnikov suggested -- when I run it in
> an MPI environment, there are child processes involved, maybe
> that's why then the debugger cannot catch the signal.

It's not a "may be"; it is. The debugger can't catch a signal in a process that it is not debugging (one of the MPI "slave" processes).

What you need to do is arrange to attach to the crashing process before the crash. Since you know that it is "rank 0 in job 1", you can probably arrange for the particular process that is executing that piece of work to sleep(600), and then attach gdb to it while it sleeps. (I do not know enough about MPI to tell you exactly how to achieve that.)

Also note that the TotalView debugger has specific hooks for MPI, and (AFAIU) automatically attaches to all the "slave" jobs "out of the box". Perhaps TotalView is a better tool for your particular problem.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
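The "sleep, then attach" suggestion above could be sketched in C roughly as follows. This is only an illustration, not anything posted in the thread: the function names, the `keep_waiting` flag, and the idea of selecting the rank through an environment variable (called `DEBUG_WAIT_RANK` in the usage note) are all hypothetical. The sketch deliberately leaves MPI out, so the rank is passed in as a plain integer; in a real MPI program it would come from `MPI_Comm_rank`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: decide whether this process should pause.
 * rank   - this process's rank (assumed to be obtained elsewhere)
 * wanted - text naming the rank to pause, e.g. "0", or NULL for none */
int should_wait_for_debugger(int rank, const char *wanted)
{
    char *end;
    long target;

    if (wanted == NULL || *wanted == '\0')
        return 0;                   /* nothing requested: never pause */
    target = strtol(wanted, &end, 10);
    if (*end != '\0')
        return 0;                   /* not a plain integer: ignore it */
    return target == (long)rank;
}

/* Hypothetical helper: announce the PID, then sleep in one-second steps
 * until a debugger clears keep_waiting or max_seconds have elapsed. */
void wait_for_debugger(unsigned max_seconds)
{
    volatile int keep_waiting = 1;  /* volatile so gdb can change it */
    unsigned waited = 0;

    assert(max_seconds > 0);
    fprintf(stderr, "PID %ld is waiting; attach with: gdb -p %ld\n",
            (long)getpid(), (long)getpid());
    fflush(stderr);

    while (keep_waiting && waited < max_seconds) {
        sleep(1);
        waited++;
    }
}
```

A caller might write `if (should_wait_for_debugger(rank, getenv("DEBUG_WAIT_RANK"))) wait_for_debugger(600);` early in the program. Once attached with `gdb -p <PID>`, one would move up to the `wait_for_debugger` frame, issue `set var keep_waiting = 0`, and then `continue`, so that gdb is already in control of the process when the SIGSEGV arrives.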