From: robin on 25 May 2010 22:58

"Paul van Delst" <paul.vandelst(a)noaa.gov> wrote in message news:hthajv$oqj$1(a)news.boulder.noaa.gov...
| Hello,
|
| Can some kind, patient person explain exactly what the use of static memory entails? For
| example, if one uses the "-fstatic" option for a compiler.
|
| A colleague is having an issue with some code that runs fine on one machine (#1: redhat 4,
| 32 bit, g95 v4.0), but bombs on another (#2: redhat 3, 32 bit, g95 v4.1). And by bombs I
| mean it runs to completion, but the results are full of NaN's.
|
| One of the debug suggestions I made was to *remove* the "-fstatic" switch from the
| compile. When she did that the runs bombed on both machines. She asked me why that would
| happen and I have to admit I was at a bit of a loss to explain it. My own experience is
| built around usage, not theory.

If a program delivers results erratically as NaN, the likely cause is one or more variables not being initialized. That means that the program is in error, and needs to be corrected.

| What I did bumble on about (gleaned from various google searches) was:
|
| <quote>
| Variables that are stored in static memory are allocated when a program is first run.
| Thus, the variable remains in memory for the duration of the program and, typically, its
| value is retained between calls to procedures.
|
| Also, static variables tend to be initialized (i.e. set to zero) when the program is started.

No. What a particular compiler might do with storage locations reserved for variables is not relevant. You cannot rely on variables being initialized to anything.

| Static variables behave similarly to SAVE'd variables (but I'm sure there are some
| differences... not sure what they are)
|
| Thus, if the code depends on variables being static, basically it is not very robust code
| (as you are discovering) since the behaviour of assuming initialised variables is simply
| dangerous.
| </quote>
|
| Can someone grade my response above with some additional information? I don't think static
| and SAVE are the same, but an understanding of the differences eludes me.
|
| Any info appreciated.
|
| cheers,
|
| paulv
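A minimal sketch of the point above, in Python rather than Fortran and purely for illustration: storage that happens to be zero-filled decodes to 0.0, while leftover bytes can decode to a NaN that then poisons every later result. The helper names (`bits_to_double`, `accumulate`) and the all-0xFF byte pattern are hypothetical; real leftover memory can hold anything.

```python
import math
import struct


def bits_to_double(raw: bytes) -> float:
    """Reinterpret 8 raw bytes as a little-endian IEEE 754 double."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    return struct.unpack("<d", raw)[0]


def accumulate(values):
    """Plain running sum, standing in for any numerical kernel."""
    total = 0.0
    for v in values:
        total += v
    return total


# Zero-filled storage: all bits clear decodes to 0.0, so code that never
# assigned the variable can still appear to work.
zero_filled = bits_to_double(bytes(8))

# Hypothetical leftover bytes, all 0xFF: exponent bits all set and a
# non-zero mantissa, which IEEE 754 defines as a NaN.
leftover = bits_to_double(b"\xff" * 8)

if __name__ == "__main__":
    print(zero_filled)                                    # 0.0
    print(math.isnan(leftover))                           # True
    print(accumulate([1.0, zero_filled, 2.0]))            # 3.0
    print(math.isnan(accumulate([1.0, leftover, 2.0])))   # True
```

Whether a given compiler and platform actually zero-fill such storage is exactly what cannot be relied upon; the sketch only shows why the same bug can be silent in one build and produce NaN in another.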
From: robin on 25 May 2010 23:04

"Paul van Delst" <paul.vandelst(a)noaa.gov> wrote in message news:hthajv$oqj$1(a)news.boulder.noaa.gov...

I should have added that your colleague should also look for subscript errors, as reading from outside an array will pick up rubbish, which could look like a NaN.
From: JB on 26 May 2010 07:32

On 2010-05-25, Gordon Sande <Gordon.Sande(a)gmail.com> wrote:
> probably an uninitialized
> variable. There are good
> tools for dealing with those. They tend to be developed away from the
> bazaar of open source.

Bazaar or not, one common open source tool that tends to be good at finding use of uninitialized memory (and other memory errors) is valgrind. However, I would guess that it's C-centric to the point of treating SAVE'd variables without explicit initialization as initialized to 0, assuming the compiler sets them to 0 (which AFAIK most unix compilers do in order to take advantage of the .bss section in the object file).

--
JB
From: Gordon Sande on 26 May 2010 08:58

On 2010-05-26 08:32:02 -0300, JB <foo(a)bar.invalid> said:
> On 2010-05-25, Gordon Sande <Gordon.Sande(a)gmail.com> wrote:
>> probably an uninitialized
>> variable. There are good
>> tools for dealing with those. They tend to be developed away from the
>> bazaar of open source.
>
> Bazaar or not, one common open source tool that tends to be good at
> finding use of uninitialized memory (and other memory errors) is
> valgrind. However, I would guess that it's C-centric to the point of
> considering SAVE'd variables without explicit initialization being
> initialized to 0, assuming the compiler sets them to 0 (which AFAIK
> most unix compilers do in order to take advantage of the .bss section
> in the object file).

There are various "definitions" of undefined. Many are so watered down that it takes a skilled lawyer to show why they are not fraudulent. If you don't notice the distinctions you can be fooled into treating them as all the same.

The useful definition is that the variable has never been assigned a value by the user after the variable has come into existence.

The descriptions of Valgrind I have seen do not include such a capability. If it does have such a capability I would like to see a reference to it.

Undefined variable checking requires either hardware assistance (parity checking is quick, easy and effective when possible) or much checking of all accessed values by the running program as a result of the compiler inserting the extra checking (which means that the object code is bulky and slowed in the several implementations I have seen). Valgrind does not seem to benefit from either mode. It may be useful for some errors that the standard C runtime support assumes are not present (because it must assume the programmer is correct).

I am aware of three systems that do undefined variable checking for current Fortran. These are Lahey/Fujitsu, NAG and Salford/Silverfrost. The classic example was WatFor, which was parity based on the IBM 7040 and software based on the IBM 360. Salford was software based for F77. Both WatFor and Salford are university based, for fast turn around student debugging.

Another area where the "definitions" are slippery is execution profiling. Some systems give exact line by line counts of the execution history as a result of extensive instrumentation. Others give a sampling of the location counter, with the association to the source done by looking at the loading map of the program. Same definition but such greatly different capabilities that one wonders how the same name can be justified. Valgrind offers sampling. Marketing!
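The software style of undefined variable checking described above can be sketched as a shadow flag kept beside every element and tested on every read. This is a toy illustration in Python, not how Lahey/Fujitsu, NAG, Salford or WatFor implement it; `TrackedArray` and `UndefinedRead` are hypothetical names.

```python
class UndefinedRead(Exception):
    """Raised when an element is read before the user assigned it."""


class TrackedArray:
    """Fixed-size array with one 'defined' flag per element."""

    def __init__(self, size):
        self._values = [0.0] * size      # storage happens to be zero-filled
        self._defined = [False] * size   # shadow flags: assigned by the user?

    def _check_bounds(self, index):
        # Reject out-of-range subscripts instead of picking up rubbish.
        if not 0 <= index < len(self._values):
            raise IndexError(f"subscript {index} out of range")

    def __setitem__(self, index, value):
        self._check_bounds(index)
        self._values[index] = value
        self._defined[index] = True

    def __getitem__(self, index):
        # The extra test on every access is where the run-time cost comes from.
        self._check_bounds(index)
        if not self._defined[index]:
            raise UndefinedRead(f"element {index} read before assignment")
        return self._values[index]


if __name__ == "__main__":
    a = TrackedArray(3)
    a[0] = 1.5
    print(a[0])
    try:
        a[1]
    except UndefinedRead as err:
        print("caught:", err)
```

Note that the element is reported as undefined even though its storage holds 0.0, which matches the definition given above: what matters is whether the user ever assigned a value, not what the bits happen to be.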
From: Paul van Delst on 26 May 2010 10:01

John Paine wrote:
> It seems to me that what you are really looking for is a simple way
> to find where the program is going wrong. This is currently masked by
> the floating-point exception handler generating a NaN and the program
> continuing on to the bitter end without generating anything useful. I
> don't know what the switch is for the g95 compiler, but for the Intel
> compiler you set the exception handler to use Underflow gives 0.0; Abort
> on other IEEE exceptions (/fpe:0). This means that the program crashes
> as soon as it encounters the operation that causes the NaN result. That
> should then help you identify where it happens and more importantly why
> it is happening.

Oh, yes, I agree. I suggested all the usual initial approaches to debugging via the compiler:
- turn all the checking/warning switches on,
- determine if there is a switch to generate signaling NaN's so the program stops when one is generated,
- determine if there is a switch to initialise reals to zero/nan/inf and run tests for all the cases,
- use an -fimplicit-none switch if available,
- turn off the -fstatic switch.

Then run the code through gdb. Once you've got that working on a particular system, run the code through valgrind.

The removal of the -fstatic in the compilation stage was just the particular step that gave the first definitive results (i.e. everything broke). I was then asked what -fstatic does and that's what precipitated my question. Given the frequency with which the term is bandied about when discussing Fortran code, I was surprised when I found it difficult to find unequivocal explanations of what the use of static variables in Fortran code actually meant. (I realise the answers will be compiler and platform dependent)

> On a more subjective note, I have experienced similar problems where the
> same program behaves differently on different (but very similar)
> machines. Quite intriguing, but horrible to debug. My all-time favourite
> was the machine where the program ran fine, but only if the network
> driver was not loaded. Another favourite was the one where two instances
> of the program had to be started, the first one would not run correctly
> but could be left in the background while the second one ran just fine.
> Couldn't solve either of those, but both went away when the computers
> were retired and replaced with new ones.

Yep, I've had similar happen too. Several times. Either the machine changed (in some fashion), or the compiler did. Drives me nuts.

cheers,

paulv
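The idea of stopping at the first bad value, rather than carrying a NaN to the bitter end, can be imitated by hand where no compiler switch is available. A rough sketch in Python, assuming one is willing to wrap the suspect intermediate results; `checked` and `mean_anomaly` are hypothetical names, and this does not reproduce the behaviour of /fpe:0 or of any g95 option.

```python
import math


def checked(value, label):
    """Return value unchanged, or raise at the first non-finite result."""
    if not math.isfinite(value):
        raise FloatingPointError(f"non-finite value {value!r} at {label}")
    return value


def mean_anomaly(samples):
    """Subtract the mean from each sample, checking every intermediate."""
    mean = checked(sum(samples) / len(samples), "mean")
    return [checked(s - mean, f"anomaly[{i}]") for i, s in enumerate(samples)]


if __name__ == "__main__":
    print(mean_anomaly([1.0, 2.0, 3.0]))
    try:
        mean_anomaly([1.0, float("nan")])
    except FloatingPointError as err:
        # The label names the first step that went wrong.
        print("stopped:", err)
```

The label in the exception plays the role of the crash location: it identifies the earliest operation that produced a bad value, which is usually much closer to the real bug than the NaN-filled final output.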