Prev: Problems matching between FORTRAN COMMON and C struct definedin a dll
Next: pointer nullify and deallocate 101
From: dpb on 2 Apr 2010 15:44

monir wrote:
....
> NOT being able so far to trap the problem or the code violation, if
> any, leaves me with couple of options:
>
> 1) POST the entire F77 code:
> as a zip file and include the input files to look at.
> It is a good idea, but with no documentation it would be extremely
> difficult even for you experts to follow the program logic.
> And reducing it to a meaningful size for posting while ensuring it
> still generates the NaN error is not an easy task, and would still be
> considered as an (extended) abbreviated version, and I might in the
> process cut out the source of the problem!

But, you've posted versions before that weren't very big _BUT_ left out the most important parts to allow anybody to decipher the problems of mismatched arguments, subscripts, conventions, etc., that experts could see by inspection if they were made available.

As another respondent has said, it's a real puzzle why you're so reluctant to provide relatively simple information that could at least eliminate areas of concern in lieu of continuing to protest why you can't... :(

....
> The general consensus among the responders is that the problem could
> be attributed to:
> a- declaration issues
> b- arrays out of bounds
> c- mismatched arguments
> d- data on the stack unexpectedly or unintentionally moved around as a
> result of a non-interfering statement such as "PAUSE" or "IMPLICIT
> NONE"
> e- any combinations of the above
> f- none of the above!
> I'm reasonably confident, after so much re-checking and testing, that
> it is NOT a- , b- or c- above, but I could be wrong!

Indeed, I don't think you've provided anything here that gives any of the previous respondents (and surely not this particular one) any confidence at all that the above confidence is well-founded (not personal, just we've not seen the evidence and, as another respondent noted, we've seen instances of confusion over syntax that raise serious concerns about whether you would recognize certain problems when they exist simply by looking at the source code).

> 4) I suggested earlier:
>> ... it seems reasonable to expect at some point
>> (depending on the complexity of the code and the
>> extent of the mix) that there would be a conflict that wouldn't be
>> detected/resolved by the compiler, leading to possible confusion or
>> misinterpretation or memory disruption or whatever.

As RM said, there is no "mix". That there _could_ be a compiler bug is possible, but it's certainly not the place to start until all other issues have been resolved. The first place to start to eliminate that is, of course, to ensure you've used the latest compiler release and also, as has been suggested, to use another compiler (or two or three) on the code, as each has its own strengths in terms of diagnostics, etc.

....
> 5) OK. Here is my latest attempt:
> a- I took a version of the offended code Test1.FOR, and made sure NO
> "PAUSE" in Sub dCpzeros() and NO "IMPLICIT NONE" in Sub Polin2()
> b- re-compiled and ran the program
> ...got (as expected) ..... x = NaN
> c- renamed the source code (self-contained single file) as Test1F.F90
> ...The MinGW-g95 manual states: " ... with F90 name extension, the
> source code is pre-processed with the C preprocessor."
> Not knowing exactly what that means, I took it to imply that something
> is done by the g95 compiler when using .F90 extension that otherwise
> is NOT done (with .FOR).

Yes, the source file is preprocessed by the C preprocessor (imagine that! :) ). Won't make any difference whatsoever unless you have preprocessor directives in the code. If you don't know what a preprocessor directive is, I'd presume that it's fairly unlikely you're using them unless this is inherited code.

> d- changed the F77 style to F90 style throughout, namely:
> ...replaced "c" in col 1 by "!"
> ...added "&" for continuation lines and removed char from col 6
> ...deleted blanks between digits (initially for easy reading/editing
> long numbers)...(which is allowed in *.FOR, but gave DATA syntax error in
> *.F90)

Specifically, it is a syntax error in free-form F90+ source, not in F90 per se, as fixed-form F90 is also F90. One incompatibility is that, in order to implement free-form source, spaces had to become significant, which they weren't/aren't in fixed form. But note that it's the source form that makes the difference (albeit free-form source was introduced in F90).

....
>
> e- compiled:
> ...>g95 -fbounds-check -ftrace=full -o Test1F Test1F.F90
> and ran.
> PROGRAM Works Fine!!!! returning:
> ........... x = -1.0676971 (correct)
>
> 6) THE above may or may not be the cure, since it does not directly
> supports or refutes the earlier suggestion (Item 4 above).
> Furthermore, it might be just temporarily masking the problem!

Nor does it necessarily refute the previous a) thru f), particularly if there are still references of argument association that make bounds checking difficult or impossible.

The way I'd think you would have the most success would be the (again previously suggested) method of also including the code in modules for the generation of interfaces automagically that can uncover many of these aforementioned issues as a result.

What would be my firstest step after the above, though, would be to reintroduce "IMPLICIT NONE" and redo the above. If the error is back, you've got a problem, Houston...

> PLEASE provide at your convenience the name of a modern debugger...

I still think you'll get where you need to get faster by going the modules route and supplying the information previously requested than the debugger at this point...I don't think you've done the prerequisite work to make that fruitful yet.

--
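
The modules route suggested in dpb's post can be sketched roughly as follows. This is a minimal, hypothetical illustration: the names interp_mod and polin_demo are invented here and are not taken from monir's program, and the routine body is only a placeholder interpolation. The point is that a procedure placed in a module gets an explicit interface, so the compiler can compare every call against the dummy argument declarations.

```fortran
! Hypothetical sketch; interp_mod and polin_demo are invented names.
module interp_mod
  implicit none
contains
  subroutine polin_demo(xa, ya, n, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: xa(n), ya(n)
    real, intent(in)    :: x
    real, intent(out)   :: y
    integer :: i
    ! Placeholder body: piecewise-linear interpolation in a sorted table.
    y = ya(n)
    do i = 1, n - 1
      if (x <= xa(i+1)) then
        y = ya(i) + (ya(i+1) - ya(i)) * (x - xa(i)) / (xa(i+1) - xa(i))
        exit
      end if
    end do
  end subroutine polin_demo
end module interp_mod

program module_demo
  use interp_mod        ! makes the interface of polin_demo visible
  implicit none
  real :: xa(3), ya(3), y
  xa = (/ 0.0, 1.0, 2.0 /)
  ya = (/ 0.0, 10.0, 20.0 /)
  call polin_demo(xa, ya, 3, 0.5, y)
  print *, 'y =', y
  ! Because the interface is explicit, a call such as
  !   call polin_demo(xa, ya, 3, 0.5d0, y)
  ! should be rejected at compile time: a DOUBLE PRECISION actual
  ! argument does not match the REAL dummy argument x.
end program module_demo
```

With external (non-module) subroutines spread over separate files, the same mismatched call would typically compile silently, which is the kind of error the module approach is meant to expose.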
From: Gordon Sande on 2 Apr 2010 15:49

On 2010-04-02 16:09:09 -0300, monir <monirg(a)mondenet.com> said:
> On Apr 1, 1:39 am, aerogeek <sukhbinder.si...(a)gmail.com> wrote:
>> On Apr 1, 12:08 am, Craig Powers <craig.pow...(a)invalid.invalid> wrote:
>
>>> monir wrote:
>
>>>> 2) Here's again an abbreviated sample code for easy reference:
>>>> (F77, g95)
>
>>> The problem with the abbreviated sample is that it's so abbreviated, it
>>> cuts out the problem.
>
>>> I don't disagree with you that 22k-ish lines is not practical to post.
>>> However, there are a couple of things you can and *should* do:
>>> * Try running it with absolutely every check the compiler offers turned
>>> on. (I see below you've tried to do this
>>> and it didn't get you anywhere... in that case, see next.)
>>> * Try to cut it down to a manageable size. In the process, maybe you'll
>>> discover what the problem is yourself. If it goes away when you take
>>> out a particular piece, that alone gives you an avenue to pursue in
>>> trying to find your problem. If you succeed in producing a manageable
>>> size example, well, now you've got something to post.
>
>>>> monir wrote:
>>>> 3) There appears to be some confusion on when the (current) program
>>>> correctly works and when it doesn't.
>>>> Here's a summary for clarification:
>>>> (ref is to a SINGLE statement in the above abbreviated sample code)
>
>>>> a) with "! pause" and "!! implicit none" NOT activated:
>>>> .......................... program returns x = NaN
>>>> c) with "! pause" NOT activated and "!! implicit none" Activated :
>>>> .......................... program returns x = -1.0676971 (correct)
>
>>> This is rather interesting. I don't think adding IMPLICIT NONE should
>>> change the meaning of a program that continues to compile successfully.
>>> Most compilers have an option that lets you produce assembly output;
>>> have you tried comparing the results for the routine in question with
>>> and without IMPLICIT NONE?
>
> .....YES I have many times. ALL Routine works perfectly when tested
> in isolation.
> .....I got the assembly output (~ 2,000 pages), but not sure what to
> look for ?
> .....For example, at the top it displays:
> .........................................
> .comm _abscisae_, 36000 # 36000
> .comm _crt_, 496 # 484
> .comm _d2cp_, 144000 # 144000
> .comm _d9mach_, 160 # 152
> .........................................
>
> .....ARE the above pairs of numbers (bytes?) supposed to be the same
> or they're ref to something else ??
>
>>>> monir wrote:
>>>> 8) Based on my rather limited knowledge of Fortran, here's a thought
>>>> for you experts to critique.
>>>> As indicated earlier, the code (work-in-progress, ~ 22,000 lines and ~ 80
>>>> routines) is mostly in F77, but with some limited patches of F90, e.g.;
>>>> use of unlabeled loops, vectors & matrices & array operations, some new
>>>> intrinsic functions, one Contains and one explicit Interface, but no
>>>> modules, no dynamic arrays, no defined data types, no Pointers, no ...
>>>> I've always had some suspicions about such programming practice, even
>>>> though the g95 compiler never complained. But it seems reasonable to
>>>> expect at some point (depending on the complexity of the code and the
>>>> extent of the mix) that there would be a conflict that wouldn't be
>>>> detected/resolved by the compiler, leading to possible confusion or
>>>> misinterpretation or memory disruption or whatever.
>>>> The "g95" compiler, or any other comparable compiler for that matter,
>>>> can't possibly detect and resolve each and every conflict that might arise
>>>> from a mixed F77+F90 programming. Correct ??
>>>> Just a thought! ... you don't have to take it seriously if you don't
>>>> want to!
>
>>>> 5) Some have indicated that mismatched arguments could have caused the
>>>> error.
>>>> A very valid point, and I've been looking at this for some time now.
>>>> But think about it for a moment. If there are mismatched arguments,
>>>> how would/could inserting a "Pause" statement in one of the routines
>>>> or just adding "implicit none" in another (with no additional
>>>> declarations) correct the mismatch and force the algorithms to work
>>>> "perfectly" producing the correct results throughout ??
>>>> This is the other part of the mystery!
>
>> aerogeek wrote:
>> I had this very specific problem. A non interfering statement like in
>> your case pause, was causing the same problem for my code.
>
>> This code was running perfectly well in windows system but i saw this
>> problem once i tried the program on a linux system.
>
>> So if possible can you try compiling and running your program on a
>> different system. If possible.
>
> .... UNFORTUNATELY, I don't have access to other systems.
>
>> For me the problem had something to do with incorrect array bounds,
>> which was not apparant and didn't come to notice untill i used dbx,
>> the debugger.
>
>> So get a debugger and run through the code via a debugger for the
>> conditions its failing. I am sure you will get to the bottom of the
>> problem.
>
> $$ ===================== $$
>
> NOT being able so far to trap the problem or the code violation, if
> any, leaves me with couple of options:
>
> 1) POST the entire F77 code:
> as a zip file and include the input files to look at.
> It is a good idea, but with no documentation it would be extremely
> difficult even for you experts to follow the program logic.
> And reducing it to a meaningful size for posting while ensuring it
> still generates the NaN error is not an easy task, and would still be
> considered as an (extended) abbreviated version, and I might in the
> process cut out the source of the problem!
>
> 2) USE a modern debugger.
> In the past I used the MS Fortran metacommand "$DEBUG:" for debugging
> (I believe that what it was called!); by inserting it in the source
> code (could appear multiple times). It was part of the MS Fortran
> compiler.
>
> What modern Fortran Debugger would you recommend (Win XP OS) ??
> Is there a connection between the Fortran compiler g95 and the
> debugger ? or it works independently ?
> Does it matter if the code is F77 or F90 or F77+F90 ??
> (I hope it is free!)

Silverfrost F95 (Salford F95 as it used to be called) is free for personal use. It runs under Windows either from its own IDE or a Windows command line. Salford have an older F77 but you want the F95.

There is no such thing as a mix of F77 and F90 as F77 is a subset of F90 so it is all F90. It may be true that some of your code would be F77 but when it is mixed in with stuff that is only in F90 the result is all F90.

Compile the entire program with both subscript and undefined variable checking and run the result. It does no good to compile with the options set, then toss the object and finally recompile and run without the options.

Salford has been suggested to you in the past. Is there some reason why you have to repeatedly ask the same question? You are unlikely to get differing advice.

> 3) BACK to the problem in hand.
> The general consensus among the responders is that the problem could
> be attributed to:
> a- declaration issues
> b- arrays out of bounds
> c- mismatched arguments
> d- data on the stack unexpectedly or unintentionally moved around as a
> result of a non-interfering statement such as "PAUSE" or "IMPLICIT
> NONE"
> e- any combinations of the above
> f- none of the above!
> I'm reasonably confident, after so much re-checking and testing, that
> it is NOT a- , b- or c- above, but I could be wrong!

You have the classical symptoms of storing outside of array bounds. Often this will cause termination for invalid code or, if you are lucky, merely the changing of unexpected variables. The symptom and the cause are often widely separated. This is called buffer overrun in other circumstances and is the major cause of the much publicized browser security problems.

You have an "a", "b" and "c". "d" makes no sense and mostly just shows that you have not understood a highly technical answer. You asked why the symptom comes and goes and got an explanation that the code and data layouts will differ when you make minor changes, which can radically change the symptoms of the out of bounds subscript. Sometimes you shoot yourself in the foot and other times you miss - the problem is the shooting and not whether you have slightly moved your foot.

If you have no type mismatches as you claim and the program runs with subscript checking, the best guess is that you are not describing the array layouts across calls, so that the actual sequence association you get is not what you think it is. This is something that I recall seeing many questions about from you. It is unlikely that you have a single mistake but rather several versions of the same logic error.

So you need good debugging tools like the Salford debugging compiler. It carries much extra information across calls so it can do full checking. This extra work is not usually done in Fortran as the programmer is required to get it right, so having more efficient object code is perfectly OK and in fact insisted upon by many users.

The diagnostic that I expect will indicate that some array as declared in some subroutine is not properly contained in the array passed to the subroutine. Ordinary subscript checking will happily allow the subroutine to store into the elements that are outside the passed array, otherwise known as an array overrun. The symptoms of this will not be obviously related to the actual error, which is why debugging these errors is difficult.

> 4) I suggested earlier:
>> ... it seems reasonable to expect at some point
>> (depending on the complexity of the code and the
>> extent of the mix) that there would be a conflict that wouldn't be
>> detected/resolved by the compiler, leading to possible confusion or
>> misinterpretation or memory disruption or whatever.
>> The "g95" compiler, or any other comparable compiler for that matter,
>> can't possibly detect and resolve each and every conflict that might arise
>> from a mixed F77+F90 programming.
>> Just a thought! ... you don't have to take it seriously if you don't want to!
>
> Richard Main and others responded:
>>> ... I consider it incorrect to even label it as mixed f77+f90.
>>> Almost all of f77 is also part of f95. The very few exceptions are
>>> matters of mostly academic interest, as all f95 compilers do them anyway
>>> and they are *NOT* things that are prone to obscure interactions. So
>>> what you have is just f95 code.
>
> 5) OK. Here is my latest attempt:
> a- I took a version of the offended code Test1.FOR, and made sure NO
> "PAUSE" in Sub dCpzeros() and NO "IMPLICIT NONE" in Sub Polin2()
> b- re-compiled and ran the program
> ...got (as expected) ..... x = NaN
> c- renamed the source code (self-contained single file) as Test1F.F90
> ...The MinGW-g95 manual states: " ... with F90 name extension, the
> source code is pre-processed with the C preprocessor."
> Not knowing exactly what that means, I took it to imply that something
> is done by the g95 compiler when using .F90 extension that otherwise
> is NOT done (with .FOR).
> Let me try it.
>
> d- changed the F77 style to F90 style throughout, namely:
> ...replaced "c" in col 1 by "!"
> ...added "&" for continuation lines and removed char from col 6
> ...deleted blanks between digits (initially for easy reading/editing
> long numbers)
> .....e.g.; Data GaussWg ( 7) / 0.0910282619 8296364981 1497220702
> 892 d0 /
> ..........(which is allowed in *.FOR, but gave DATA syntax error in
> *.F90)
> ......... was changed to:
> ..........Data GaussWg ( 7) /
> 0.091028261982963649811497220702892d0 /
> That was all. Nothing else was changed.
>
> e- compiled:
> ...>g95 -fbounds-check -ftrace=full -o Test1F Test1F.F90
> and ran.
> PROGRAM Works Fine!!!! returning:
> ........... x = -1.0676971 (correct)
>
> 6) THE above may or may not be the cure, since it does not directly
> supports or refutes the earlier suggestion (Item 4 above).
> Furthermore, it might be just temporarily masking the problem!
>
> PLEASE provide at your convenience the name of a modern debugger (Item
> 2 above) and will go through the code line-by-line to identify the
> culprit once and for all and get to the bottom of the problem in
> Test1.FOR.
>
> Thank you kindly for your patience!
> Monir
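
Gordon Sande's point about sequence association, that a dummy array declared in a subroutine may not match the array the caller meant to pass, can be illustrated with a small sketch. Everything here is hypothetical (seq_assoc_demo, fill and the sizes are invented, not taken from Test1.FOR). To keep the example well defined, the "overrun" is arranged to land inside a larger array: the caller intends work(1:5) to be the array and treats work(6:10) as unrelated data, but tells the callee the size is 8.

```fortran
! Hypothetical sketch; program, routine and sizes are invented.
program seq_assoc_demo
  implicit none
  real :: work(10)
  integer :: i
  ! Caller's intent: work(1:5) is "the array", work(6:10) stands in
  ! for neighbouring variables that the callee should never touch.
  work(1:5)  = 0.0
  work(6:10) = 99.0
  ! An array element is passed, so the dummy array is sequence
  ! associated with work(1) onward. The wrong size (8 instead of 5)
  ! is not visible to subscript checking inside the callee.
  call fill(work(1), 8)
  do i = 1, 10
    print *, i, work(i)
  end do
end program seq_assoc_demo

subroutine fill(v, n)
  implicit none
  integer :: n
  real :: v(n)
  integer :: i
  do i = 1, n
    v(i) = real(i)   ! every subscript lies within the declared 1..n
  end do
end subroutine fill
```

Elements 6 to 8 of work are overwritten even though no subscript inside fill ever exceeds its declared bound. In the failure mode described in the post, the dummy array extends past the end of the actual array instead, so the stores hit whatever happens to be stored next, and which variables those are can change with any small edit to the program.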
From: steve on 2 Apr 2010 16:19

On Apr 2, 2:09 pm, monir <mon...(a)mondenet.com> wrote:
(snip)
> NOT being able so far to trap the problem or the code violation, if
> any, leaves me with couple of options:
>
> 1) POST the entire F77 code:
> as a zip file and include the input files to look at.
> It is a good idea, but with no documentation it would be extremely
> difficult even for you experts to follow the program logic.
> And reducing it to a meaningful size for posting while ensuring it
> still generates the NaN error is not an easy task, and would still be
> considered as an (extended) abbreviated version, and I might in the
> process cut out the source of the problem!
>

I would not call myself an expert in Fortran (I suspect some here would even endorse that notion :), but I do know that I can take your problematic code and try

gfortran -fcheck=all -ffpe-trap=invalid -fbacktrace Test1.FOR
./a.out

with at least some expectation of a core dump if NaN occurs. So, once again, post a URL to a zip archive.

--
steve
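
To see what the trapping options in steve's command act on, a tiny stand-alone test case can help. The program below is a hypothetical sketch, not part of Test1.FOR: it produces a NaN on purpose, and it assumes a compiler that provides the Fortran 2003 intrinsic command_argument_count (used only so that the value is not known at compile time).

```fortran
! Hypothetical test case; not taken from Test1.FOR.
program nan_demo
  implicit none
  real :: x, y
  ! Derive x at run time so the compiler cannot evaluate sqrt(x) itself.
  x = -1.0 - real(command_argument_count())
  y = sqrt(x)          ! invalid operation: the result is a NaN
  if (y /= y) then     ! a NaN is the only value not equal to itself
    print *, 'NaN detected'
  else
    print *, 'y =', y
  end if
end program nan_demo
```

Built with the options quoted above, the run would be expected to stop at the sqrt line with a floating-point exception and a backtrace instead of reaching the IF test. Where trapping is not available, a test of the form y /= y placed after suspect assignments is a crude way to narrow down where the first NaN appears, assuming the optimizer is not run with settings that ignore NaN semantics.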
From: Craig Powers on 2 Apr 2010 16:23

monir wrote:
> .....YES I have many times. ALL Routine works perfectly when tested
> in isolation.
> .....I got the assembly output (~ 2,000 pages), but not sure what to
> look for ?
> .....For example, at the top it displays:
> .........................................
> .comm _abscisae_, 36000 # 36000
> .comm _crt_, 496 # 484
> .comm _d2cp_, 144000 # 144000
> .comm _d9mach_, 160 # 152
> .........................................
>
> .....ARE the above pairs of numbers (bytes?) supposed to be the same
> or they're ref to something else ??

My very specific suggestion with respect to assembly output was to compare the result with IMPLICIT NONE with the result without; I don't think there should be any differences at all, but you said you got different behavior.
From: glen herrmannsfeldt on 2 Apr 2010 17:54

Gordon Sande <Gordon.Sande(a)eastlink.ca> wrote:
(really big snip)
> There is
> no such thing as a mix of F77 and F90 as F77 is a subset of F90 so it
> is all F90. It may be true that some of your code would be F77 but when
> it is mixed in with stuff that is only in F90 the result is all F90.

There are a few features in Fortran 77 that were removed in Fortran 95. Some may have been added in Fortran 77, so didn't have so long to begin to be used in actual programs.

Fortran 77 added the use of REAL and DOUBLE PRECISION variables in DO loops. While many problems can be caused through the use of such variables, I don't see the need to remove them from the standard. (All the other languages that I know with a looping statement allow them.)

Next is branching to ENDIF from outside its block. Why that was added, I don't know. It isn't hard to fix, either, so this one is fine with me.

Then there is ASSIGN, assigned GOTO, and the use of ASSIGN with FORMAT statement numbers. ASSIGN goes back to Fortran I, while the use for FORMAT was only added in Fortran 77. Reminds me of trying, in a very early program that I wrote, to use a variable for the format statement number in a WRITE statement. (Fortran 66 allows arrays, but not scalar variables.)

Last is the H format descriptor. I have known Fortran preprocessors to generate these on output to be strictly compatible with Fortran 66. Maybe one of the more popular extensions was allowing apostrophes in FORMAT. I don't know anyone who misses the H descriptor.

It seems that these were removed in Fortran 95, so technically all Fortran 77 programs are also Fortran 90 programs, but not necessarily Fortran 95 programs.

-- glen
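
Two of the removed features listed in glen's post have simple replacements, sketched below. The program is a hypothetical illustration (deleted_features_demo is an invented name); the older forms appear only in comments so that the example itself stays within Fortran 95.

```fortran
! Hypothetical sketch; old forms are shown in comments only.
program deleted_features_demo
  implicit none
  integer :: i
  real :: x

  ! Fortran 77 allowed a REAL loop variable:
  !       DO 10 X = 0.0, 1.0, 0.25
  ! Fortran 95 form: count with an integer and derive the real value.
  do i = 0, 4
    x = 0.25 * real(i)
    write (*, 100) x
  end do

  ! Fortran 77 H edit descriptor:  100 FORMAT (4HX = , F5.2)
  ! Apostrophe-delimited equivalent:
100 format ('X = ', f5.2)
end program deleted_features_demo
```

Driving the loop with an integer also fixes the trip count exactly, which is the usual argument made against REAL loop variables.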