From: Sjouke Burry on 29 Mar 2010 23:19

monir wrote:
> On Mar 28, 5:27 am, aerogeek <sukhbinder.si...(a)gmail.com> wrote:
>> Yes, as everyone else said, declaration is important. Show that and we
>> might have some idea what the problem might be.
>>
>> In the meantime, can you just remove that pause statement and add a
>> dummy print statement on that line?
>>
>> Long back I too had a similar problem with the same curious
>> effect: a statement unrelated to the code was causing the wrong
>> answers. And if I remember correctly, the culprit was a wrong
>> declaration!!
>>
>
> Good news and bad news!
>
> 1) Adding a dummy Print statement and removing the Pause statement
> didn't solve the problem. Remember, the problem is not the Pause
> statement but its absence.
> Many other attempts to identify the cause of the problem have
> failed. The trial & error approach has not worked in this case.
>
> 2) With:
> - no command-line error messages
> - no compiler/compilation error messages
> - no warning messages
> - no run-time error messages
> - no stack (overflow?) error messages
> - no math error messages
> (there might be some overlap in the above, but you get the point!)
> or simply: no syntax errors, no compiler errors, no logical errors,
> and no run-time error messages (assuming NaN doesn't necessarily
> belong to any of the above),
> the program is obviously NOT error-free!!
> But where would one look to identify the cause of the problem in
> ~22,000 lines of code with over 80 routines??
> If the size of the exe file is any indication, it is ~1.5 MB.
>
> The question remains:
> What makes a program work fine when it is temporarily suspended by a
> Pause statement, but return NaN without such a code-unrelated statement??
>
> 3) Here's the abbreviated sample code again for easy reference:
> (F77, g95)
>
> PROGRAM main
> ....................
> call dCpZeros()
> ......................
> End Program main
> ------------------------------------
> SUBROUTINE dCpZeros()
> .....................
>   do i=1, 9
>     do j=1, 10
>       do k=1, 30
> .....................
>         call Polin2(a,b,c,d,x)
>         call Polin2(b,c,d,e,y)
>         call Polin2(c,d,e,f,z)
> ....................
>         pause 'In Sub dCpZeros() 103'
> ....................
>         print*,' x = ', x
>         print*,' y = ', y
>         print*,' z = ', z
>       end do
>     end do
>   end do
> ..................
> Return
> End Subroutine dCpZeros
> ------------------------------------
> SUBROUTINE Polin2(w1, w2, w3, w4, val)
> implicit none   !see item 5
> ....................
> ....................
> call UnStdy_Terms()
> ....................
> ....................
> Return
> End Subroutine Polin2
> ------------------------------------
>
> 4) Having thoroughly re-checked all the declarations and argument
> lists throughout {incl. routine Polin2()}, I couldn't find any
> inconsistencies or obscure code violations. All variables appear to
> be correctly declared/dimensioned.
>
> 5) Subroutine Polin2() had ALL its variables declared, but had NO
> "Implicit None" statement.
> Not expecting much at this point, but for consistency with the other
> routines, I added the "Implicit None" statement, with NO ADDITIONAL
> declarations or any other changes. None. I simply inserted "Implicit
> None" at the top of Polin2().
> Deleted the Pause statement from the calling routine, saved,
> recompiled and ran.
> Program works fine!!!!!!!!
>
> 6) Just to make sure that I wasn't seeing things, I deleted (not just
> commented out!) the "Implicit None" in Sub Polin2(), still with NO Pause
> statement in dCpZeros(), saved, recompiled and ran.
> Program fails, returning NaN.
> That was the good news!
>
> 7) The bad news is that the above fix treats the symptoms and doesn't
> identify the problem.
> It makes absolutely no sense that having either of the two
> code-unrelated statements ("Pause" in the calling routine dCpZeros(),
> and/or "Implicit None" in the called routine Polin2()) would produce
> the correct results, while omitting both would return NaN.
> In other words, one or both of these statements MUST be in for the
> program to work correctly.
> It is quite likely that the problem will pop up again at some point.
>
> 8) Based on my rather limited knowledge of Fortran, here's a thought
> for you experts to critique.
> As indicated earlier, the code (work in progress, ~22,000 lines and
> ~80 routines) is mostly in F77, but with some limited patches of F90,
> e.g. use of unlabeled loops, vectors & matrices & array operations,
> some of the new intrinsic functions, one Contains and one explicit
> Interface, but no modules, no dynamic arrays, no defined data types,
> no Pointers, no ....
> I've always had some suspicions about such programming practices, even
> though the g95 compiler never complained. But it seems reasonable to
> expect at some point (depending on the complexity of the code and the
> extent of the mix) that there would be a conflict that wouldn't be
> detected/resolved by the compiler, leading to possible confusion or
> misinterpretation or memory disruption or whatever.
>
> The "g95" compiler, or any other comparable compiler for that matter,
> can't possibly detect and resolve each and every conflict that might
> arise from mixed F77+F90 programming. Correct??
> Just a thought! ... you don't have to take it seriously if you don't
> want to!
>
> 9) Meanwhile, I will continue testing the program (with "Implicit None"
> in Sub Polin2() and no Pause in Sub dCpZeros()).
> I have already tested 14 runs so far, each run under a different
> scenario invoking different sets of routines. A run takes ~45 min
> on a 3.16 GHz machine.
> So far so good!
> (Sorry for the lengthy post)
>
> Regards.
> Monir

I had this error about 5 times (in 30 years), and in all cases it was an
out-of-bounds array/character-string access.
Switch bounds checking on, and put trace prints to a log file at
strategic points to show how far you got before the crash.
It's tedious, but it will get you to the places where things are violated.
In my case it was mostly an array subscript reaching zero, or 1 higher
than allowed.
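A minimal sketch of the trace-print advice above, in Fortran to match the thread. Everything specific in it is hypothetical: the program name, the log file name `trace.log`, the unit number 77, the loop bounds, and the stand-in arithmetic that plays the role of the values `Polin2` would return. Each checkpoint line is flushed immediately, so the log still shows the last point reached if the run dies, and a NaN is detected with the `x /= x` test (a NaN compares unequal to itself; this test can be defeated by options such as `-ffast-math`). It assumes a compiler that accepts the Fortran 2003 `FLUSH` statement and Fortran 2008 `ERROR STOP` (e.g. a recent gfortran), and is meant to be compiled with bounds checking on, e.g. `gfortran -fcheck=bounds trace_demo.f90`.

```fortran
! Hypothetical trace demo: names, bounds, file name and unit number are
! illustrative only, not taken from the poster's program.
program trace_demo
  implicit none
  integer, parameter :: log_unit = 77     ! arbitrary unit number for the log
  integer :: i, j, k, nbad
  real :: x
  real :: work(3)

  open(unit=log_unit, file='trace.log', status='replace', action='write')
  nbad = 0
  do i = 1, 2
    do j = 1, 2
      do k = 1, 3
        work(k) = real(i + j)             ! k stays inside work's bounds 1..3
        x = work(k) / real(k)             ! stand-in for a value Polin2 would return
        call trace('after Polin2', i, j, k, x)
        if (x /= x) nbad = nbad + 1       ! a NaN is the only value unequal to itself
      end do
    end do
  end do
  close(log_unit)

  if (nbad /= 0) error stop 'NaN detected; see trace.log'
  print *, 'no NaN found; trace.log has one line per iteration'

contains

  ! Write one checkpoint line and flush it immediately, so the last line
  ! of the log shows how far the run got even if it crashes right after.
  subroutine trace(tag, i, j, k, val)
    character(len=*), intent(in) :: tag
    integer, intent(in) :: i, j, k
    real, intent(in) :: val
    write(log_unit, '(a, 3i5, es15.6)') tag, i, j, k, val
    flush(log_unit)
  end subroutine trace

end program trace_demo
```

In a real run the `trace` calls would go around the `Polin2` calls in `dCpZeros`; the first logged NaN, or the last line written before a bounds-check abort, narrows down where to look.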
From: Richard Maine on 29 Mar 2010 23:45

monir <monirg(a)mondenet.com> wrote:
> As indicated earlier, the code (work-in-progress, ~ 22,000 lines and ~
> 80 routines) is mostly in F77, but with some limited patches of F90,
> e.g.; use of unlabeled loops, vectors & matrices & array operations,
> some of the new intrinsic functions, one Contains and one explicit
> Interface, but no modules, no dynamic arrays, no defined data types,
> no Pointers, no ....
> I've always had some suspicions about such programming practices, even
> though the g95 compiler never complained. But it seems reasonable to
> expect at some point (depending on the complexity of the code and the
> extent of the mix) that there would be a conflict that wouldn't be
> detected/resolved by the compiler, leading to possible confusion or
> misinterpretation or memory disruption or whatever.
>
> The "g95" compiler, or any other comparable compiler for that matter,
> can't possibly detect and resolve each and every conflict that might
> arise from a mixed F77+F90 programming. Correct ??
> Just a thought! ... you don't have to take it seriously if you don't
> want to!

Does not sound like a constructive direction of inquiry to me.

For the most part, I consider it incorrect even to label it as mixed
f77+f90. Almost all of f77 is also part of f95. The very few exceptions
are matters of mostly academic interest, as all f95 compilers support
them anyway, and they are *NOT* things that are prone to obscure
interactions. So what you have is just f95 code.

I almost hesitate to mention that PAUSE does happen to be one of the few
exceptions, in that it is not technically part of f95 (but it is part of
f90). I'm worried that might make you think in that direction, but you
will just be wasting your time (and that of anyone who pursues it with
you) if you look in that direction. It is hard to imagine anything with
less in the way of subtle interactions than PAUSE.

I don't have any more suggestions based on the data at hand.
Remote debugging is tricky. I cannot rule out even things like getting
confused as to exactly what code actually got compiled and used for each
test. There can be things such as editing the code, but neglecting to
save the edited file, so that you are looking at something different
from what you actually have compiled and run. I am not making that
scenario up. I have seen it happen. Heck, I have done it myself.

-- 
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
From: Louis Krupp on 30 Mar 2010 07:51

monir wrote:
> On Mar 28, 5:27 am, aerogeek <sukhbinder.si...(a)gmail.com> wrote:
>> Yes as everyone else said declaration is important. Show that and we
>> might have some idea what the problem might be.
>>
>> In the meantime, can you just remove that pause statement and add a
>> dummy print statement on that line.
>>
>> Long back I too had a similar problem and it has this similar curious
>> effect. A statement not related to the code, was causing the wrong
>> answers. And if I remember it correctly, the culprit was a wrong
>> declaration!!
>>
>
> Good news and bad news!
>
> 1) Adding a dummy Print statement and removing the Pause statement
> didn't solve the problem. Remember the problem is not the Pause
> statement but its absence.
> Many other attempts have failed in identifying the cause of the
> problem. Trial & error approach has not worked in this case.

You're going to have to do what every programmer does: learn how to use
a debugger, or get better at trial and error.

I'd start with the debugger. Set a watchpoint on x, and see how its
value changes. Does it ever take the correct value? Is this value
overwritten with a NaN? Where in the code does this happen?

HTH.

Louis
From: Gordon Sande on 30 Mar 2010 09:17

On 2010-03-29 23:02:13 -0300, monir <monirg(a)mondenet.com> said:

> 8) Based on my rather limited knowledge of Fortran, here's a thought
> for you experts to critique.
> As indicated earlier, the code (work-in-progress, ~ 22,000 lines and ~
> 80 routines) is mostly in F77, but with some limited patches of F90,
> e.g.; use of unlabeled loops, vectors & matrices & array operations,
> some of the new intrinsic functions, one Contains and one explicit
> Interface, but no modules, no dynamic arrays, no defined data types,
> no Pointers, no ....
> I've always had some suspicions about such programming practices, even
> though the g95 compiler never complained. But it seems reasonable to
> expect at some point (depending on the complexity of the code and the
> extent of the mix) that there would be a conflict that wouldn't be
> detected/resolved by the compiler, leading to possible confusion or
> misinterpretation or memory disruption or whatever.
>
> The "g95" compiler, or any other comparable compiler for that matter,
> can't possibly detect and resolve each and every conflict that might
> arise from a mixed F77+F90 programming. Correct ??
> Just a thought! ... you don't have to take it seriously if you don't
> want to!

You seem to be rather error-prone, or to be having extreme difficulty in
checking for errors. This may be due to seeing what you expect, as is all
too common for most of us. Limited knowledge makes it worse. Describing
the errors as you understand them will only reinforce your limited
knowledge.

To give the compiler a fighting chance of detecting what is surely an
array overrun error, or possibly an argument mismatch error, you need to
take a time out and enable all of the F90 error checking.

By the way, there is no such thing as mixed F77/F90: it is all F90, with
a fair amount of the older-style argument association rules, which were
all that was permitted in F77.
They can be rather error-prone for those who are playing games with
limited knowledge.

Step one: stick everything in a new dummy program, where

  prog main
    ....
  end
  sub ...
  end
  ....

becomes

  prog F90_main
    implicit none
    call main
  contains
    sub main
      implicit none
      ....
    end sub
    sub ...
      implicit none
      ....
    end sub
    ....
  end

to get argument checking. As long as you are not passing procedure names
(i.e. real use of externals), there should be no gotchas. Make sure there
are "implicit none"s everywhere.

For step two, you need to change everything to use assumed-shape argument
passing, that is, the use of ":", which will be OK since step one
provided explicit interfaces. This will be painless, or a bother if you
have been playing games with argument association. If you have been, then
you have been playing with fire for someone with "limited knowledge". The
merely sensible games will turn into array slices; beyond that, you were
warned.

Step three is to turn on subscript checking and all the other debugging
options.

You can either ignore these comments and spend the next couple of weeks
chasing fairies, or you can spend a couple of days making the changes and
trap the gremlin quickly. It is all too common that there is no time to
do it right initially, but there is lots of time to do it over when it
has clearly failed.
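A compilable sketch of steps one and two, written as real Fortran rather than the prog/sub shorthand. The routine names `legacy_main` and `polin2`, and `polin2`'s two-argument interface, are hypothetical stand-ins (the poster's Polin2 takes five arguments and does real work; this one just sums its input). The old main program is called `legacy_main` here only to keep it clearly distinct from the wrapper program. Because `polin2` is an internal procedure, its interface is explicit at every call, so the compiler can check each call against it; the assumed-shape dummy `w(:)` lets the routine use `size(w)` instead of trusting a separately passed length. `ERROR STOP` assumes a Fortran 2008 compiler such as a recent gfortran.

```fortran
! Hypothetical sketch of the wrapper layout: legacy_main and polin2 are
! stand-ins, not the poster's real routines.
program f90_main
  implicit none
  call legacy_main()

contains

  ! Step one: the old PROGRAM main becomes an internal subroutine.
  subroutine legacy_main()
    implicit none
    real :: w(4), val
    w = (/ 1.0, 2.0, 3.0, 4.0 /)
    call polin2(w, val)
    ! call polin2(w(1), val)   ! rejected at compile time: scalar actual,
    !                          ! rank-1 dummy (explicit interface via CONTAINS)
    if (abs(val - 10.0) > 1.0e-6) error stop 'unexpected result from polin2'
    print *, 'val = ', val
  end subroutine legacy_main

  ! Step two: the assumed-shape dummy w(:) carries its own extent,
  ! so size(w) is always right and no separate length argument is needed.
  subroutine polin2(w, val)
    implicit none
    real, intent(in)  :: w(:)
    real, intent(out) :: val
    integer :: i
    val = 0.0
    do i = 1, size(w)
      val = val + w(i)
    end do
  end subroutine polin2

end program f90_main
```

Uncommenting the `call polin2(w(1), val)` line should make the compiler report a rank mismatch (scalar versus rank-1 array), which is exactly the kind of argument error that goes unchecked when the routines are separate external procedures without interfaces.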
From: robin on 31 Mar 2010 04:36

"monir" <monirg(a)mondenet.com> wrote in message
news:166453c3-8c33-4507-8468-2dd31302a349(a)e7g2000vbp.googlegroups.com...

>Good news and bad news!

>1) Adding a dummy Print statement and removing the Pause statement
>didn't solve the problem. Remember the problem is not the Pause
>statement but its absence.
>Many other attempts have failed in identifying the cause of the
>problem. Trial & error approach has not worked in this case.

You have already been advised of possible causes of this problem.
You need to activate subscript bounds checking.
Have you tried that?

>The question remains:
>What makes a program works fine when it is temporarily suspended by a
>Pause statement, but returns NaN w/o such code-unrelated statement ??

Answers have already been given:
1. Subscript bound errors.
2. Mismatched arguments/dummy arguments at calls.

>3) Here's again the abbreviated sample code for easy reference:
> (F77, g95)

You still haven't provided declarations.
Without declarations, data types cannot be checked.
As I said before, the code you supplied means nothing
without the declarations.