From: Gaetano Esposito on 3 Mar 2010 22:31

The problem I am going to describe used to occur with "random" (i.e. multiple) combinations of compilers and machines/architectures. Because of this uncertainty, a while ago I switched to compiling statically on one machine and running the program on the others. At first I was satisfied with this solution, but now the problem is back, and I have run out of ideas...

I must use a legacy code which interfaces with other codes through an unformatted binary file. The legacy code writes a binary file, which in turn is read by another program unit. This is really messy. I tried to rewrite everything without the useless binary read/write, but the legacy code is too massive to be dealt with cleanly, and I am running out of time.

The piece of legacy code writing the binary is:

      WRITE (LINKTP) VERS, PREC, KERR
      ...
      WRITE (LINKTP) LENIMC, LENRMC, NO, KK, NLITE
      WRITE (LINKTP) PATM, (WT(K), EPS(K), SIG(K), DIP(K),
     1               POL(K), ZROT(K), NLIN(K), K=1,KK),
     2               ((COFLAM(N,K),N=1,NO),K=1,KK),
     4               ((COFETA(N,K),N=1,NO),K=1,KK),
     5               (((COFD(N,J,K),N=1,NO),J=1,KK),K=1,KK),
     6               (KTDIF(N),N=1,NLITE),
     7               (((COFTD(N,J,L),N=1,NO),J=1,KK),L=1,NLITE)

There is a corresponding READ statement in the subroutine that is supposed to handle this:

      READ (LINKMC, ERR=999) VERS, PREC, KERR
      READ (LINKMC, ERR=999) LI, LR, NO, NKK, NLITE
      READ (LINKMC) PATMOS, (RMCWRK(NWT+N-1),
     1   RMCWRK(NEPS+N-1), RMCWRK(NSIG+N-1),
     2   RMCWRK(NDIP+N-1), RMCWRK(NPOL+N-1), RMCWRK(NZROT+N-1),
     3   IMCWRK(INLIN+N-1), N=1,NKK),
     4   (RMCWRK(NLAM+N-1), N=1,NK), (RMCWRK(NETA+N-1), N=1,NK),
     5   (RMCWRK(NDIF+N-1), N=1,NK2),
     6   (IMCWRK(IKTDIF+N-1), N=1,NLITE), (RMCWRK(NTDIF+N-1), N=1,NKT)

For a reason that is beyond my comprehension, the file created by the WRITE statement cannot be read by the READ statement.
I investigated this problem for a long time by comparing a working version of the unformatted file (which I had kept from another computation) to the ones that now do not work (by "working" I mean readable by the READ statement). It turns out that the two files are **almost** (of course) identical. Since the output of the Unix command "cmp" is unreadable to me, I dumped the content of both binary files (working and not working) with "od -x > linking_file" and ran "diff" on the dumps. I get just one different line:

$ diff tplink_working_x tplink_x
5c5
< 0000100 0002 0000 0014 0000 523c 0006 0000 0000
---
> 0000100 0002 0000 0014 0000 0014 0000 0000 0000

Now, I found out that the problem in the non-working binary is in the third record (the third READ). Since I am not an expert in Fortran I/O, I manually added variables to be read from the last record until I got the error (luckily it happened at the 3rd value!). Code:

      [first two READ statements OK]
      ....
      read(l2) data1, data2, data3
      ....

Runtime error message:

forrtl: severe (67): input statement requires too much data [...]

Therefore, I hypothesize that the problem is in the WRITE statement. My questions are:

_ What is causing this erratic behavior in WRITE? Is there a way to isolate it?
_ What could cause a forced end-of-record instruction to appear?
_ I understand that erratic behavior is sometimes associated with memory faults. Can this be the case? If so, note that none of the bounds-checking flags report errors. Moreover, there are several COMMON blocks in the legacy code.

I appreciate your always incredibly competent and spot-on comments.
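To read the two differing lines, it helps to know that od prints the offset in octal and, with -x, the data as 16-bit words in the machine's own byte order. The Python sketch below assumes a little-endian machine (such as x86) and regroups each pair of words into one 32-bit integer, which is the size commonly used for record-length fields. The helper name `od_words_to_int32` is hypothetical, not part of any tool.

```python
def od_words_to_int32(line):
    """Convert one line of `od -x` output into 32-bit unsigned integers.

    Assumes a little-endian host: od -x shows 16-bit words in host order,
    so each 32-bit value appears as its low word followed by its high word.
    """
    fields = line.split()
    words = [int(w, 16) for w in fields[1:]]  # skip the octal offset column
    # Pair (low, high) 16-bit words into 32-bit integers
    return [lo | (hi << 16) for lo, hi in zip(words[0::2], words[1::2])]


working = "0000100 0002 0000 0014 0000 523c 0006 0000 0000"
broken = "0000100 0002 0000 0014 0000 0014 0000 0000 0000"

print(int("0000100", 8))           # 64: the line starts at byte 64
print(od_words_to_int32(working))  # [2, 20, 414268, 0]
print(od_words_to_int32(broken))   # [2, 20, 20, 0]
```

Read this way, the only difference is a single 32-bit value at byte 72: 414268 in the working file versus 20 in the other.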
From: Richard Maine on 3 Mar 2010 23:05

Gaetano Esposito <gaetano.esposito(a)gmail.com> wrote:

> The piece of legacy code writing the binary is:
>
>       WRITE (LINKTP) VERS, PREC, KERR
>       ...
>       WRITE (LINKTP) LENIMC, LENRMC, NO, KK, NLITE
>       WRITE (LINKTP) PATM, (WT(K), EPS(K), SIG(K), DIP(K),
>      1               POL(K), ZROT(K), NLIN(K), K=1,KK),
>      2               ((COFLAM(N,K),N=1,NO),K=1,KK),
>      4               ((COFETA(N,K),N=1,NO),K=1,KK),
>      5               (((COFD(N,J,K),N=1,NO),J=1,KK),K=1,KK),
>      6               (KTDIF(N),N=1,NLITE),
>      7               (((COFTD(N,J,L),N=1,NO),J=1,KK),L=1,NLITE)
>
> There is a correspondent READ statement in the subroutine supposed to
> handle this:
>
>       READ (LINKMC, ERR=999) VERS, PREC, KERR
>       READ (LINKMC, ERR=999) LI, LR, NO, NKK, NLITE
>       READ (LINKMC) PATMOS, (RMCWRK(NWT+N-1),
>      1   RMCWRK(NEPS+N-1), RMCWRK(NSIG+N-1),
>      2   RMCWRK(NDIP+N-1), RMCWRK(NPOL+N-1), RMCWRK(NZROT+N-1),
>      3   IMCWRK(INLIN+N-1), N=1,NKK),
>      4   (RMCWRK(NLAM+N-1), N=1,NK), (RMCWRK(NETA+N-1), N=1,NK),
>      5   (RMCWRK(NDIF+N-1), N=1,NK2),
>      6   (IMCWRK(IKTDIF+N-1), N=1,NLITE), (RMCWRK(NTDIF+N-1), N=1,NKT)
>
>
> For a reason that is behind my comprehension, the file created in the
> WRITE statement, cannot be read by the READ statement.

Looks like either a typo or a simple confusion to me.

First, note one of the most common FAQs here. Declarations matter. A lot. You haven't shown them. There is no way that anyone can be sure that the above code is correct without looking at the declarations. We might be able to find things wrong with it, but there is no way that we can guarantee it is correct. In particular, all the data types had better match exactly in the READ and WRITE code. But that aside...

Just go through the above READ an item at a time and compare it to the WRITE. It doesn't take long to get to a discrepancy. The first 2 reads look to correspond (assuming the right data types). But look at the 3rd one more closely. Let me work through it for you.

The read of patmos corresponds to the write of patm. Ok.
Then you read 7 arrays in an implied DO loop with an index from 1 to nkk. That's 7*nkk values, where nkk was read from record 2. That looks to correspond to the 7*kk values written (and the nkk in the read corresponds to the kk from the write). There is no way I can check the array dimensions with the data given, but I'll ignore that. Ok.

Next is a read with an implied DO loop from 1 to NK. And NK is.... what? There is no hint of where this came from or what value it might have. Maybe it is a typo for NO, or NKK, or ???

Anyway, you need to go through and compare the read and write piece by piece like that. You ought to be reading the same number of elements that you wrote (or at least no more). The above code sure doesn't look like it does that. If it does, then it depends on other code not shown to define appropriate values for NK (and NK2 and NKT, which also seemed to pop up from nowhere).

> [first two READ statements OK]
> ...
> read(l2) data1, data2, data3

No declarations again. Are these arrays? Yes, it matters. A lot.

Also, if KK happens to be 0 in the writing code, there might not be 3 values written.

> _ What could cause a forced end of record instruction to appear?

Nothing. Not a constructive avenue to pursue.

-- 
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
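The piece-by-piece comparison described above can be done mechanically by counting list items on each side. In this Python sketch, `write_count` follows the third WRITE as posted, while `read_count` takes NK, NK2 and NKT as parameters because the thread never shows how they are defined. The values NO=4, KK=111, NLITE=2 are taken from the second record printed in the thread; the NK, NK2, NKT values in the example are a hypothetical guess that would make each READ piece match its WRITE piece, not the legacy code's actual definitions.

```python
def write_count(no, kk, nlite):
    """Number of list items in the third WRITE (LINKTP) as posted."""
    return (1                   # PATM
            + 7 * kk            # WT, EPS, SIG, DIP, POL, ZROT, NLIN
            + no * kk           # COFLAM(N,K)
            + no * kk           # COFETA(N,K)
            + no * kk * kk      # COFD(N,J,K)
            + nlite             # KTDIF(N)
            + no * kk * nlite)  # COFTD(N,J,L)


def read_count(nkk, nk, nk2, nlite, nkt):
    """Number of list items in the third READ (LINKMC) as posted.

    NK, NK2 and NKT are parameters because the thread does not show
    how the legacy code defines them.
    """
    return 1 + 7 * nkk + 2 * nk + nk2 + nlite + nkt


no, kk, nlite = 4, 111, 2  # NO, KK, NLITE from the second record
# Hypothetical definitions, chosen so each READ piece matches its WRITE piece
nk, nk2, nkt = no * kk, no * kk * kk, no * kk * nlite
print(write_count(no, kk, nlite), read_count(kk, nk, nk2, nlite, nkt))  # 51840 51840
```

Equal counts are necessary but not sufficient: as noted above, the type of each item must also match between the READ and the WRITE.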
From: onateag on 4 Mar 2010 11:27

Richard,

thanks for the reply. I understand that often the problem is simpler than it looks, but before posting I had checked every declaration and other basic stuff (I have read a lot of your other posts), and all of them are fine. All the integers you see popping up out of nowhere are calculated beforehand, and they are definitely not the problem.

Because of that, my focus now is to understand why the old linking file works and the new one does not. If I understand what is wrong in the non-working linking file, I can go back to the source of the error. For this purpose, I thought it would be interesting, as you correctly suggested, to check the READ item by item, comparing the two linking files. Keep in mind that the two linking files were created by **exactly** the same legacy code I posted previously, and that is the puzzling part to me.

I am going to copy all the code I wrote for checking, together with its output; I am sure it will answer most of your doubts. Code:

program test
  implicit none
  integer, parameter :: l1=40, l2=41
  integer :: ierr
  double precision :: data1, data2, data3
  integer, dimension(5) :: d1, d2
  double precision, dimension(100) :: dd1, dd2

  open(l1,form='unformatted',file='tplink_working')
  open(l2,form='unformatted',file='tplink')

  print*,'first record, l1'
  READ (l1) data1, data2, data3
  print*, data1, data2, data3
  print*,'first record, l2'
  READ (l2) data1, data2, data3
  print*, data1, data2, data3

  print*,'second record, l1'
  read(l1) d1
  print*,d1
  print*,'second record, l2'
  read(l2) d2
  print*,d2

  print*,'third record, l1'
  ! 100, just for the sake of showing that it goes on reading
  read(l1) dd1
  print*,dd1

  ! ATTENTION : If I stop at "data1, data2", no error!
  print*,'third record, l2'
  read(l2) data1, data2, data3
  print*,data1, data2, data3
end program test

Output:

$ ./test.exe
 first record, l1
 6.013470019174502E-154 6.013470016999068E-154 6.067619314340693E-154
 first record, l2
 6.013470019174502E-154 6.013470016999068E-154 6.067619314340693E-154
 second record, l1
 446 237984 4 111 2
 second record, l2
 446 237984 4 111 2
 third record, l1
 1013250.00000000 39.9480018615723 136.500000000000 3.33000000000000
 [...]
 [ lots of data ...]
 [...]
 0.000000000000000E+000 0.000000000000000E+000 3.785766995733680E-270
 -9.255965345320369E+061
 third record, l2
forrtl: severe (67): input statement requires too much data, unit 41, file /gluster/bigtmp/gle6b/Grid-Test/test-tplink/tplink

Something I want to point out is that even if I replace the last part with:

  print*,'third record, l2'
  read(l2) data1, data2
  print*,data1, data2
  print*,'possible extra record?'
  read(l2) data3
  print*,data3

I get:

[same output as before, up to the error]
 third record, l2
 1013250.00000000 39.9480018615723
 possible extra record?
forrtl: severe (39): error during read, unit 41, file /gluster/bigtmp/gle6b/Grid-Test/test-tplink/tplink

So, I have two unformatted files of the same size (I know it doesn't matter, but I mention it to underline that I am writing neither junk nor less data than expected), with a small difference, created by the **same** code. But there is something I am missing.
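Instead of probing a record with READ statements of increasing length, one can walk the record-length markers directly. The Python sketch below assumes the common layout in which each record's payload is preceded and followed by a 4-byte little-endian length. That is a typical default, but it is compiler-dependent (some compilers or options use 8-byte markers, and very long records may be split), so `walk_records` is a hypothetical diagnostic helper, not a standard tool.

```python
import struct


def walk_records(path, marker_fmt="<i"):
    """Return the payload length of every record in a sequential
    unformatted file, assuming a length marker before and after each
    payload (4-byte little-endian by default; an assumption, since the
    layout is compiler-dependent).

    Raises ValueError at the first record whose trailing marker does
    not match its leading marker, or if the file ends mid-record.
    """
    size = struct.calcsize(marker_fmt)
    lengths = []
    with open(path, "rb") as f:
        while True:
            head = f.read(size)
            if not head:
                return lengths  # clean end of file
            if len(head) < size:
                raise ValueError("truncated record marker")
            (n,) = struct.unpack(marker_fmt, head)
            if n < 0:
                raise ValueError("negative marker (split record?) not handled")
            payload = f.read(n)
            tail = f.read(size)
            if len(payload) < n or len(tail) < size:
                raise ValueError(f"record {len(lengths) + 1}: file ends mid-record")
            if struct.unpack(marker_fmt, tail)[0] != n:
                raise ValueError(f"record {len(lengths) + 1}: trailing marker mismatch")
            lengths.append(n)
```

Calling, for example, `walk_records("tplink_working")` lists the record sizes of a file. For a file whose third leading marker says 20 bytes while the real data continue past that point, the trailing-marker check would most likely fail at record 3, which pinpoints the bad marker without any Fortran READ.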
From: glen herrmannsfeldt on 4 Mar 2010 15:17

Gaetano Esposito <gaetano.esposito(a)gmail.com> wrote:

> The problem I am going to detail used to occur with "random" (i.e.
> multiple) combination of compilers and machines/architectures. Because
> of this uncertainty, I switched a while ago to static compilation on
> one machine with the program running on others. At first I was
> satisfied with this solution, but now the problem is back here, and I
> ran out of ideas...

(snip)

> I investigated this problem for a long time comparing a working
> version of the unformatted file (which I had kept from other
> computation) to the ones which now are not working (by "working" I
> mean it is readable by the READ statement).
> It turns out that the two files are **almost** (of course) identical.
> If the unix command "cmp" is unreadable for me, if I dump the binary
> files content using "od -x > linking_file" for both (working and not-
> working) versions and "diff" them, I get just one different line:
> $ diff tplink_working_x tplink_x
> 5c5
> < 0000100 0002 0000 0014 0000 523c 0006 0000 0000
> ---
>> 0000100 0002 0000 0014 0000 0014 0000 0000 0000

Post the first 20 lines of od -x for each file.

Also, post all statements between the second and third READ statements in the real program.

-- glen
From: Richard Maine on 4 Mar 2010 16:26

onateag <gaetano.esposito(a)gmail.com> wrote:

> thanks for the reply. I understand that often the problem is simpler
> than it looks, but before posting, I had checked every declaration and
> other basic stuff (I read a lot of your other posts), all of them are
> fine. All the integers you see popping up out of nowhere, are
> calculated beforehand, and they are definitively not the problem.

I can only debug what I see. If I'm just assured that everything has been checked and is fine, even though I don't see any of it, that doesn't leave me much to go on. When someone assures me that the parts they didn't show me are all fine, that often tends to make me more suspicious of those parts, rather than less so.

I don't see anything else in what was posted that I can help with. From the first post:

> $ diff tplink_working_x tplink_x
> 5c5
> < 0000100 0002 0000 0014 0000 523c 0006 0000 0000
> ---
> > 0000100 0002 0000 0014 0000 0014 0000 0000 0000

I am slightly puzzled in that I can't quite match the single octal dump line shown with the test program. It looks like it might plausibly have the end of the second record and the beginning of a third. The 2 0 14 0 could plausibly be the 2 at the end of the second record, followed by a trailing record size (14 hex = 20 dec, which would be right). But it doesn't look aligned right, unless the first record is longer than the read suggests... which might be possible; those are funny-looking values in the first record, maybe Hollerith?

The first record ought to have taken 32 bytes (3*8 for the data, and 2*4 for the record header and trailer, assuming the most common 32-bit structures). Then the second should have taken 28 bytes (5*4 data plus 2*4 header/trailer). But this is showing what looks like the end of the second record after 72 bytes, which seems 12 bytes too far in, if I got all the arithmetic straight.
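The offset arithmetic above can be reproduced in a few lines. od prints offsets in octal, so the dump line labelled 0000100 starts at byte 64; the trailing size 20 sits at bytes 68-71, so the second record ends at byte 72. This Python sketch assumes 4-byte markers, as in the post. The 36-byte first-record payload in the last call is only one hypothetical size that would put that boundary at byte 72; what the first record really holds depends on declarations the thread does not show.

```python
MARKER = 4  # bytes per record-length marker (assumed 32-bit)


def record_end_offsets(payload_sizes, marker=MARKER):
    """Byte offset just past each record's trailing length marker."""
    offsets, pos = [], 0
    for n in payload_sizes:
        pos += marker + n + marker  # header + payload + trailer
        offsets.append(pos)
    return offsets


print(int("0000100", 8))                   # 64: od offsets are octal
print(record_end_offsets([3 * 8, 5 * 4]))  # [32, 60]: layout implied by the test READs
print(record_end_offsets([36, 5 * 4]))     # [44, 72]: hypothetical 36-byte first record
```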
As glen says, maybe a full hex dump of the first bit of the file would help more; the one line in isolation isn't enough. I'm still not sure that would tell me enough, but it might help some.

If the above is the end of the second record, then it looks like one of the files has a longish 3rd record (6523c hex = 414268 decimal bytes), while the other has a 3rd record with only 20 bytes of data. That would match reasonably well your observed ability to read 2 8-byte values from it, but fail reading a third. Why the file would be that way, I have no data to see.

-- 
Richard Maine
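The 414268-byte figure can be cross-checked against the WRITE statement itself. This Python sketch assumes, hypothetically, that every real item is an 8-byte DOUBLE PRECISION value (the test program reads them that way) and that NLIN and KTDIF are 4-byte default integers (the READ stores them in the integer work array IMCWRK). It uses NO=4, KK=111, NLITE=2 from the second record.

```python
def third_record_bytes(no, kk, nlite, real_size=8, int_size=4):
    """Expected payload size of the third WRITE (LINKTP), in bytes.

    Assumes all real items are `real_size` bytes and NLIN, KTDIF are
    integers of `int_size` bytes (assumptions; declarations not shown).
    """
    reals = (1                   # PATM
             + 6 * kk            # WT, EPS, SIG, DIP, POL, ZROT
             + 2 * no * kk       # COFLAM, COFETA
             + no * kk * kk      # COFD
             + no * kk * nlite)  # COFTD
    ints = kk + nlite            # NLIN, KTDIF
    return reals * real_size + ints * int_size


n = third_record_bytes(no=4, kk=111, nlite=2)
print(n, hex(n))  # 414268 0x6523c
```

If those assumptions hold, the working file's 414268-byte third record is exactly what the WRITE should produce. That would suggest the anomaly is the 20 stored as the other file's third-record length, rather than the data that follow it.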