From: Gideon on 6 Aug 2010 11:28

From a cursory search, I know questions like this have been asked before, but I wanted to ask something more pointed. I know that when you write an unformatted file in fortran, it sticks a "header" before your data that indicates how many bytes of data your real information is. On my Intel Core 2 OS X machine, I discovered a while ago (from the intel ifort documentation) that it was sticking this information in as a 4 byte integer at the beginning of the file. Thus, I was free to either skip the first 4 bytes if I knew how large my data structure was, or read in this integer to figure out how large it was. I should have prefaced this by saying that I'm mostly writing arrays of double precision numbers and then reading them into MATLAB.

Anyways, I recently had a colleague experiment with this on an intel machine running some flavor of linux. On this setup, it turned out to use an 8 byte integer which took us some time to discover.

So here's my question: is there an easy, robust, way to discover what size the header of a fortran unformatted file is on a given architecture/OS?
From: Dave Allured on 6 Aug 2010 13:05

Gideon wrote:
>
> From a cursory search, I know questions like this have been asked
> before, but I wanted to ask something more pointed. I know that when
> you write an unformatted file in fortran, it sticks a "header" before
> your data that indicates how many bytes of data your real information
> is. On my Intel Core 2 OS X machine, I discovered a while ago (from
> the intel ifort documentation) that it was sticking this information
> in as a 4 byte integer at the beginning of the file. Thus, I was free
> to either skip the first 4 bytes if I knew how large my data structure
> was, or read in this integer to figure out how large it was. I should
> have prefaced this by saying that I'm mostly writing arrays of double
> precision numbers and then reading them into MATLAB.
>
> Anyways, I recently had a colleague experiment with this on an intel
> machine running some flavor of linux. On this setup, it turned out to
> use an 8 byte integer which took us some time to discover.
>
> So here's my question: is there an easy, robust, way to discover what
> size the header of a fortran unformatted file is on a given
> architecture/OS?

This is a tricky question because the internal structure of fortran unformatted sequential files was never standardized. The record length integers were never intended to be seen by normal users, putting the whole topic outside fortran standards.

For the compilers and unix and linux platforms within my experience, I can count on the following structure of each unformatted record:

    [length] [data block] [length]

Where [length] is a 4 or 8 byte integer, the byte count of the data block; and [data block] is the user data from a single unformatted write statement. The leading and trailing length integers for each record are identical. I believe the original purpose of the trailing length was to support reverse reading such as the backspace statement.
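The [length] [data block] [length] layout described above can be sketched in Python as follows. This is only an illustration, assuming a signed length marker of 4 or 8 bytes and a byte order that is already known; the helper names `pack_record` and `unpack_record` are made up for the example and do not come from any compiler's runtime.

```python
import struct

# Assumption: the marker is a signed integer of 4 ("i") or 8 ("q") bytes.
MARKER_CODES = {4: "i", 8: "q"}


def pack_record(payload, marker_size=4, byteorder="<"):
    """Frame payload as [length][data block][length] (hypothetical helper)."""
    fmt = byteorder + MARKER_CODES[marker_size]
    marker = struct.pack(fmt, len(payload))
    return marker + payload + marker


def unpack_record(buf, offset=0, marker_size=4, byteorder="<"):
    """Return (payload, next_offset) for the record that starts at offset."""
    fmt = byteorder + MARKER_CODES[marker_size]
    (n,) = struct.unpack_from(fmt, buf, offset)
    start = offset + marker_size
    end = start + n
    # The trailing marker must repeat the leading one.
    (trailer,) = struct.unpack_from(fmt, buf, end)
    if trailer != n:
        raise ValueError("leading and trailing lengths differ")
    return buf[start:end], end + marker_size


# One record holding three double precision values, little endian.
values = (1.0, 2.5, -3.0)
record = pack_record(struct.pack("<3d", *values))
payload, next_offset = unpack_record(record)
decoded = struct.unpack("<3d", payload)
```

Because records sit back to back, `next_offset` is where the following record's leading marker would begin, so the same call can be repeated to walk a whole file.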
The other important aspect of the file structure is that the unformatted records are then written or mapped contiguously onto an ordinary file, with no gaps between records. The unix-like data model for ordinary files is that they are a single unbroken stream of bytes with a given total length. Certain older platforms used distinctly different mappings, so let's just stick with the unix-like assumption from now on.

So you should be able to get a fairly robust determination by using the redundant information in the trailing length byte of the first record. Using either direct or stream access, read the first 8 bytes of the file. Then test several interpretations for the leading length integer. Including endian if you need to, there are four possibilities:

    4 bytes, little endian
    8 bytes, little endian
    4 bytes, big endian
    8 bytes, big endian

For each possibility, you then skip N or N-4 bytes in the file, and attempt to read the first record's trailing length integer in the same format. Test for I/O error each time, because misinterpreted lengths will often run off the end of the file.

Depending on your application, you may also be able to pre-screen for minimum and maximum reasonable record lengths, before attempting wild file seeks. If your fortran supports inquiring the file length (F2003 I think), this is also a good prequalification for interpreted lengths.

Ideally the tests will yield one success and three failures, which means a complete determination. It is conceivable that a file may have a data pattern that exactly matches the expected trailing length integer. Then you might consider testing the second record as well. But that seems like a lot of work to me. My work in this area so far has been confined to file sets with severe constraints on the minimum and maximum record size, which makes discrimination much easier.

--Dave
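The test described above (try each of the four interpretations, pre-screen against the file length, then compare the trailing length with the leading one) might look roughly like this in Python. It is a sketch under stated assumptions, not a definitive implementation: it assumes an ordinary seekable disk file with the [length] [data block] [length] layout and signed markers, and the function name `detect_marker` and its `min_len`/`max_len` parameters are hypothetical.

```python
import io
import struct

# The four interpretations to try: (marker size in bytes, struct byte order).
CANDIDATES = [(4, "<"), (8, "<"), (4, ">"), (8, ">")]


def detect_marker(stream, min_len=0, max_len=None):
    """Return every (size, byteorder) consistent with the first record.

    Hypothetical helper: stream must be a seekable binary file object.
    """
    stream.seek(0, io.SEEK_END)
    file_size = stream.tell()
    stream.seek(0)
    head = stream.read(8)

    matches = []
    for size, order in CANDIDATES:
        if len(head) < size:
            continue
        fmt = order + ("i" if size == 4 else "q")
        (n,) = struct.unpack(fmt, head[:size])

        # Pre-screen: the record plus both markers must fit in the file,
        # and the length must lie in the caller's plausible range.
        upper = file_size - 2 * size
        if max_len is not None:
            upper = min(upper, max_len)
        if n < min_len or n > upper:
            continue

        # Jump to where the trailing length should be and compare.
        stream.seek(size + n)
        tail = stream.read(size)
        if len(tail) == size and struct.unpack(fmt, tail)[0] == n:
            matches.append((size, order))
    return matches
```

A result with exactly one entry corresponds to the "one success and three failures" case. An empty list or several entries means the first record alone does not settle the question, and tighter `min_len`/`max_len` bounds or a check of the second record would be needed.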
From: Dave Allured on 6 Aug 2010 13:19

Correction. In the 5th paragraph, replace "byte" with "integer":

> redundant information in the trailing length *integer* ...

--Dave
From: Nick Maclaren on 6 Aug 2010 13:28

In article <4C5C40E1.2823(a)nospom.com>, Dave Allured <nospom(a)nospom.com> wrote:
>Gideon wrote:
>>
>> So here's my question: is there an easy, robust, way to discover what
>> size the header of a fortran unformatted file is on a given
>> architecture/OS?
>
>This is a tricky question because the internal structure of fortran
>unformatted sequential files was never standardized. The record length
>integers were never intended to be seen by normal users, putting the
>whole topic outside fortran standards.

That's understating the issue :-)

What record-length integers? Some systems didn't have them, and that includes some types of file under Unices :-) Magnetic tapes of types that allow variable-length blocks, run-time systems that allow the direct use of sockets and so on.

>For the compilers and unix and linux platforms within my experience, I
>can count on the following structure of each unformatted record:
>
> [length] [data block] [length]
>
>Where [length] is a 4 or 8 byte integer, the byte count of the data
>block; and [data block] is the user data from a single unformatted write
>statement. The leading and trailing length integers for each record are
>identical. I believe the original purpose of the trailing length was to
>support reverse reading such as the backspace statement.

That is correct, and that is the usual format. HOWEVER, I have also seen the following:

1) As above, but with 2 byte integers.

2) With only a preceding length (4 byte, if I recall).

3) With a header before the first record.

4) With the [length] field actually being a [junk,length] field.

My guess is that all of those are now dead and buried, though.

Regards,
Nick Maclaren.
From: Dave Allured on 6 Aug 2010 14:04

Nick Maclaren wrote:
>
> In article <4C5C40E1.2823(a)nospom.com>, Dave Allured <nospom(a)nospom.com> wrote:
> >Gideon wrote:
> >>
> >> So here's my question: is there an easy, robust, way to discover what
> >> size the header of a fortran unformatted file is on a given
> >> architecture/OS?
> >
> >This is a tricky question because the internal structure of fortran
> >unformatted sequential files was never standardized. The record length
> >integers were never intended to be seen by normal users, putting the
> >whole topic outside fortran standards.
>
> That's understating the issue :-)
>
> What record-length integers? Some systems didn't have them, and
> that includes some types of file under Unices :-) Magnetic tapes
> of types that allow variable-length blocks, run-time systems that
> allow the direct use of sockets and so on.

Point taken! However, today Gideon and I are trying to discuss ordinary disk files only, for practical purposes. If for whatever reason you try my file sexing methods on a weird I/O device, you will get what you deserve! ;-)

> >For the compilers and unix and linux platforms within my experience, I
> >can count on the following structure of each unformatted record:
> >
> > [length] [data block] [length]
> >
> >Where [length] is a 4 or 8 byte integer, the byte count of the data
> >block; and [data block] is the user data from a single unformatted write
> >statement. The leading and trailing length integers for each record are
> >identical. I believe the original purpose of the trailing length was to
> >support reverse reading such as the backspace statement.
>
> That is correct, and that is the usual format. HOWEVER, I have also
> seen the following:
>
> 1) As above, but with 2 byte integers.
>
> 2) With only a preceding length (4 byte, if I recall).
>
> 3) With a header before the first record.
>
> 4) With the [length] field actually being a [junk,length] field.
>
> My guess is that all of those are now dead and buried, though.

Very interesting! My algorithm could deal with (1) with an increase in uncertainty or supplemental testing, and with (3) and (4) given more knowledge of the particular details. But the important part is where you said dead and buried! ;-)

--Dave