Reading messy files with Fortran [Fortran]

From: robin on 31 Jan 2010 06:35

<analyst41(a)hotmail.com> wrote in message news:a31f5cab-b0b1-4cdf-ab66-ed1432409861(a)g39g2000vba.googlegroups.com...

>I actually don't know the Windows/DOS command to produce a HEX dump -
>if someone knows it - please post it. I have reduced the problem
>row-set to a few rows - it should be possible to post the entire data
>here as a HEX dump.

The FILEIT program is pretty good in DOS.
Displays in hex and in characters.
Press a key and it displays in characters only.

There are lots of other things that you can do with it.

From: robin on 31 Jan 2010 06:35

<analyst41(a)hotmail.com> wrote in message news:54c5aaba-8ee3-4f17-a398-e70c3e63f284(a)p23g2000vbl.googlegroups.com...
|I posted on this topic before and this is my latest take on it:
|
| (1) In my case the messy files are csv extracts from a database (whose
| character encoding is Unicode - I don't know if it has anything to do
| with the problem).
|
| (2) I discovered that Fortran sees spurious EOR markers within
| character fields and I couldn't see a rhyme or reason why.
|
| (3) But since I control the input - I inserted row numbers at the
| beginning and end of each row extracted from the database and I added
| 2000000000 to the row number to make sure it's unlikely that this data
| would show up naturally.
|
| (4) I then read each record and make sure that it has at least 18
| characters (if not it is simply concatenated to cum_buffer - see
| below).
|
| I use the statement (adapted from Cooper Redwine's book)
|
| read (unit = nn, fmt = '(A)', advance = 'no', iostat = read_stat, &
|       size = num_chars) buffer
|
| you must have EOR or EOF or error on each read - otherwise the buffer
| is too small and the program has to be halted.
|
| I then check if the record number is showing up at the end which is
| the same as the one on the left. If yes, you have a complete record -
| if not - you have a spurious EOR and simply concatenate the buffer
| to another buffer called cum_buffer.
|
| when cum_buffer looks like
|
| 2000000127stuff2000000127
|
| You have a facsimile of a row 127 from the database.
|
| You might still have to struggle with separating 'stuff' into fields -
| but that's a purely programming task having nothing to do with the file
| system or operating system or character encoding schemes.
|
| I hope others find this useful and suggestions for improvements would
| be good.
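
The procedure quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual program: the 10-character key width, the 4096-character buffer, the scratch file and its three demo lines are all made up here, and the code relies on Fortran 2003/2008 features (iostat_eor, newunit=, deferred-length character variables) that a strict F95 compiler will not accept.

```fortran
program reassemble_rows
  use, intrinsic :: iso_fortran_env, only: iostat_end, iostat_eor
  implicit none
  integer, parameter :: key_len = 10           ! assumed width of 2000000000 + row number
  character(len=4096) :: buffer                ! one physical record (assumed large enough)
  character(len=:), allocatable :: cum_buffer  ! pieces of the current row gathered so far
  integer :: nn, read_stat, num_chars, n

  ! Demo input (hypothetical): row 1 is split by a spurious EOR, row 2 is intact.
  open (newunit=nn, status='scratch', form='formatted', action='readwrite')
  write (nn, '(A)') '2000000001a,b'
  write (nn, '(A)') ',c2000000001'
  write (nn, '(A)') '2000000002x,y,z2000000002'
  rewind (nn)

  cum_buffer = ''
  do
     read (unit=nn, fmt='(A)', advance='no', iostat=read_stat, &
           size=num_chars) buffer
     if (read_stat == iostat_end) exit
     ! No EOR means the record did not fit in buffer: halt, as described above.
     if (read_stat == 0) stop 'buffer too small for this record'
     if (read_stat /= iostat_eor) stop 'read error'

     ! Append only the characters actually transferred.
     cum_buffer = cum_buffer // buffer(1:num_chars)
     n = len(cum_buffer)

     ! A row is complete when the same key appears at both ends.
     if (n >= 2*key_len) then
        if (cum_buffer(1:key_len) == cum_buffer(n-key_len+1:n)) then
           print '(A)', cum_buffer(key_len+1:n-key_len)
           cum_buffer = ''
        end if
     end if
  end do

  if (len(cum_buffer) > 0) print '(2A)', 'incomplete row left over: ', cum_buffer
  close (nn)
end program reassemble_rows
```

With the three demo lines this should print a,b,c and then x,y,z. Note that whatever character caused the spurious EOR is dropped when the pieces are joined, so a field that really contained a line break comes back without it.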

If you are still having a problem with this --
have you tried reading single characters with direct access READ?

From: analyst41 on 31 Jan 2010 09:21

On Jan 31, 6:35 am, "robin" <robi...(a)bigpond.com> wrote:
> If you are still having a problem with this --
> have you tried reading single characters with direct access READ?

I have tried reading one character at a time using A1 format and
assembling the records and it leads to pretty much the same thing.

I am not that familiar with direct (unformatted?) read - I assume
Fortran should be able to do the equivalent of a Hex dump - but I
haven't had a chance to figure out how to do that.

From: dpb on 31 Jan 2010 10:22

analyst41(a)hotmail.com wrote:
....

> I have tried reading one character at a time using A1 format and
> assembling the records and it leads to pretty much the same thing.

As expected, I'd think...

> I am not that familiar with direct (unformatted?) read - I assume
> Fortran should be able to do the equivalent of a Hex dump - but I
> haven't had a chance to figure out how to do that.

W/ F95 it'll be an extension but virtually all vendors have a way. Look
for "BINARY" or something similar (other than FORMATTED or UNFORMATTED)
for the FORM specifier of the OPEN statement.

UNFORMATTED won't serve the purpose as it will expect record markers
that your database program won't have provided.

After that, simply

CHARACTER (LEN=1) :: char

READ(lfn [, iostat] [, err] [, end]) char

is the guts of a loop. You can examine each char and decide what to do
w/ it in building a buffer.
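
For compilers that support it, Fortran 2003's ACCESS='STREAM' is a standard way to get the same byte-at-a-time behaviour as the vendor "BINARY" forms mentioned above. The sketch below uses that loop to print a simple hex dump. Swapping in stream access is an assumption on my part rather than something taken from the post, and the program name, the 16-bytes-per-row layout and the command-line handling are only illustrative.

```fortran
program hexdump
  implicit none
  character(len=1)   :: ch      ! one byte of the file
  character(len=16)  :: text    ! printable rendering of the current row
  character(len=256) :: fname   ! file name taken from the command line
  integer :: lfn, ios, n, code, col

  call get_command_argument(1, fname)
  if (len_trim(fname) == 0) stop 'usage: hexdump <file>'

  ! Stream access: no record markers are expected or interpreted.
  open (newunit=lfn, file=trim(fname), access='stream', form='unformatted', &
        status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'cannot open file'

  n = 0
  text = ''
  do
     read (lfn, iostat=ios) ch
     if (ios /= 0) exit                      ! end of file (or a read error)
     col = mod(n, 16) + 1
     if (col == 1) write (*, '(Z8.8,2X)', advance='no') n   ! byte offset
     code = ichar(ch)
     write (*, '(Z2.2,1X)', advance='no') code
     ! Show printable ASCII as is, everything else as a dot.
     if (code >= 32 .and. code <= 126) then
        text(col:col) = ch
     else
        text(col:col) = '.'
     end if
     n = n + 1
     if (col == 16) then
        write (*, '(1X,A)') text
        text = ''
     end if
  end do

  ! Pad a short final row so the character column lines up.
  if (mod(n, 16) /= 0) then
     write (*, '(A,1X,A)') repeat('   ', 16 - mod(n, 16)), trim(text)
  end if
  close (lfn)
end program hexdump
```

Run on the CSV extract, a dump like this would make any stray 0D, 0A or 00 bytes inside the character fields visible, which may explain the spurious EOR markers.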

Or, if this is a fixed-length record or you could make it such you could
read the record into a buffer and do character substitution on the
offending character(s) and then use internal read from the buffer to get
the data.
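
A minimal sketch of that last idea, assuming the offending characters are control codes (anything below ASCII 32) and that blanks are an acceptable replacement; the sample record and the variable names are made up for illustration:

```fortran
program scrub_and_parse
  implicit none
  character(len=40) :: buffer   ! one record, already read from the file
  integer :: i, id
  real    :: amount

  ! Hypothetical record with a NUL and a carriage return embedded in it.
  buffer = '127' // char(0) // ',' // char(13) // '42.5'

  ! Character substitution: blank out every control character.
  do i = 1, len(buffer)
     if (ichar(buffer(i:i)) < 32) buffer(i:i) = ' '
  end do

  ! Internal read: parse the cleaned buffer as if it were a record.
  read (buffer, *) id, amount
  print '(I0,1X,F0.2)', id, amount
end program scrub_and_parse
```

List-directed input treats a comma with surrounding blanks as a single separator, so the cleaned buffer parses as two values. For text fields that may themselves contain commas or quotes, an explicit format or hand-written field splitting would be safer than a list-directed read.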

--

From: dpb on 31 Jan 2010 10:55

analyst41(a)hotmail.com wrote:
....

> I actually don't know the Windows/DOS command to produce a HEX dump -
> if someone knows it - please post it. ...

To add another possible utility to the list --

I use the JPSoftware command interpreters <www.jpsoft.com> as great
enhancements to, yet compatible w/, the MS CLIs.

There's a free version available for download; amongst the many
enhancements is its LIST command, which includes a hex dump mode.

--
