From: robin on 31 Jan 2010 18:26
<analyst41(a)hotmail.com> wrote in message news:9c95f818-d2df-493b-a40f-2977f8fe6a0f(a)z26g2000yqm.googlegroups.com...
On Jan 31, 6:35 am, "robin" <robi...(a)bigpond.com> wrote:

|> If you are still having a problem with this --
|> have you tried reading single characters with direct access READ?

|I have tried reading one character at a time using A1 format and
|assembling the records and it leads to pretty much the same thing.

That would be right.

|I am not that familiar with direct (unformatted?) read -

Then give it a try.

| I assume
|Fortran should be able to do the equivalent of a Hex dump - But I
|haven't had a chance to figure out how to do that.
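The hex dump asked about here is easy to sketch. The version below is Python rather than Fortran, purely as an illustration of the logic; the function name `hex_dump`, the 16-byte row width and the output layout are assumptions of this sketch, not anything posted in the thread.

```python
def hex_dump(data, width=16):
    """Return one text line per `width` bytes: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = ' '.join(f'{b:02X}' for b in chunk)
        # Non-printable bytes (including CR = 0D and LF = 0A) are shown as '.'
        text = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        lines.append(f'{offset:04X}  {hex_part:<{width * 3 - 1}}  {text}')
    return lines


# Hypothetical sample: a UTF-8 byte-order mark, a header word, then CR LF.
sample = b'\xef\xbb\xbfdescription\r\n'
for line in hex_dump(sample):
    print(line)
```

For a real file, the bytes would come from something like `open(path, 'rb').read()`, where `path` is whatever file is being inspected. Opening in binary mode matters: it keeps the runtime from interpreting CR LF pairs before they can be seen.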
From: dpb on 1 Feb 2010 13:40
dpb wrote:
....
> I use the JPSoftware command interpreters <www.jpsoft.com> as great
> enhancements to, yet compatible w/ the MS CLIs.
>
> There's a free version available ...

Specifically, <http://www.jpsoft.com/tccledes.htm>

--
From: analyst41 on 1 Feb 2010 17:59
On Jan 29, 9:15 pm, "analys...(a)hotmail.com" <analys...(a)hotmail.com> wrote:
> On Jan 29, 9:44 am, dpb <n...(a)non.net> wrote:
>
> > analys...(a)hotmail.com wrote:
> > > On Jan 28, 3:15 am, Arjen Markus <arjen.markus...(a)gmail.com> wrote:
> > >> On 28 jan, 00:51, "analys...(a)hotmail.com" <analys...(a)hotmail.com>
> > >> wrote:
> >
> > >>> I posted on this topic before and this is my latest take on it:
> > >>> (1) In my case the messy files are csv extracts from a database (whose
> > >>> character encoding is Unicode - I don't know if it has anything to do
> > >>> with the problem).
> > >>> (2) I discovered that Fortran sees spurious EOR markers within
> > >>> character fields and I couldn't see a rhyme or reason why.
> > >>> (3) But since I control the input - I inserted row numbers at the
> > >>> beginning and end of each row extracted from the database and I added
> > >>> 2000000000 to the row number to make sure it's unlikely that this data
> > >>> would show up naturally.
> > >>> (4) I then read each record and make sure that it has at least 18
> > >>> characters (if not it is simply concatenated to cum_buffer - see
> > >>> below).
> > >>> I use the statement (adapted from Cooper Redwine's book)
> > >>> read (unit = nn, fmt = '(A)', advance = 'no', iostat = read_stat, size = num_chars) buffer
> > >>> you must have EOR or EOF or error on each read - otherwise the buffer
> > >>> is too small and the program has to be halted.
> > >>> I then check if the record number is showing up at the end which is
> > >>> the same as the one on the left. If yes, you have a complete record -
> > >>> if not - you have a spurious EOR and simply concatenate the buffer
> > >>> to another buffer called cum_buffer.
> > >>> when cum_buffer looks like
> > >>> 2000000127stuff2000000127
> > >>> You have a facsimile of a row 127 from the database.
> > >>> You might still have to struggle with separating 'stuff' into fields -
> > >>> but that's a purely programming task having nothing to do with the file
> > >>> system or operating system or character encoding schemes.
> > >>> I hope others find this useful and suggestions for improvements would
> > >>> be good.
> >
> > >> I do not remember your previous postings, but I am curious about these
> > >> end-of-records. Can you send me an example? (I want to look at CSV files
> > >> more closely, as I recently was confronted with some of their nastier
> > >> aspects in the context of my Flibs project - http://flibs.sf.net).
> >
> > >> Regards,
> >
> > >> Arjen
> >
> > > I'd love to give you actual files that show fake EORs - but it is
> > > copyright/proprietary data and I didn't have the time to clean it up
> > > from that standpoint.
> >
> > > But here are three cases (the occurrence of these strings causes
> > > Fortran to see a fake EOR - LF95 running on windows):
> >
> > > <br />
> >
> > > </STRONG>
> >
> > > </B>
> >
> > > These seem to be terminators of HTML phrases - I don't know why
> > > Fortran thinks these are EORs. Excel would trip up similarly as would
> > > the language R - in fact, Fortran, R and Excel may see a different
> > > number of rows in the same csv file.
>
> > Can you post a short section of the file surrounding the offending
> > characters as seen by a hex dump program so can see what's actually in
> > the data stream?
>
> > Do these strings fail when read on their own in any length record or
> > only in the generated output file from the database?
>
> > If you can make it fail repeatedly it should be quite simple to at least
> > figure out what is the root cause and whether that is a data problem or
> > a bug in the particular compiler i/o library.
>
> > Which raises a point of what happens w/ another compiler?
>
> > --
>
> I can tell you that it's not a Fortran issue. Notepad, Excel and the R
> language are unable to split the file up into records so that the
> records correspond to rows in the database.
>
> I actually don't know the Windows/DOS command to produce a HEX dump -
> if someone knows it - please post it. I have reduced the problem
> row set to a few rows - it should be possible to post the entire data
> here as a HEX dump.

I pulled from the database row 1 ('description') and row 2 (the
content).

But notepad, excel and Fortran think the file has 5 rows:

description
"<strong>Unknown Anytime, Anywhere Learning<br />
</strong> The answer is Unknown. <strong> you can start and finish in
less then 17 months.</strong> <br />
<br />
Unknown about ensuring you learn ."

0B24:0100  EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
0B24:0110  22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
0B24:0120  20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
0B24:0130  72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
0B24:0140  3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
0B24:0150  20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
0B24:0160  77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
0B24:0170  20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
-d
0B24:0180  69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
0B24:0190  65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
0B24:01A0  74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
0B24:01B0  62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
0B24:01C0  62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
0B24:01D0  75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."....&.
0B24:01E0  0F 32 ED 0B C9 74 0F 43-53 26 8B 1F E8 5D 00 5B   .2...t.CS&...].[
0B24:01F0  73 0B 43 43 E2 F2 2E C7-06 96 90 04 00 5D 5F 5B   s.CC.........]_[
From: dpb on 1 Feb 2010 19:40
analyst41(a)hotmail.com wrote:
....
> I pulled from the database row 1('description') and row2 (the
> content).
>
> But notepad, excel and Fortran think the file has 5 rows:
>
> description
> "<strong>Unknown Anytime, Anywhere Learning<br />
> </strong> The answer is Unknown. <strong> you can start and finish in
> less then 17 months.</strong> <br />
> <br />
> Unknown about ensuring you learn ."
>
> EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
> 22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
> 20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
> 72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
> 3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
> 20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
> 77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
> 20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
> 69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
> 65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
> 74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
> 62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
> 62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
> 75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."....&.
....

Well, it does.

Search and count the 0D 0A pairs (CRLF) -- they're the record markers.

--
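The count can also be checked mechanically. The sketch below is Python, for illustration only, with the bytes transcribed by hand from the dump above. It stops at the last 0D 0A, on the assumption that the bytes after it (03 D8 26 8A ...) are leftover memory shown by the dump tool and not file contents. The variable names are made up for this example.

```python
# File bytes transcribed from the hex dump, up to and including the last 0D 0A.
data = (
    b'\xef\xbb\xbfdescription\r\n'
    b'"<strong>Unknown Anytime, Anywhere Learning<br />\r\n'
    b'</strong> The answer is Unknown. <strong> you can start and finish in '
    b'less then 17 months.</strong> <br />\r\n'
    b'<br />\r\n'
    b'Unknown about ensuring you learn ."\r\n'
)

# Every 0D 0A pair ends one physical line, whatever it sits inside.
crlf_count = data.count(b'\r\n')

# Splitting on CR LF leaves an empty piece after the final pair; drop it.
physical_lines = data.split(b'\r\n')[:-1]

print(crlf_count)
for line in physical_lines:
    print(line)
```

Anything that treats each CR LF as an end-of-record will therefore report five records for this file, which is consistent with what Notepad, Excel and Fortran show.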
From: analyst41 on 1 Feb 2010 20:01
On Feb 1, 7:40 pm, dpb <n...(a)non.net> wrote:
> analys...(a)hotmail.com wrote:
>
> ...
>
> > I pulled from the database row 1('description') and row2 (the
> > content).
>
> > But notepad, excel and Fortran think the file has 5 rows:
>
> > description
> > "<strong>Unknown Anytime, Anywhere Learning<br />
> > </strong> The answer is Unknown. <strong> you can start and finish in
> > less then 17 months.</strong> <br />
> > <br />
> > Unknown about ensuring you learn ."
>
> > EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
> > 22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
> > 20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
> > 72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
> > 3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
> > 20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
> > 77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
> > 20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
> > 69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
> > 65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
> > 74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
> > 62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
> > 62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
> > 75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."....&.
>
> ...
>
> Well, it does.
>
> Search and count the 0D 0A pairs (CRLF) -- they're the record markers.
>
> --

Thanks. But since these markers are occurring both in the middle of a
field and also at the end of an actual row from the database - I am
still not able to separate out true EORs from the others.
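One thing the dump does show is that the content field opens with a double quote (the 22 right after the first 0D 0A) and closes with one just before the last 0D 0A. If the export follows the usual CSV convention (fields containing line breaks are wrapped in double quotes, and a literal quote inside a field is doubled), then a CR LF seen while inside quotes is part of the field and a CR LF seen outside quotes is a true end-of-record. That convention is an assumption about this particular database export, not something confirmed in the thread. A minimal Python sketch of the idea follows, with a hypothetical helper `split_records`. The same state flag could be carried along in a character-at-a-time Fortran read.

```python
def split_records(text):
    """Split text on CR LF pairs that fall outside double-quoted fields."""
    records = []
    current = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Toggle on every quote; a doubled quote ("") toggles twice
            # and so leaves the state unchanged.
            in_quotes = not in_quotes
            current.append(ch)
        elif not in_quotes and text.startswith('\r\n', i):
            records.append(''.join(current))
            current = []
            i += 1  # step over the LF as well
        else:
            current.append(ch)
        i += 1
    if current:
        records.append(''.join(current))
    return records


# File bytes transcribed from the hex dump, up to the last 0D 0A.
data = (
    b'\xef\xbb\xbfdescription\r\n'
    b'"<strong>Unknown Anytime, Anywhere Learning<br />\r\n'
    b'</strong> The answer is Unknown. <strong> you can start and finish in '
    b'less then 17 months.</strong> <br />\r\n'
    b'<br />\r\n'
    b'Unknown about ensuring you learn ."\r\n'
)

# 'utf-8-sig' drops the leading EF BB BF byte-order mark.
records = split_records(data.decode('utf-8-sig'))
print(len(records))
```

On this sample the scan yields two records, matching the two database rows, with the three inner CR LF pairs kept inside the second one. It will not help if the export leaves multi-line fields unquoted; in that case the row-number sentinels described earlier in the thread are still needed.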