From: dpb on
Peter Riddersholm wrote:
> dpb <none(a)non.net> wrote in message
> <i0vaec$6jl$1(a)news.eternal-september.org>...
>> Peter Riddersholm wrote:
>> > dpb <none(a)non.net> wrote in message
>> > <i0v87i$tfc$1(a)news.eternal-september.org>...
>> ...
>>
>> >> In general, if it is fixed width you can set up a format string to
>> >> parse it; if not you'll have to do something like read each line
>> >> into a string variable a la fgetl() and parse each line deciding on
>> >> what/where the missing values are/go...
>> >>
>> ...
>>
>> > It is fixed width every line.
>> >
>> > What would a format string look like? And then I'll have to use
>> > textscan, right?
>>
>> That would depend on what the field width/description is.
>>
>> Something like '%8n %6n %5.2f...' (just made up; I can't read the
>> listing well enough to try to fool with)
>>
>> textscan, fscanf, perhaps textread even, yes...
>>
>> --
>
> Thanks a lot.
>
> And what about the header, then?
>
> The files are 47615 rows long...

The 'headerlines' option still works; you just have to define the format
of the line -- and again, the _format_ has to be consistent.

I'm in an older release that doesn't have textscan() and didn't build a
file to test textread() but discovered at least on this version sscanf()
has a problem w/ the width specifier. You'll have to just work on it w/
a data file to see what does/doesn't function correctly. (One of my pet
peeves is the sorry way C-formatting strings work in relation to Fortran
FORMATting :) )

As an example of what I ran into...

s = '  3  12.5'; % three 3-column fields 3,1,2.5
>> s
s =
  3  12.5

>> [a,c,err,idx] = sscanf(s,'%3f')
a =
''
c =
0
err =
Matching failure in format.
idx =
3
>> [a,c,err,idx] = sscanf(s,'%3d%3d%3.1f')
a =
''
c =
0
err =
Matching failure in format.
idx =
3
>> sscanf('%3d%3d%3.1f', s)
ans =
''
>> for idx=1:3:length(s), s(idx:idx+2),end
ans =
3
ans =
1
ans =
2.5

The upshot is that the substrings parsed; the full string with the field
width specified didn't. Why, I don't know; seems a bug to me.
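
For what it's worth, the substring route can be sketched roughly as below. This is only an illustration under assumptions: every field is taken to be exactly 3 columns wide, none of them blank, and the variable names (w, n, vals, field) are made up for the example.

```matlab
s = '  3  12.5';          % sample line: three 3-column fields '  3', '  1', '2.5'
w = 3;                    % field width (assumed to be the same for every field)
n = numel(s) / w;         % number of fields on the line

vals = zeros(1, n);
for k = 1:n
    field = s((k-1)*w + 1 : k*w);    % cut the k-th field out by column position
    vals(k) = sscanf(field, '%f');   % convert the isolated field; no width specifier needed
end
disp(vals)
```

Because each field is isolated before conversion, the width specifier never enters into it. A blank field would make sscanf() return empty and the assignment fail, so missing values need separate handling.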

PEEVE/ Why there isn't something as simple as (2I3,F3.1) beggars the
imagination...but it seems beyond C's ability /PEEVE

--
From: Peter Riddersholm on

Thanks a lot for your help. I will try this in the morning!

/Peter
From: dpb on
Peter Riddersholm wrote:
....

> Thanks a lot for your help. I will try this in the morning!
....

You're screwed... :(

From the current documentation for textscan...

"Field Length

You can specify the number of characters or digits to read by inserting
a number between the percent character (%) and the format specifier. ...
....
%N.Dn
%N.Df...

Read N digits (counting a decimal point as a digit), or up to the first
delimiter, _whichever_comes_first_ (emphasis mine)."

There's the killer, and that's why my sscanf() example barfed. The
documentation in my early version didn't spell this out so clearly as
being intended (albeit misguided imo) behavior. I suppose this also
mimics what C does, altho I don't know enough C to know for sure.

It appears you have one of two choices; which to choose would depend on
the file itself and needs I think...

1) Pre-process the file and insert field delimiters such as "," that
will mark the fields and the 'delimiter' option can then be used to
handle them, or

2) Read the file via fgetl() and process individual fields based on
substrings as in my last for...end loop above (altho note the case I
ended up pasting was simply echoing the substring; the case of
sscanf(s(idx:idx+2),'%f') that I missed getting in the posting did work).
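
A rough sketch of option 2 follows, for illustration only. The sample file contents, the one-line header, and the column layout in edges are all assumptions made up for the example; str2double() is swapped in for sscanf() so that a blank field comes back as NaN instead of empty.

```matlab
% Build a small sample file so the sketch is self-contained (made-up data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ID  N  X\n');           % one header line (assumed)
fprintf(fid, '  3  12.5\n');
fprintf(fid, ' 10   7   \n');         % last field blank = missing value
fclose(fid);

nHeader = 1;                          % number of header lines to skip (assumed)
edges   = [1 3; 4 6; 7 9];            % [first last] column of each field (assumed layout)
nField  = size(edges, 1);

fid = fopen(fname, 'r');
for k = 1:nHeader
    fgetl(fid);                       % read and discard the header lines
end
data  = zeros(0, nField);
tline = fgetl(fid);
while ischar(tline)                   % fgetl returns numeric -1 at end of file
    row = nan(1, nField);
    for j = 1:nField
        last = min(edges(j, 2), numel(tline));          % tolerate short lines
        row(j) = str2double(tline(edges(j, 1):last));   % blank field -> NaN
    end
    data(end+1, :) = row;             %#ok<AGROW> preallocate for a long file
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
disp(data)
```

For a file of some 47615 rows it would be worth preallocating data rather than growing it one row at a time.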

Ideally you could get whatever that is generating the data to
incorporate the field delimiters for you.
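
If the generating program can't be changed, option 1 can be done on the Matlab side. The sketch below is only an illustration: it assumes a constant field width of 3, and the names w, pat and delimited are made up for the example.

```matlab
s = '  3  12.5';                 % sample line: three 3-column fields
w = 3;                           % field width (assumed constant)

% Insert a comma after every group of w characters that is followed by
% at least one more character, i.e. after every field except the last.
pat = sprintf('(.{%d})(?=.)', w);
delimited = regexprep(s, pat, '$1,');

% With explicit delimiters the 'Delimiter' option can do the splitting.
C = textscan(delimited, '%f%f%f', 'Delimiter', ',');
vals = [C{:}];
disp(delimited)
disp(vals)
```

The same regexprep() call could be applied to each line as it is read, or used to write out a delimited copy of the file once.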

If there's another route in ML it would be good to know. Personally, if
there were a bunch of these files I might either write a mex file in
Fortran and have it parse the file, or simply write a Fortran utility to
read the file and write it back out in a form for load() inside Matlab.

I've been tempted to see about writing a function that would replace
much of the functionality of textread() but understand Fortran format
strings but haven't ever gotten the necessary round tuit. :)
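
A very rough idea of what the core of such a function might look like, purely as a hypothetical sketch: it understands only I and F descriptors with an optional repeat count, and it ignores the decimals part of F (so it does not reproduce Fortran's implied-decimal behavior when a field has no explicit decimal point). None of the names here come from an existing function.

```matlab
fmt = '(2I3,F3.1)';                 % Fortran-style format from the post
s   = '  3  12.5';                  % sample line: three 3-column fields

% Pick out [repeat][I or F][width] for each edit descriptor.
tok = regexp(fmt, '(\d*)([IF])(\d+)', 'tokens');

widths = [];
for k = 1:numel(tok)
    rep = str2double(tok{k}{1});
    if isnan(rep), rep = 1; end     % no repeat count means "once"
    widths = [widths, repmat(str2double(tok{k}{3}), 1, rep)]; %#ok<AGROW>
end

% Turn the widths into column ranges and convert each field separately.
stops  = cumsum(widths);
starts = stops - widths + 1;
vals   = zeros(1, numel(widths));
for k = 1:numel(widths)
    vals(k) = str2double(s(starts(k):stops(k)));   % blank field -> NaN
end
disp(vals)
```

A real replacement would also need X, A and E descriptors, nested repeat groups and so on; this only shows that the widths are easy to recover from the format string.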

--

From: dpb on
Peter Riddersholm wrote:
> dpb <none(a)non.net> wrote in message
> <i0vd0l$gp0$1(a)news.eternal-september.org>...
....

>> As an example of what I ran into...
>>
>> s = '  3  12.5'; % three 3-column fields 3,1,2.5
>> >> [a,c,err,idx] = sscanf(s,'%3f')
>> a =
>> ''
>> c =
>> 0
>> err =
>> Matching failure in format.
>> idx =
>> 3
>> >> sscanf('%3d%3d%3.1f', s)
....

One last idea/note...

>> strread(s,'%3n',3,'delimiter','\n','whitespace','')
ans =
3
12
5
>> strread(s,'%3n',3,'delimiter','\n')
ans =
3
12
5
>> strread(s,'%3n',3,'delimiter','\n','whitespace','\t')
ans =
3
12
5
>> [x,y,z]=strread(s,'%3d%3d%3.1f',1,'delimiter','\n','whitespace','\t')
x =
3
y =
12
z =
0.5000
>> [x,y,z]=strread(s,'%3d%3d%3.1f',1,'delimiter','\n','whitespace','')
x =
3
y =
12
z =
0.5000
>>

Thought maybe if told it to ignore blanks as whitespace it might
work...made some improvement but still not what one would expect. :(

How it interprets the width field in conjunction w/ 'whitespace' is
beyond my ken. Unfortunately, the backend is implemented in a mex file,
so I can't see what it does nor modify it to get the desired behavior
(and modifying it probably wouldn't be desirable anyway for
compatibility reasons, unless one made a local copy, of course).

Fortran wins hands down in this area...

--
From: dpb on
And even more discouraging...

>> strrep(s,' ','0')
ans =
0030012.5
>> [x,y,z]=strread(s,'%3d%3d%3.1f',1)
x =
3
y =
12
z =
0.5000
>>

It pays no attention to the width specifier and parses to the '.' for
the second value, apparently.
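
Given that, one way to sidestep the width specifier altogether is to read whole lines and slice columns out of a char matrix. This is a sketch only: the sample file, the one-line header and the 3-column layout are assumptions made up for the example, and it relies on textscan() being available.

```matlab
% Build a small sample file so the sketch is self-contained (made-up data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ID  N  X\n');           % one header line (assumed)
fprintf(fid, '  3  12.5\n');
fprintf(fid, ' 10   7 0.5\n');
fclose(fid);

% Read every line after the header as one string; the empty 'Whitespace'
% setting keeps the leading blanks so the column positions stay intact.
fid = fopen(fname, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '', 'HeaderLines', 1);
fclose(fid);
delete(fname);

M = char(C{1});                       % char matrix, one row per data line

% Slice each field by column range and convert whole columns at once.
a = str2double(cellstr(M(:, 1:3)));
b = str2double(cellstr(M(:, 4:6)));
c = str2double(cellstr(M(:, 7:9)));
data = [a b c];
disp(data)
```

The 'HeaderLines' option does the header skipping here, and no numeric width specifier is involved, so the behavior above doesn't come into play.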

--