From: Mike on
I have data files that are:

34 columns
a heck of a lot of rows!

I need to read in the data (skipping the first 14 lines of header) to an array.

I do have a way to do this, but I have found that it slows down on my large files. What I've done (and it works) is:

temp = textread(filename, '%f', 'headerlines', numOfHeaderLines);
--
for i = 1:(numel(temp)/34)
    for j = 1:34
        dat(i,j) = temp(34*(i-1)+j);
    end
end

--
This reads all the data into a 1D array of an enormous length and then converts it to the 34-column 2D array...

I have tried to eliminate having to do this by doing things along the lines of:
--
[dat(:,1),dat(:,2).....dat(:,33),dat(:,34)] = textread(filename,'%f/t%f/t......%f/t%f','headerlines',numOfHeaderLines)
--
I have tried this and some variations of it, and nothing works; I get errors usually relating to 'A(I) = B, I has to have same dimensions of B'... or 'number of outputs must match the number of unskipped inputs'...

Does anyone have a solution or suggestion for this? I would like to think there is something a little faster than the way that currently works...

Thank you,
From: us on
"Mike " <mikejcunningham(a)gmail.com> wrote in message <i02lcp$fme$1(a)fred.mathworks.com>...
> I have data files that are:
> 34 columns, a heck of a lot of rows!
....

show a small portion of the anatomy of your file's content...

us
From: Mike on
file content is:
--
This is the anatomy
of my file
this header goes
on for 14 lines in the file
Here is the data:
1 1 33456 28864 31408 39040 32544 35232 32480 25184 30880 30416 34032 40688 29504 32880 36800 26048 41056 24384 30176 36736 32208 37648 36512 24864 36064 22992 25600 31808 27824 38576 25648 24240
1 2 32224 31152 36432 34304 30656 36144 31856 25904 35328 26032 38384 41440 29072 33440 36800 28976 36288 18896 20400 36960 29184 35776 30528 27424 35152 21120 27056 29984 24256 38544 23536 26288
1 3 36032 27968 33664 34576 29088 34704 32896 26080 32704 28928 36080 41872 30880 32928 37712 30224 37152 20000 28336 36416 34640 39632 32304 22000 37680 21584 22592 31200 26960 37824 27616 26464
--

These are 3 rows of data: where it says 1 1, that is the first row; 1 2 is the 2nd row; 1 3 is the 3rd row... etc.

Hope this helps

"us " <us(a)neurol.unizh.ch> wrote in message <i02mi8$2qh$1(a)fred.mathworks.com>...
> "Mike " <mikejcunningham(a)gmail.com> wrote in message <i02lcp$fme$1(a)fred.mathworks.com>...
> > I have data files that are:
> >
> > 34 columns
> > a heck of a lot of rows!
> >
> > I need to read in the data (skipping the first 14 lines of header) to an array.
> >
> > I do have a way to do this, but I have found that it slows down on my large files. What I've done and it works, is:
> >
> > temp = textread(filename, '%f', 'headerlines', numOfHeaderLines)
> > --
> > for i = 1:(size(temp)/34)
> > for j = 1:34
> > dat(i,j) = temp(34*(i-1)+j);
> >
> > end
> > end
> >
> > --
> > This reads in all the data to a 1D array of an enourmous length and then converts it to the 34 column 2D array...
> >
> > I have tried to eliminate having to do this by doing things along the lines of:
> > --
> > [dat(:,1),dat(:,2).....dat(:,33),dat(:,34)] = textread(filename,'%f/t%f/t......%f/t%f','headerlines',numOfHeaderLines)
> > --
> > I have done this and some variations of and nothing works, I get errors usually relating to 'A(I) = B, I has to have same dimensions of B'... Or 'number of outputs must match the number of unskipped inputs...'
> >
> > Does anyone have a solution or suggestion to this? I would like to think there is something a little faster than the way that works currently...
> >
> > Thank you,
>
> show a small portion of the anatomy of your file's content...
>
> us
From: dpb on
Mike wrote:
....
> 34 columns, a heck of a lot of rows!
....
> I do have a way to do this, but I have found that it slows down on my
> large files. What I've done and it works, is:
>
> temp = textread(filename, '%f', 'headerlines', numOfHeaderLines)
> --
> for i = 1:(numel(temp)/34)
>     for j = 1:34
>         dat(i,j) = temp(34*(i-1)+j);
>     end
> end
>
> --
> This reads in all the data to a 1D array of an enourmous length and then
> converts it to the 34 column 2D array...
....

dat = reshape(temp,34,length(temp)/34)';

Reshape so the number of data columns becomes the number of rows (remember, ML stores in column-major order) and then transpose to the expected form...

_Should_ be faster.
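Putting that together with the textread call, a minimal untested sketch (keeping the variable names from the original post):

nCols = 34;
temp  = textread(filename, '%f', 'headerlines', numOfHeaderLines);  % 1D read, same as before
dat   = reshape(temp, nCols, numel(temp)/nCols)';  % column-major fill, hence the transpose

That replaces the double loop with a single reshape call.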

I was thinking that textread() allowed the [N,inf] form to tell it the shape on input, but I see that at least my version doesn't. Might check textscan(), which postdates my release...

--
From: dpb on
Mike wrote:
> I have data files that are:
>
> 34 columns, a heck of a lot of rows!
>
> I need to read in the data (skipping the first 14 lines of header) to an
> array.
....

> temp = textread(filename, '%f', 'headerlines', numOfHeaderLines)

....

Another way to compare if you need to do this more than just once...

nCols = 34;
nHdr = 14;
fid = fopen(filename,'rt');
for idx = 1:nHdr
    hl = fgetl(fid);  % skip the header lines; 'headerlines' is more convenient in textread
end
x = fscanf(fid,'%f',[nCols,inf])';  % Nota Bene the "'" transpose
fclose(fid);

Not sure which (if either) might have any speed advantage...
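If it matters, a rough tic/toc comparison would settle it (untested sketch; filename and the 14-header/34-column numbers assumed from above):

tic
temp = textread(filename, '%f', 'headerlines', 14);
dat1 = reshape(temp, 34, [])';
toc  % textread + reshape

tic
fid = fopen(filename, 'rt');
for idx = 1:14, fgetl(fid); end
dat2 = fscanf(fid, '%f', [34, inf])';
fclose(fid);
toc  % fgetl/fscanf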

I do believe, looking at the online doc, that textscan() w/ the 'CollectOutput' option could possibly return a cell array in proper order w/ the convenience of 'HeaderLines', but at the expense of the format string being something like

repmat('%f ',1,nCols)

to give it the clue of the number of elements/row.

Untested...

nCols = 34;
nHdr = 14;
fid = fopen(filename, 'rt');
dat = textscan(fid, repmat('%f ',1,nCols), ...
               'HeaderLines', nHdr, ...
               'CollectOutput', 1);
fclose(fid);

No clue how close that really is; don't have textscan() to play with...
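If it does run, I believe 'CollectOutput' hands back a 1x1 cell array with the whole numeric block inside, so you'd likely still need something like

dat = dat{1};  % assuming CollectOutput wraps the numeric matrix in a single cell

to get at the plain 34-column matrix.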

--

