From: Nick Baltas on
Hi all,

I am trying to read a CSV file that is full of numerical values, but one single column (the most important for my purposes) has a couple of 'C' characters where the value is missing, and because of that csvread fails. The file is too long (over 2 million lines) to open in Excel and run a simple replace command.

But it seems truly inefficient not to be able to read such a file in Matlab... Any ideas?

Thanks in advance,
Nick

p.s. If I exclude this column when retrieving the data from the database, csvread works smoothly, hence I am 101% sure that my problem lies in those non-numerical values.
From: TideMan on
On Feb 9, 6:48 am, "Nick Baltas" <n.bal...(a)imperial.ac.uk> wrote:

There are lots of ways to get around this; you just need to think
outside the square a bit:
1. Use textscan with comma delimiter, reading the troublesome columns
as strings which you convert once the data are in Matlab;
2. Use textscan with comma delimiter, stopping when it crashes, then
resuming.
3. Read the file in line by line using fgetl, then look for bad data
in particular columns (very slow option).
4. Edit the file with Wordpad or similar.
etc
etc
etc
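
Option 1 above could look roughly like the following. This is only a minimal sketch: the file name, the three-column layout and the sample values are made up for illustration and are not from the original poster's data. The last column is read as text with %s and then converted with str2double, which returns NaN for anything that is not a number (such as 'C').

```matlab
% Write a tiny sample file so the sketch runs on its own.
% The name and contents are hypothetical, not the poster's real data.
fname = 'sample_with_C.csv';
fid = fopen(fname, 'w');
fprintf(fid, 'a,b,c\n1,2,3.5\n4,5,C\n7,8,9.25\n');
fclose(fid);

% Read the two clean columns as numbers and the troublesome one as text.
fid = fopen(fname, 'r');
raw = textscan(fid, '%f %f %s', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);

% Convert the text column once the data are in Matlab; 'C' becomes NaN.
vals = str2double(raw{3});
data = [raw{1}, raw{2}, vals];

delete(fname);   % remove the sample file again
```

The missing entries can then be found with isnan(data(:,3)) and dropped or filled in as needed.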

From: Nick Baltas on
TideMan <mulgor(a)gmail.com> wrote in message <d586fe6b-b221-466a-b97b-96549083d49e(a)q2g2000pre.googlegroups.com>...

Worked like a beauty:


C = textscan(fid, '%d32 %d32 %d8 %d8 %d32 %f32 %f %s', 'HeaderLines', 1, 'Delimiter', ',');

for an 8-column CSV file with more than 2.5 million rows... in just 5 secs a 300 MB cell array was generated!

Thanks a lot!

p.s. If anyone uses this function, make sure you convert the integer columns to double if you want to do calculations that mix variables of different types after running textscan. Storing the data with the %d8 or %d16 types saves a lot of space, which is why I chose integer types for my big dataset.
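
To illustrate the point about integer types, here is a small sketch with made-up values (the cell C below is a hypothetical stand-in for the textscan output, not the real data). Matlab refuses to combine integers of different classes, and integer arithmetic saturates at the class limits, so converting to double first is the safe route.

```matlab
a = int8(100);
b = int32(1000);

% c = a + b;   % would error: integers of different classes cannot be combined
c = double(a) + double(b);      % convert first, then calculate

% Integer arithmetic saturates instead of overflowing.
s = int8(100) + int8(100);      % clipped to intmax('int8')

% Hypothetical stand-in for numeric columns returned by textscan.
C = {int32([1; 2]), int8([3; 4]), single([0.5; 1.5])};
Cd = cellfun(@double, C, 'UniformOutput', false);   % every column as double
M = [Cd{:}];                                        % one ordinary double matrix
```

Whether to convert everything at once or only the columns needed for a given calculation depends on how much memory is available, since double takes 8 bytes per value.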