From: Nick Baltas on 8 Feb 2010 12:48

Hi all,

I am trying to read a CSV file that is full of numerical values, but one column (the most important one for my purposes) has a couple of 'C' characters where the value is missing. Because of that, csvread fails. The file is too long (over 2 million lines) to open in Excel and fix with a simple replace command.

It seems truly inefficient to be unable to read such a file in Matlab. Any ideas?

Thanks in advance,
Nick

P.S. If I exclude this column when retrieving the data from the database, csvread works smoothly, so I am 101% sure that the problem lies in those non-numerical values.
From: TideMan on 8 Feb 2010 14:04

There are lots of ways to get around this; you just need to think outside the square a bit:
1. Use textscan with a comma delimiter, reading the troublesome columns as strings, which you convert once the data are in Matlab.
2. Use textscan with a comma delimiter, stopping where it fails, then resuming from that point.
3. Read the file in line by line using fgetl, then look for bad data in particular columns (a very slow option).
4. Edit the file with WordPad or similar.
etc.
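Option 1 above can be sketched roughly as follows. This is only a minimal illustration, not the poster's actual code: the two-column sample file, its contents, and the variable names are all made up for the example. The troublesome column is read with %s and converted afterwards with str2double, which returns NaN for text such as 'C'.

```matlab
% Write a tiny sample file so the sketch is self-contained.
% (Hypothetical data; the real file has 8 columns and millions of rows.)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id,value\n1,2.5\n2,C\n3,4.0\n');
fclose(fid);

% Read the clean column as a number and the troublesome column as a string.
fid = fopen(fname, 'r');
C = textscan(fid, '%f %s', 'HeaderLines', 1, 'Delimiter', ',');
fclose(fid);
delete(fname);

id = C{1};

% Convert the string column once the data are in Matlab.
% str2double returns NaN for entries that are not numbers, such as 'C'.
value = str2double(C{2});

% Rows where the value was missing in the file.
missing = isnan(value);
```

Representing the missing entries as NaN keeps the column numeric, so they can be filtered out later with the missing mask.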
From: Nick Baltas on 8 Feb 2010 14:45

Worked like a beauty:

C = textscan(fid, '%d32 %d32 %d8 %d8 %d32 %f32 %f %s', 'HeaderLines', 1, 'Delimiter', ',');

for an 8-column CSV file with more than 2.5 million rows. In just 5 seconds a 300 MB cell array was generated! Thanks a lot!

P.S. If anyone uses this function, make sure you convert the integers to doubles if you want to do calculations that mix different variable types after running textscan. Storing the data as d8 or d16 types saves a lot of space, which is why I chose them for my big dataset.
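The point in the P.S. about converting to double can be illustrated with a small sketch. The values and variable names here are made up; a and b simply stand in for columns read with %d32 and %d8.

```matlab
% Stand-ins for columns read with %d32 and %d8 (hypothetical values).
a = int32([10; 20; 30]);
b = int8([1; 2; 3]);

% Combining integers of different classes directly raises an error.
mixedFailed = false;
try
    s = a + b; %#ok<NASGU>
catch
    mixedFailed = true;
end

% Converting to double first makes the calculation work.
s = double(a) + double(b);

% The compact types do save space: 1 byte per int8 element
% versus 8 bytes per double element.
bd = double(b);
infoInt = whos('b');
infoDbl = whos('bd');
```

Converting only the columns needed for a given calculation, rather than the whole cell array, keeps most of the memory saving from the small integer types.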