Mixed format strings [Matlab]

From: Baalzamon on 12 Apr 2010 17:25

Ah sorry. I have another program (a) which stores parameters in tabulated form. However, the program that does this hasn't foreseen that some of the numbers in a field can be larger than the allowed size of the field (much like when you have an 8-digit number in a cell only big enough for, say, 5 digits). As a result it fills the field with stars, as in the example shown previously.
In other instances, let's say the contents of two adjacent cells are 200 and 1000. Then the output should be [200 1000]; however, if, as mentioned before, one of the numbers is too large, the result becomes [2001000], i.e. the fields are merged.
In a previous m-file I wrote, this problem was somewhat solved, as any rows that had merged cells also contained stars and these were discarded. Now, after some tinkering with program (a), these are 'reliable' data sets, so my previous method is no longer valid.
Below are two examples of lines from my file:
.0124 200 900 260 600 .2140 .1586 .3179 80.0000 10.0000 10.0000 .0000 -.0180 .0213 301.0000 .1000 .4100 .3500 .4100 .3500 4.7000 9.4000 100.0000 2.407 .077 811.32 337 100.00 .1181 .0000 100.0000 .0000 V$09Û¤B@ .5562 296.1021 .0000**********ªûßäÆ¯QA

In this case the stars are joined to the .0000 cell.

C@ .5562 296.0874 .0000**********(Ÿ~Ç„¿QA
This is another line which I wish to discard.
The first example is a reasonably good set. I would be happy to just extract the cells before the numbers turn into ASCII characters and letters. I'll be happy to show you a more complete text file if this would clarify my need.

Much thanks for the reply btw

From: dpb on 12 Apr 2010 20:11

Baalzamon wrote:
> Ah sorry. I have another program (a) which stores parameters in
> tabulated form. However, the program that does this hasn't foreseen that
> some of the numbers in a field can be larger than the allowed size of
> the field (much like when you have an 8-digit number in a cell only big
> enough for, say, 5 digits). As a result it fills the field with stars,
> as in the example shown previously. In other instances, let's say the
> contents of two adjacent cells are 200 and 1000. Then the output should
> be [200 1000]; however, if, as mentioned before, one of the numbers is
> too large, the result becomes [2001000], i.e. the fields are merged. In
> a previous m-file I wrote, this problem was somewhat solved, as any rows
> that had merged cells also contained stars and these were discarded.
> Now, after some tinkering with program (a), these are 'reliable'
> data sets, so my previous method is no longer valid. Below are two
> examples of lines from my file:

Well, I'd start w/ the beginning file and use the proper field width
since they are fixed width fields. That solves the problem of valid
data "merged" -- they _aren't_ merged, they just are each filling the field.

But, the case of overflow creates a problem because, I presume, it can
occur in any column on any given record. Therefore, I think you'll have
to first use fgetl() to read a line, find whether there is or isn't an
asterisk and parse each line based on that.
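
A minimal sketch of that idea, with assumptions stated up front: the three 8-character fields in widths are hypothetical (the real widths have to come from the output format of program (a)), and the cell array lines stands in for what fgetl() would return from the file. Each record is sliced by column position rather than split on whitespace, so two full-width fields that look "merged" still parse, and any record containing an asterisk is skipped.

```matlab
% Hypothetical layout: three fields, each 8 characters wide.
widths = [8 8 8];
edges  = [0 cumsum(widths)];          % column boundaries of each field

% Stand-ins for lines read with fgetl(fid).
lines = {['     200', '    1000', '   .0124'], ...   % ordinary record
         ['     200', '10000000', '   .0124'], ...   % field 2 fills its width
         ['     200', '********', '   .0124']};      % field 2 overflowed

data = zeros(numel(lines), numel(widths));
keep = false(numel(lines), 1);
for k = 1:numel(lines)
    tline = lines{k};
    % Skip records that are too short or contain an overflowed field.
    if numel(tline) < edges(end) || any(tline == '*')
        continue
    end
    for j = 1:numel(widths)
        field = tline(edges(j)+1:edges(j+1));   % slice by column position
        data(k, j) = str2double(field);
    end
    keep(k) = true;
end
data = data(keep, :);    % only the records that parsed cleanly
```

Here the second record reads as 20010000000 when viewed as text, yet the column slicing still recovers 200 and 10000000 as separate values.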

But, if it were at all possible, I'd fix (or cause to be fixed) the
original program to quit making useless data sets.

From: Baalzamon on 13 Apr 2010 14:33

Ah thanks again people.

Problem solved. I used fgetl again, but with some different conditional tests.
I found mechanically where the ASCII junk starts, using length, then made sure all rows were longer than this (which gets rid of terminated fits) and that the substring just before this point was 0000.
Then, to get rid of rows where the fits had doubled, I made sure the rows were shorter than a maximum length I chose (slightly longer than good fits).
Then, to keep str2num from crashing, I filtered out rows with stars in specific places using strcmp.
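
The tests described above might look roughly like the following sketch. It is not the actual m-file: numLen, maxLen and starCols are hypothetical placeholder values, and the short synthetic rows in lines only imitate the shape of the real records (numeric fields ending in .0000, followed by a non-numeric tail).

```matlab
numLen   = 20;      % hypothetical: columns 1..numLen hold the numeric fields
maxLen   = 30;      % hypothetical: slightly longer than a good row
starCols = 9:11;    % hypothetical: columns where overflow stars can appear

good  = [' .0124', '  200', '    .0000', ' XYZ@'];   % numbers, then junk tail
lines = {good, ...
         ' .0124  200', ...                           % terminated fit: too short
         [good, good], ...                            % doubled fit: too long
         [' .0124', '  ***', '    .0000', ' XYZ@']};  % overflowed field

rows = [];
for k = 1:numel(lines)
    tline = lines{k};
    n = length(tline);
    % The row must extend past the numeric part but stay below the maximum.
    if n <= numLen || n >= maxLen
        continue
    end
    % The numeric part must end in 0000 just before the junk starts.
    if ~strcmp(tline(numLen-3:numLen), '0000')
        continue
    end
    % Reject rows with stars in the known overflow columns.
    if strcmp(tline(starCols), '***')
        continue
    end
    rows(end+1, :) = str2num(tline(1:numLen));   % convert the numeric part only
end
```

Only the first synthetic row survives all three tests; the others are dropped before str2num ever sees them.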

On some small points... once the code worked, some modifications were made.
Initial file: 42 MB, 150,000 rows. Cleaned matrix: 33,000 rows.
1) Initially: read line by line, convert the substring into a number array and place certain cell contents in the places I wanted. No preallocation of the array.

Time taken: 150 s

2) Now: open the file and count the rows. Close it and make an m-by-n array of zeros. Reopen the file and follow the steps above, but write into the preallocated matrix. Then use logical indexing to extract the submatrix of rows with populated entries:
q=output(output(:,1)>1,:);
Time taken: 35 s
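
For anyone following along, the second approach can be sketched as below. This is an illustration under assumptions, not the original code: the temporary file, its three-column contents and the asterisk test are made up so the example is self-contained, and it relies on valid rows having a first column greater than 1, as the indexing line above does.

```matlab
% Write a tiny stand-in file (hypothetical contents).
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '2 10 20\n3 30 40\n***bad row***\n5 50 60\n');
fclose(fid);

% Pass 1: count the rows.
fid = fopen(fname, 'r');
nRows = 0;
while ischar(fgetl(fid))      % fgetl returns -1 (not char) at end of file
    nRows = nRows + 1;
end
fclose(fid);

% Pass 2: fill a preallocated matrix; rejected rows stay all zeros.
output = zeros(nRows, 3);
fid = fopen(fname, 'r');
for k = 1:nRows
    tline = fgetl(fid);
    if any(tline == '*')
        continue              % leave this row as zeros
    end
    output(k, :) = sscanf(tline, '%f', [1 3]);
end
fclose(fid);
delete(fname);

% Keep only the populated rows with logical indexing.
q = output(output(:,1) > 1, :);
```

Preallocating with zeros avoids regrowing the matrix on every accepted row, which is the likely source of the speed-up reported above.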
