Textscan with non-delimited text [Matlab]


From: Elizabeth on 10 Aug 2010 12:27

I am trying to import large data files in the DSI-3200 format (http://www1.ncdc.noaa.gov/pub/data/documentlibrary/tddoc/td3200.pdf), but they are not delimited:

DLY43124399SNOWTI19030299990280199-99999M00299-99999M00399-99999M00499-99999M00599-99999M00699-99999M00799-99999M00899-99999M00999-99999M01099-99999M01199-99999M01299-99999M01399-99999M01499-99999M01599-99999M01699-99999M0179900001001189900000001199900000001209900000001219900000001229900000001239900000001249900000001259900000001269900000001279900000001289900000001
DLY43124399PRCPHI1903029999011179900006201189900000001199900000001209900000001219900000001229900000001239900000001249900000001259900000001269900000001279900000001
DLY43124399SNWD0I1903029999011189900002901199900002901209900002901219900002801229900002801239900002701249900002501259900002401269900002401279900002201289900001701

Each DLY marks a new row, and rows are of variable length. This is what I have been using to import:

fid=fopen('filename.txt');

output=textscan(fid,['%*3s %6n %*2n %4s %2c %4n %2n %*4n %*3n',repmat('%2n %*2n %6d %1c %1c',[1,62])],'EmptyValue',-99999);

fclose(fid);

....The sequence in repmat is repeated 62 times because there may be up to two daily records per day, with a maximum of 31 days per month. However, MATLAB does not recognize when it has reached the end of a row, so I only get one row of output. Instead of filling the blank spaces with -99999, I get [].

Any hints are appreciated.

From: Sean on 10 Aug 2010 13:11

So you want to isolate all of the text after 'DLY' before the next occurrence?

%%%
%Load Text by the rows in file
fid = fopen('dly.txt');
T = textscan(fid,'%s');
fclose(fid);

%Combine into one long string
T = cell2mat(T{1}');

%Split
rows = regexp(T,'DLY','split')

%%%My DLY test case was dly.txt:
DLY8038938
DLY09080DLY983r4938539058830DLY3850
2DLY29
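
For reference, the same split can be tried without a file. This is only a minimal sketch: it assumes the three test lines above have already been joined into one string, which is what the cell2mat step produces. The first piece comes back empty because the text starts with 'DLY', so the sketch drops empty pieces.

```matlab
% The dly.txt test case above, joined into one string
T = 'DLY8038938DLY09080DLY983r4938539058830DLY38502DLY29';

% Split on the record marker; the marker itself is not kept
rows = regexp(T, 'DLY', 'split');

% The text starts with 'DLY', so the first piece is empty; drop empty pieces
rows = rows(~cellfun(@isempty, rows));
```

This keeps each record as raw text. It does not break a record into its individual fields.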

From: Elizabeth on 10 Aug 2010 13:31

No, I'm trying to split it into the different elements outlined in the PDF I linked in my first message. So the first row should be divided up into:

DLY 431243 99 SNOW TI 1903 02 9999 028 01 99 -99999 M 0 02 99 -99999 M....etc

It does fine with the first row up until there are no more values. But then it does not start on the next row.

I've given up and am delimiting the files in Excel and importing them as CSV. It's just taking an eternity because I have to manually click on the line breaks.

Liz.
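
One way around both the textscan problem and the Excel step might be to read each line as text and cut it at fixed column positions. The sketch below is only an illustration under several assumptions: the field widths are taken from the breakdown above (3, 6, 2, 4, 2, 4, 2, 4, 3 characters, then 12 characters per daily value); the 3-digit field in columns 28-30 is assumed to be the number of daily values in the record (it matches the sample rows, 028 and 011); the struct field names are made up for this example; and the two sample lines are shortened versions of the rows in the first post, with the count field edited to match.

```matlab
% Shortened sample records (hypothetical: count field edited to 002 and 003)
lines = { ...
    'DLY43124399SNOWTI19030299990020199-99999M00299-99999M0', ...
    'DLY43124399PRCPHI1903029999003179900006201189900000001199900000001'};

hdrLen = 30;   % fixed header: 3 + 6 + 2 + 4 + 2 + 4 + 2 + 4 + 3 characters
grpLen = 12;   % per daily value: day(2) + hour(2) + value(6) + flag(1) + flag(1)

% Empty struct array; field names are assumptions, not taken from the PDF
recs = struct('station', {}, 'element', {}, 'units', {}, 'year', {}, ...
              'month', {}, 'day', {}, 'value', {}, 'flag1', {}, 'flag2', {});

for k = 1:numel(lines)
    s = lines{k};

    % Number of daily groups, limited to what the line actually contains
    n = str2double(s(28:30));
    n = min(n, floor((length(s) - hdrLen) / grpLen));

    % Cut the variable-length tail into one 12-character row per daily value
    body = reshape(s(hdrLen+1 : hdrLen + n*grpLen), grpLen, n)';

    r.station = str2double(s(4:9));
    r.element = s(12:15);
    r.units   = s(16:17);
    r.year    = str2double(s(18:21));
    r.month   = str2double(s(22:23));
    r.day     = str2double(cellstr(body(:, 1:2)));
    r.value   = str2double(cellstr(body(:, 5:10)));   % -99999 stays -99999
    r.flag1   = body(:, 11);
    r.flag2   = body(:, 12);
    recs(end+1) = r; %#ok<AGROW>
end
```

Because each record carries its own count, rows of different length need no padding. To try this on a real file, the lines could probably be read first with something like C = textscan(fid,'%s','Delimiter','\n'); lines = C{1}; but that has not been checked against a full DSI-3200 file.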
