From: Garapata on
I have dates in the first column of large flat files (6000+ rows).
The flat files have a header row and may have as many as 20 columns of
data besides the dates. A sample of a flat file follows:

dataFile = {{"DATES", "DATA1", "DATA2"},
   {"12/31/84", 1, 1},
   {"01/01/85", 1, 1.00239},
   {"01/02/85", 0.999206, 1.00238},
   {"01/03/85", 0.997425, 1.00238},
   {"01/04/85", 0.997038, 1.00237},
   {"01/07/85", 0.989256, 1.0071},
   {"01/08/85", 1.00867, 1.00235},
   {"01/09/85", 0.994708, 1.00235},
   {"01/10/85", 1.00552, 1.00234}}

I want to make a list of just the dates from the first column and turn
them into an unambiguous DateList[] format for later processing.

This works:

dates = DateList[{#, {"Month", "Day", "YearShort"}}] & /@
   (DateString[#] & /@ Rest[dataFile][[All, 1]]);

but with thousands of dates in a file to convert, it takes a long
time to run. The Map within a Map seems to slow things down a lot.
Unless I've missed something (quite possible), neither DateList[] nor
DateString[] seems to operate directly on lists, so I haven't figured
out a better way to do this.

Can I do anything to make this run faster? Any solutions much
appreciated.

From: Albert Retey on
On 15.01.2010 09:16, Garapata wrote:
> I have dates in the first column of large flat files (6000+ rows).
> The flat files have a header row and may have as many as 20 columns of
> data besides the dates. A sample of a flat file follows
>
> dataFile = {{"DATES", "DATA1", "DATA2"}, {"12/31/84", 1, 1},
> {"01/01/85", 1,
> 1.00239}, {"01/02/85", 0.999206, 1.00238}, {"01/03/85", 0.997425,
> 1.00238}, {"01/04/85", 0.997038, 1.00237}, {"01/07/85", 0.989256,
> 1.0071}, {"01/08/85", 1.00867, 1.00235}, {"01/09/85", 0.994708,
> 1.00235}, {"01/10/85", 1.00552, 1.00234}}
>
> I want to make a list of just the dates from the first column and turn
> them into an unambiguous DateList[] format for later processing.
>
> This works:
>
> dates = DateList[{#, {"Month", "Day","YearShort"}}] & /@ (DateString
> [#] & /@Rest[dataFile][[All, 1]]);
>
> but, with thousands of dates in a file to convert, it takes a long
> time to run. The Map within a Map seems to slow things down a lot.
> Unless I've missed something, (quite possible) neither DateList[] nor
> DateString seem to operate directly on lists, so I haven't figured out
> a better way to do this.
>
> Can I do anything to make this run faster? Any solutions much
> appreciated.
>

I don't think the two Maps are the reason. Two are not needed, and you
will find that the code below does the same thing with only one Map
without being much faster (I have created a longer list of just dates
for my tests):

In[27]:= datelist = Table[
   DateString[DatePlus[{1981, 1, 1}, n],
    {"Month", "/", "Day", "/", "YearShort"}],
   {n, 0, 1000}];

In[36]:= Timing[
  res1 = DateList[{#, {"Month", "Day", "YearShort"}}] & /@ datelist;]

Out[36]= {4.563, Null}

I think DateList is rather slow because it carries too much overhead
for "simple" and regular cases like this. I also believe it may well
make calls to Java functions, which is not a good idea if you are after
speed. The following will only work with dates of exactly this format,
but it is much faster:

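In[37]:= Timing[
  res2 = Apply[
     (* two-digit years above 10 are read as 19xx, the rest as 20xx *)
     {If[#1 > 10, 1900 + #1, 2000 + #1], ##2} &,
     (* rewrite "MM/DD/YY" as the string "{YY,MM,DD,0,0,0}", then parse it *)
     ToExpression /@
      StringReplace[datelist,
       RegularExpression["([0-9]*)/([0-9]*)/([0-9]*)"] ->
        "{$3,$1,$2,0,0,0}"],
     {1}  (* apply at level 1, i.e. once per date *)
     ];
  ]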

Out[37]= {0.015, Null}

In[38]:= res1 == res2

Out[38]= True
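
For the dataFile from the original question, the same idea might be
applied to the first column directly. The following is only a minimal
sketch: fastDateList is a hypothetical helper name that does not appear
anywhere else in this thread, it assumes every date is a "MM/DD/YY"
string, and it reuses the same century cutoff as above (two-digit years
above 10 are taken as 19xx). It splits on "/" and uses FromDigits
instead of ToExpression, so no string is ever evaluated as code:

```mathematica
(* Hypothetical helper, not part of the thread: "MM/DD/YY" -> {y, m, d, 0, 0, 0} *)
fastDateList[s_String] :=
 With[{p = FromDigits /@ StringSplit[s, "/"]},
  (* p is {month, day, two-digit year}; same cutoff assumption as above *)
  {If[p[[3]] > 10, 1900, 2000] + p[[3]], p[[1]], p[[2]], 0, 0, 0}]

(* drop the header row, take the first column, convert each date *)
dates = fastDateList /@ Rest[dataFile][[All, 1]];
```

With the sample data, First[dates] should come out as
{1984, 12, 31, 0, 0, 0}. Any other date format (four-digit years,
different separators) would need its own handling.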

hth,

albert