From: Garapata on 15 Jan 2010 03:16

I have dates in the first column of large flat files (6000+ rows). The flat files have a header row and may have as many as 20 columns of data besides the dates. A sample of a flat file follows:

dataFile = {{"DATES", "DATA1", "DATA2"},
   {"12/31/84", 1, 1},
   {"01/01/85", 1, 1.00239},
   {"01/02/85", 0.999206, 1.00238},
   {"01/03/85", 0.997425, 1.00238},
   {"01/04/85", 0.997038, 1.00237},
   {"01/07/85", 0.989256, 1.0071},
   {"01/08/85", 1.00867, 1.00235},
   {"01/09/85", 0.994708, 1.00235},
   {"01/10/85", 1.00552, 1.00234}}

I want to make a list of just the dates from the first column and turn them into an unambiguous DateList[] format for later processing.

This works:

dates = DateList[{#, {"Month", "Day", "YearShort"}}] & /@
   (DateString[#] & /@ Rest[dataFile][[All, 1]]);

but, with thousands of dates in a file to convert, it takes a long time to run. The Map within a Map seems to slow things down a lot. Unless I've missed something (quite possible), neither DateList[] nor DateString[] seems to operate directly on lists, so I haven't figured out a better way to do this.

Can I do anything to make this run faster? Any solutions much appreciated.
From: Albert Retey on 15 Jan 2010 07:00
On 15.01.2010 09:16, Garapata wrote:
> dates = DateList[{#, {"Month", "Day","YearShort"}}] & /@ (DateString
> [#] & /@Rest[dataFile][[All, 1]]);
>
> but, with thousands of dates in a file to convert, it takes a long
> time to run. The Map within a Map seems to slow things down a lot.
> [...]
> Can I do anything to make this run faster? Any solutions much
> appreciated.

I don't think the two Maps are the reason. The inner one is not necessary, and you will find that the code below does the same thing with only one Map, yet is not much faster (I have created a longer list of just dates for my tests):

In[27]:= datelist = Table[
   DateString[DatePlus[{1981, 1, 1}, n],
    {"Month", "/", "Day", "/", "YearShort"}], {n, 0, 1000}];

In[36]:= Timing[
  res1 = DateList[{#, {"Month", "Day", "YearShort"}}] & /@ datelist;]

Out[36]= {4.563, Null}

I think DateList is rather slow because it carries too much overhead for "simple" and regular cases like this. I also believe it may well make calls to Java functions, which is not a good idea either if you are after speed.
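As a minimal sketch of the single-Map form applied directly to the data from the question (the shortened dataFile and the variable names time and dates are only for illustration), one could write the following. AbsoluteTiming is used here simply to see how long the conversion takes on a given machine; no particular timing is claimed.

```mathematica
(* Shortened copy of the sample data from the question, header row included. *)
dataFile = {{"DATES", "DATA1", "DATA2"},
   {"12/31/84", 1, 1},
   {"01/01/85", 1, 1.00239},
   {"01/02/85", 0.999206, 1.00238}};

(* Drop the header row, take the first column, and convert each string
   with one Map. The inner DateString call from the question is omitted
   because the entries are already strings. *)
{time, dates} = AbsoluteTiming[
   DateList[{#, {"Month", "Day", "YearShort"}}] & /@
    Rest[dataFile][[All, 1]]];

time   (* elapsed seconds; varies by machine and version *)
dates  (* one DateList-style entry per data row *)
```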
The following will only work with dates of exactly this format but be much faster:

In[37]:= Timing[
  res2 = Apply[
     {If[#1 > 10, 1900 + #1, 2000 + #1], ##2} &,
     ToExpression /@ StringReplace[datelist,
        RegularExpression["([0-9]*)/([0-9]*)/([0-9]*)"] ->
         "{$3,$1,$2,0,0,0}"],
     {1}];]

Out[37]= {0.015, Null}

In[38]:= res1 == res2

Out[38]= True

hth,

albert
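The same idea can be sketched without ToExpression by splitting each string on "/" and converting the pieces with FromDigits. This is only an illustration under the same assumptions as the code above: the input is strictly "MM/DD/YY", and two-digit years above 10 are taken to be in the 1900s. The helper name parseShortDate is hypothetical and not part of Mathematica.

```mathematica
(* Hypothetical helper: convert an "MM/DD/YY" string to
   {year, month, day, 0, 0, 0} without calling DateList.
   Assumption: two-digit years above 10 belong to the 1900s,
   all others to the 2000s, as in the reply above. *)
parseShortDate[s_String] := Module[{m, d, y},
   {m, d, y} = FromDigits /@ StringSplit[s, "/"];
   {If[y > 10, 1900 + y, 2000 + y], m, d, 0, 0, 0}];

(* Shortened copy of the sample data from the question. *)
dataFile = {{"DATES", "DATA1", "DATA2"},
   {"12/31/84", 1, 1},
   {"01/01/85", 1, 1.00239}};

(* Skip the header row and convert the first column. *)
dates = parseShortDate /@ Rest[dataFile][[All, 1]]
(* {{1984, 12, 31, 0, 0, 0}, {1985, 1, 1, 0, 0, 0}} *)
```

Avoiding ToExpression means the file contents are never evaluated as code, which may matter when the flat files come from an untrusted source. Unlike DateList, this returns the seconds as an exact 0 rather than a real number.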