From: Adam Thibideau on 21 Jul 2010 20:39

Hi there,

I have the following code to count how many points in a time series i will have when I average my input data. My problem is that my code takes way too long. Any suggestions on how to speed things up would be great because my input is about 60 million rows of csv data. Thanks in advance!

function x = textscanandintervalize(nrows)
t = cputime;
fid=fopen('163705359.csv');
i=1;
aaplCount = 0;
ibmCount = 0;
dellCount = 0;

textmatrix = textscan(fid,'%s%d%s%f32',nrows,'Delimiter',',','HeaderLines','1');
SymbolC = textmatrix{1};
YearV = textmatrix{2};
TimeC = textmatrix{3};
lengthofMatrix = length(SymbolC);

while(i<=lengthofMatrix)
    symb = SymbolC(i);
    tm={
        []
        double(YearV(i))
        char(TimeC(i))
        };

    currentDate = datevec(sprintf('%d %s',tm{2,1},tm{3,1}),'yyyymmdd HH:MM:SS');
    currentHour = currentDate(4);
    currentMin = currentDate(5);
    floorDate = currentDate;
    floorDate(5) = currentDate(5) - mod(currentDate(5),5);
    floorDate(6) = 0;
    symbol = symb;

    while(isequal(symbol,symb) && isequal(currentHour,floorDate(4)) && isequal((currentMin - mod(currentMin,5)),floorDate(5)))
        i = i+1;
        if(i> lengthofMatrix)
            textmatrix = textscan(fid,'%s%d%s%f32',nrows,'Delimiter',',','HeaderLines','1');
            SymbolC = textmatrix{1};
            YearV = textmatrix{2};
            TimeC = textmatrix{3};
            lengthofMatrix = length(SymbolC);
            currentDate = datevec(sprintf('%d %s',tm{2,1},tm{3,1}),'yyyymmdd HH:MM:SS');
            currentHour = currentDate(4);
            currentMin = currentDate(5);
            floorDate = currentDate;
            floorDate(5) = currentDate(5) - mod(currentDate(5),5);
            floorDate(6) = 0;
            i=1;
        end
        if(i > lengthofMatrix)
            break
        end
        symbol = SymbolC(i);
        tm={
            []
            double(YearV(i))
            char(TimeC(i))
            };
        currentDate = datevec(sprintf('%d %s',tm{2,1},tm{3,1}),'yyyymmdd HH:MM:SS');
        currentHour = currentDate(4);
        currentMin = currentDate(5);
    end

    if(strcmp(symbol,'AAPL'))
        aaplCount = aaplCount +1
    else if(strcmp(symbol,'IBM'))
            ibmCount = ibmCount +1
        else if(strcmp(symbol,'DELL'))
                dellCount = dellCount +1
            end
        end
    end
end

fclose(fid);
disp(aaplCount);
disp(ibmCount);
disp(dellCount);
x=zeros(1);
t = cputime - t
end

From: Adam Thibideau on 22 Jul 2010 00:08

Please help me! This is taking forever! Thanks!

From: Walter Roberson on 22 Jul 2010 02:07

Adam Thibideau wrote:

> I have the following code to count how many points in a time series i
> will have when I average my input data. My problem is that my code
> takes way too long. Any suggestions on how to speed things up would be
> great because my input is about 60 million rows of csv data. Thanks in
> advance!

Take a subset of the input file and run the program on that subset, using the profiler to measure the performance. Look at the profiler results to determine what is taking the most time, and concentrate on improving the performance of that.

Hints:

- Parsing the year in as a decimal number and then sprintf()'ing that decimal back into a string again is a waste of time: you might as well leave it as a string.

- Your code assumes that the date will never jump forward into the same 5 minute slot. If you are willing to make that assumption, then nearly all of your date processing is a waste of time, and you can make do with string comparisons of the hour text fields and marginally more sophisticated string comparisons of the minute text fields.

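A minimal sketch of the string-comparison hint, assuming the time field is always zero-padded HH:MM:SS text and that the rows for one symbol arrive in time order. The sample rows and the helper name slotOf are hypothetical, chosen only for illustration; SymbolC and TimeC stand in for the textscan output columns of the original post.

```matlab
% Hypothetical sample rows standing in for the textscan output columns.
SymbolC = {'AAPL'; 'AAPL'; 'AAPL'; 'IBM'; 'IBM'};
TimeC   = {'09:30:01'; '09:34:59'; '09:35:00'; '09:35:10'; '09:41:00'};

% Slot key built from text only: keep 'HH:M' (hour and tens digit of the
% minute) and replace the units digit of the minute with '0' or '5'.
% Assumes zero-padded 'HH:MM:SS' strings.
slotOf = @(t) [t(1:4) char('0' + 5 * (t(5) >= '5'))];

nSlots     = 0;    % number of (symbol, 5-minute slot) runs seen so far
prevSymbol = '';
prevSlot   = '';
for k = 1:numel(TimeC)
    slot = slotOf(TimeC{k});
    if ~strcmp(SymbolC{k}, prevSymbol) || ~strcmp(slot, prevSlot)
        nSlots     = nSlots + 1;   % a new run starts at this row
        prevSymbol = SymbolC{k};
        prevSlot   = slot;
    end
end
disp(nSlots)
```

No call to datevec or sprintf is needed per row here, which is the point of the hint. Whether this is actually the bottleneck is something the profiler should confirm on a subset of the file first.
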
From: Jan Simon on 22 Jul 2010 12:45

Dear Adam,

> tm={
> []
> double(YearV(i))
> char(TimeC(i))
> };
>
> currentDate = datevec(sprintf('%d %s', tm{2,1}, tm{3,1}), 'yyyymmdd HH:MM:SS');

There is absolutely no need to create the cell "tm". This was just an example to demonstrate the usage of SPRINTF to create a valid date string:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/286603

The above lines can be simplified to:
currentDate = datevec(sprintf('%d %s', YearV(i), TimeC{i}),'yyyymmdd HH:MM:SS');

But I have already shown you a faster method to get the date vector. Because DATEVEC wastes the most time in your program, it would be a good idea to use it:
Year = floor(D / 10000);
Month = rem(floor(D / 100), 100);
Day = rem(D, 100);
Time = sscanf(T, '%d:%d:%d');
V = [Year, Month, Day, reshape(Time, 1, 3)];

Kind regards, Jan

From: Adam Thibideau on 22 Jul 2010 15:31

"Jan Simon" <matlab.THIS_YEAR(a)nMINUSsimon.de> wrote in message <i29sj2$gc8$1(a)fred.mathworks.com>...
> Dear Adam,
>
> > tm={
> > []
> > double(YearV(i))
> > char(TimeC(i))
> > };
> >
> > currentDate = datevec(sprintf('%d %s', tm{2,1}, tm{3,1}), 'yyyymmdd HH:MM:SS');
>
> There is absolutely no need to create the cell "tm". This was just an example to demonstrate the usage of SPRINTF to create a valid date string:
> http://www.mathworks.com/matlabcentral/newsreader/view_thread/286603
>
> The above lines can be simplified to:
> currentDate = datevec(sprintf('%d %s', YearV(i), TimeC{i}),'yyyymmdd HH:MM:SS');
>
> But I have already shown you a faster method to get the date vector. Because DATEVEC wastes the most time in your program, it would be a good idea to use it:
> Year = floor(D / 10000); <<<----------------- WHAT IS D??
> Month = rem(floor(D / 100), 100);
> Day = rem(D, 100);
> Time = sscanf(T, '%d:%d:%d'); <<<---------WHAT IS T???
> V = [Year, Month, Day, reshape(Time, 1, 3)];
>
> Kind regards, Jan

Jan,

Thanks for these suggestions. I would like to use them, but I am a little confused. What are D and T?
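
The thread does not define D and T. A plausible reading, given the 'yyyymmdd HH:MM:SS' format used above, is that D is the date column as a number in yyyymmdd form (for example double(YearV(i))) and T is the time text in HH:MM:SS form (for example TimeC{i}). The sketch below works under that assumption; the sample values are hypothetical. Note that a yyyymmdd value needs a divisor of 10000 for the year and a floor inside the month computation.

```matlab
% Assumed inputs (hypothetical sample values, not taken from the thread):
D = 20100721;      % date as a double in yyyymmdd form, e.g. double(YearV(i))
T = '20:39:05';    % time text in HH:MM:SS form, e.g. TimeC{i}

% Split the date with plain arithmetic instead of calling datevec.
Year  = floor(D / 10000);           % drop the last four digits (mmdd)
Month = rem(floor(D / 100), 100);   % drop dd, then keep the last two digits
Day   = rem(D, 100);                % keep the last two digits

% sscanf returns a 3-by-1 column: hour, minute, second.
Time = sscanf(T, '%d:%d:%d');

% Same layout as a datevec result: [year month day hour minute second].
V = [Year, Month, Day, reshape(Time, 1, 3)];
disp(V)
```

With V in this form, V(4) and V(5) give the hour and minute that the original loop reads from currentDate(4) and currentDate(5).
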