Prev: Remove small objects from binary image without erode.
Next: remove duplicate rows in sparse matrix
From: Adam Thibideau on 10 Jul 2010 14:48

Hi,

I am working on some code to process very large amounts of data (csv files with 10 million+ rows) and I need some help optimizing a section. Any chance I can get a suggestion to improve the following date conversion?

What I have: A date represented as an int32 (e.g. 20071231 = 31-Dec-2007) and a time represented as a string (e.g. 9:35:07).
What I need: A format that will enable me to do arithmetic with the dates and times together (find intervals between them).
What I have done: I have found a solution but it takes way too long. I have determined that the datevec format is suitable for the calculations I need to make, and I convert with the following code:

    x(j).Date = datevec(strcat(datestr(datenum(num2str(textmatrix{2}(i)),'yyyymmdd')),{' '},textmatrix{3}(i)));

which takes the int32 stored in 'textmatrix{2}(i)' and the string stored in 'textmatrix{3}(i)', converts them to the appropriate format and then adds them to my struct 'x' in the 'Date' field.

This needs to be done 20 to 30 times faster in order to be useful to me, which I believe is possible. When I use tic/toc for one iteration of this, it says "Elapsed time is 0.006437 seconds." I downloaded the files for DateStr2Num, but my time is not always in the correct format (it requires HHMMSS, and if the hour is a single digit it fails).

Any help is greatly appreciated!
From: us on 10 Jul 2010 15:08

"Adam Thibideau" <adam.thibideau(a)gmail.com> wrote in message <i1af95$2jg$1(a)fred.mathworks.com>...
> What I have: A date represented as an int32 (e.g. 20071231 = 31-Dec-2007) and a time represented as a string (e.g. 9:35:07).
> What I need: A format that will enable me to do arithmetic with the dates and times together (find intervals between them).
>
> x(j).Date = datevec(strcat(datestr(datenum(num2str(textmatrix{2}(i)),'yyyymmdd')),{' '},textmatrix{3}(i)));
>
> This needs to be done 20 to 30 times faster in order to be useful to me, which I believe is possible.

one of the many solutions

    tm={
        []
        20071231
        '9:35:07'
    };
    dv=datevec(sprintf('%d %s',tm{2,1},tm{3,1}),'yyyymmdd HH:MM:SS');
    disp(dv);
    % 2007 12 31 9 35 7

us
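A minimal sketch of how date vectors produced this way could feed the interval arithmetic asked about in the original post. The two rows below are hypothetical (the second time string is invented for illustration), and ETIME is used only as one possible way to get the number of seconds between two date vectors.

```matlab
% Two hypothetical rows: an int32 date plus a time string, as in the original post.
d = int32([20071231; 20071231]);
t = {'9:35:07'; '10:01:59'};   % second time is invented for illustration

% Convert each row with the sprintf/datevec approach shown above.
dv = zeros(numel(d), 6);
for r = 1:numel(d)
    dv(r, :) = datevec(sprintf('%d %s', d(r), t{r}), 'yyyymmdd HH:MM:SS');
end

% Seconds elapsed from the first timestamp to the second one.
secs = etime(dv(2, :), dv(1, :));
```

ETIME takes the later date vector first, so a positive result means the second row is later than the first.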
From: Jan Simon on 10 Jul 2010 16:01

Dear Adam,

> What I have: A date represented as an int32 (e.g. 20071231 = 31-Dec-2007) and a time represented as a string (e.g. 9:35:07).
> What I need: A format that will enable me to do arithmetic with the dates and times together (find intervals between them).
> x(j).Date = datevec(strcat(datestr(datenum(num2str(textmatrix{2}(i)),'yyyymmdd')),{' '},textmatrix{3}(i)));
> which takes the int32 stored in 'textmatrix{2}(i)' and the string stored in 'textmatrix{3}(i)', converts them to the appropriate format and then adds them to my struct 'x' in the 'Date' field.

It is not clear to me what "textmatrix{3}(i)" means. As far as I can see this cannot be a string, but just a single CHAR?!

> I downloaded the files for DateStr2Num but my time is not always in the correct format (it requires HHMMSS and if the hour is a single digit it fails).

Then it would be helpful to adjust the code of DateConvert to reduce the time-consuming calls to DATENUM and DATESTR.
If you want DATEVEC as output format, this could help (I start with "D" and "T" as input, because I'm not sure about your "textmatrix"):

    D = uint32(20071231);
    T = '9:35:07';
    D = double(D);                        % integer division would round, so work in double
    Year  = floor(D / 10000);             % 2007
    Month = floor(rem(D, 10000) / 100);   % 12
    Day   = rem(D, 100);                  % 31
    Time  = sscanf(T, '%d:%d:%d');        % [9; 35; 7]
    V = [Year, Month, Day, reshape(Time, 1, 3)];

If you want to convert this to a serial date number, do *not* use DATENUM, but DATENUMMX.

Good luck, Jan
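The same arithmetic idea can probably be applied to whole columns at once instead of one row per loop iteration. The sketch below assumes a numeric column of dates and a cell array of time strings (both hypothetical sample values); it divides by 10000 to isolate the year and parses all time strings in a single SSCANF call.

```matlab
% Hypothetical input columns: int32 dates and a cell array of time strings.
D = double(int32([20071231; 20080102]));
T = {'9:35:07'; '10:01:59'};

% Split yyyymmdd into its parts with integer arithmetic.
Year  = floor(D / 10000);
Month = floor(rem(D, 10000) / 100);
Day   = rem(D, 100);

% Join all time strings with ':' and parse them in one SSCANF call.
% The [3 Inf] size fills one column per input row, hence the transpose.
Time = sscanf(sprintf('%s:', T{:}), '%d:', [3 Inf]).';

% One date vector per row: [year month day hour minute second].
V = [Year, Month, Day, Time];
```

Because '%d' reads base-10 integers, a single-digit hour such as '9' and a zero-padded field such as '07' are both handled without special cases.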
From: Adam Thibideau on 10 Jul 2010 17:12

"us " <us(a)neurol.unizh.ch> wrote in message <i1agek$c6m$1(a)fred.mathworks.com>...
> one of the many solutions
>
> tm={
> []
> 20071231
> '9:35:07'
> };
> dv=datevec(sprintf('%d %s',tm{2,1},tm{3,1}),'yyyymmdd HH:MM:SS');
> disp(dv);
> % 2007 12 31 9 35 7
>
> us

Wow, thanks for your quick reply! That worked very nicely; it is much faster now.
I am still having a problem that I cannot seem to figure out, though. I have the function output the current index that is being accessed while it is running so I can monitor the progress. It is quite fast at first (approx. 1000/sec), but as it reaches the higher indices it becomes very slow (almost 5 sec per 1000 by the time it is at index 100,000) and continues to get slower as the index increases. Any ideas on that? Is this expected behaviour? I created an output struct using the zeros() function, so it shouldn't be copying the whole thing every time.

Here is my code:

    fid=fopen('173707820.csv');
    x = struct('Symbol',zeros(1,count),'Date',zeros(1,count),'Price',zeros(1,count));
    j = 1; %INITIALIZE SECOND INDEX REFERENCE TO 1 (THIS KEEPS TRACK OF THE LOCATION IN OUTPUT STRUCTURE)
    k=0;
    while(k<30)
        k=k+1;
        i=1;
        textmatrix = textscan(fid,'%s%d%s%f32',count,'Delimiter',',','HeaderLines','1');
        symb = textmatrix{1}(i); %INITIALIZE 'symb' VARIABLE TO iTH SYMBOL
        while(strcmp(symb,symbol)==0) %FINDS THE FIRST INSTANCE OF THE VARIABLE 'symbol' IN THE DATA.
            i = i+1; %DATA MUST BE SORTED SO ALL DATA FOR EACH SYMBOL IS GROUPED TOGETHER.
            if(i>= length(textmatrix{1}))
                break
            end
            symb = textmatrix{1}(i); %SET 'symb' VARIABLE TO iTH SYMBOL
        end
        if(i>= length(textmatrix{1}))
            break
        end
        for(q=1:count) %LOOP THROUGH DATA WHILE 'symb' == 'symbol'
            x(j).Symbol = symb;
            tm={
                []
                double(textmatrix{2}(i))
                char(textmatrix{3}(i))
            };
            x(j).Date = datevec(sprintf('%d %s',tm{2,1},tm{3,1}),'yyyymmdd HH:MM:SS');
            %x(j).Date = datevec(strcat(datestr(datenum(num2str(textmatrix{2}(i)),'yyyymmdd')),{' '},textmatrix{3}(i))); %PARSE THE DATE AND TIME AND PUT IT IN A USEFUL FORMAT FOR ARITHMETIC
            x(j).Price = textmatrix{4}(i); %ENTER THE PRICE
            i = i+1 %ADVANCE THE FIRST INDEX REFERENCE
            %if(i >= length(textmatrix{1})) %IF THE 1ST INDEX REFERENCE IS OUTSIDE THE DATA, EXIT THE WHILE LOOP
            % j=j+1;
            %break
            %end
            j = j+1; %ADVANCE THE 2ND INDEX REFERENCE
        end
    end
From: us on 10 Jul 2010 17:28

"Adam Thibideau"
> Wow, thanks for your quick reply! That worked very nicely; it is much faster now. I am still having a problem that I cannot seem to figure out, though. I have the function output the current index that is being accessed while it is running so I can monitor the progress. It is quite fast at first (approx. 1000/sec), but as it reaches the higher indices it becomes very slow (almost 5 sec per 1000 by the time it is at index 100,000) and continues to get slower as the index increases. Any ideas on that? Is this expected behaviour? I created an output struct using the zeros() function, so it shouldn't be copying the whole thing every time.
>
> Here is my code:
> x = struct('Symbol',zeros(1,count),'Date',zeros(1,count),'Price',zeros(1,count));
> x(j).Date = datevec(sprintf('%d %s',tm{2,1},tm{3,1}),'yyyymmdd HH:MM:SS');

a hint:
- you do NOT pre-allocate X properly...

    count=10;
    % your way (abbreviated)...
    x=struct('date',zeros(1,count))
    %{
    % x =
    %     date: [0 0 0 0 0 0 0 0 0 0] % <- you do NOT need this, it's assigned at runtime
    %}
    % one of the CSSMers' ways
    x=struct('date',repmat({[]},1,count))
    %{
    % x =
    %     1x10 struct array with fields:
    %     date
    %}

us
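A small sketch of the difference described in the hint, extended to the three fields from the question. The field values written in the loop are placeholders, and using cell(1, count) in place of repmat({[]}, 1, count) is an assumption made here for brevity; both give a cell array of empties.

```matlab
count = 10;   % hypothetical number of rows

% A numeric array as field value gives a 1x1 struct whose field holds the whole array.
a = struct('Date', zeros(1, count));

% A cell array as field value gives one struct element per cell.
b = struct('Symbol', cell(1, count), 'Date', cell(1, count), 'Price', cell(1, count));

% Filling b in a loop now writes into existing elements instead of growing the array.
for j = 1:count
    b(j).Date  = [2007 12 31 9 35 j];   % placeholder date vector
    b(j).Price = single(j);             % placeholder price
end

sizeA = size(a);   % 1-by-1
sizeB = size(b);   % 1-by-count
```

With the 1x1 version, every assignment to x(j) for j > 1 extends the struct array by one element, which is the likely reason the loop in the question slows down as the index grows.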