From: dpb on 16 May 2010 10:47

Ayotunde wrote:
> Basically I've got data consisting of observations on the S&P index for
> 6 years; for example, for year 1999 the csv looks like
> SYMBOL,DATE,TIME,PRICE,SIZE,G127,CORR,COND,EX
>> SPY,19980102,09:31:41,97.3125,53500,0,0,,A
>> SPY,19980102,09:31:43,97.3125,100,0,0,,M
>> SPY,19980102,09:31:43,97.3125,200,0,0,,M
>> SPY,19980102,09:31:44,97.3125,4800,0,0,,P
>> SPY,19980102,09:31:44,97.3125,100,0,0,,M
....
>> SPY,19980102,09:31:44,97.3125,200,0,0,,M
>> SPY,19980102,09:31:44,97.3125,1000,0,0,,M
>> SPY,19980102,09:31:45,97.3125,900,0,0,,M
>> SPY,19980102,09:31:45,97.3125,300,0,0,,M
>> SPY,19980102,09:31:45,97.3125,200,0,0,,M
>> SPY,19980102,09:31:45,97.3125,300,0,0,,M
>> SPY,19980102,09:31:52,97.3125,200,0,0,,T
>> SPY,19980102,09:32:03,97.4375,1000,0,0,,A
>> SPY,19980102,09:32:11,97.4375,1000,0,0,,A
>> SPY,19980102,09:32:20,97.3125,100,0,0,Z,P
>> SPY,19980102,09:32:21,97.3125,200,0,0,,T
....
> My question is that I am trying to replicate the results of a paper
> (article), and they used 5-minute observations; however, as is visible
> from the example of the csv, my data is more frequent than every 5
> minutes.
...

Well, I don't know what the original paper's data stream looked like to
start with, either, but you don't have data in the sample dataset at
5-minute intervals. You have repeated observations at the same time stamp,
and then asynchronous time stamps at varying intervals, from one to as much
as nine minutes apart, in just this short set. So there are some 5-minute
intervals for which you have no data at all.

Methinks you need to look at the data source more carefully, or reevaluate
the analysis you're comparing against, before worrying too much about the
details of selection.
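One quick way to see what dpb is describing is to look at the gaps between
consecutive time stamps. A minimal sketch, assuming the DATE and TIME
columns have already been read into cellstr arrays dateStr and timeStr
(hypothetical names, not from the thread):

% dateStr and timeStr are assumed to hold strings like '19980102' and '09:31:41'
t    = datenum(strcat(dateStr, timeStr), 'yyyymmddHH:MM:SS');  % serial date numbers
gaps = diff(t) * 24 * 60 * 60;   % gaps between consecutive ticks, in seconds
fprintf('gaps: min %.0f s, max %.0f s, %d duplicate time stamps\n', ...
        min(gaps), max(gaps), sum(gaps == 0))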
From: ImageAnalyst on 16 May 2010 11:10

And finding the increment for 5 minutes is pretty easy empirically:

for k = 729757.397001 : .00001 : 729757.400800
    deltaTime = k - 729757.397001;
    fprintf(1, 'k=%f, time=%s, delta time = %f\n',...
        k, datestr(k), deltaTime)
end

timeIncrement = 0.00348;
resampledTimes = 729757.397002315 : timeIncrement : 729758.39780;

for k = 1 : length(resampledTimes)
    fprintf(1, 'k=%f, time=%s\n', k, datestr(resampledTimes(k)))
end

It looks like an increment of about 0.00348 corresponds to a 5-minute
difference. So can't you just pass in your first column (times) as your x,
your second column (the prices, with duplicates deleted) as y, and
resampledTimes as xi into interp1 to get your new interpolated values?
They'll be the interpolated values taken every 5 minutes.
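As a sketch of that interp1 suggestion (t and p here are placeholder names
for the datenum times and prices, with duplicate times already removed;
5/(24*60) is the exact 5-minute step that 0.00348 approximates):

step = 5/(24*60);              % exactly five minutes in datenum units (~0.003472)
xi   = t(1) : step : t(end);   % regular 5-minute grid spanning the data
p5   = interp1(t, p, xi);      % linearly interpolated prices every 5 minutes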
From: us on 16 May 2010 11:25

ImageAnalyst <imageanalyst(a)mailinator.com> wrote in message <f0aa7e5f-3e85-45d7-988a-8f23cab72ffa(a)p17g2000vbe.googlegroups.com>...
> And finding the increment for 5 minutes is pretty easy empirically:
>
> for k = 729757.397001 : .00001 : 729757.400800
>     deltaTime = k - 729757.397001;
>     fprintf(1, 'k=%f, time=%s, delta time = %f\n',...
>         k, datestr(k), deltaTime)
> end
>
> timeIncrement = 0.00348;
> resampledTimes = 729757.397002315 : timeIncrement : 729758.39780;
>
> for k = 1 : length(resampledTimes)
>     fprintf(1, 'k=%f, time=%s\n', k, datestr(resampledTimes(k)))
> end
>
> It looks like an increment of about 0.00348 corresponds to a 5-minute
> difference. So can't you just pass in your first column (times) as your x,
> your second column (the prices, with duplicates deleted) as y, and
> resampledTimes as xi into interp1 to get your new interpolated values?
> They'll be the interpolated values taken every 5 minutes.

a minute in ML time space

t0 = now;
ts = t0 + (0:9)/(24*60);     % <- one minute = 1/(24*60) of a day
datestr(ts)
%{
16-May-2010 17:24:58
16-May-2010 17:25:58
16-May-2010 17:26:58
16-May-2010 17:27:58
16-May-2010 17:28:58
16-May-2010 17:29:58
16-May-2010 17:30:58
16-May-2010 17:31:58
16-May-2010 17:32:58
16-May-2010 17:33:58
%}

us
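Along the same lines, a 5-minute grid for a single trading day can be built
directly from us's one-minute example. A small sketch (the date and session
window are just an example taken from the csv snippet):

day  = datenum(1998, 1, 2);                          % example date from the csv
grid = day + ((9*60+30) : 5 : (16*60)) / (24*60);    % 09:30 to 16:00 in 5-minute steps
datestr(grid([1 2 end]))                             % first two points and the last one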
From: Ayotunde on 16 May 2010 17:44

ImageAnalyst <imageanalyst(a)mailinator.com> wrote in message <f0aa7e5f-3e85-45d7-988a-8f23cab72ffa(a)p17g2000vbe.googlegroups.com>...
> And finding the increment for 5 minutes is pretty easy empirically:
>
> for k = 729757.397001 : .00001 : 729757.400800
>     deltaTime = k - 729757.397001;
>     fprintf(1, 'k=%f, time=%s, delta time = %f\n',...
>         k, datestr(k), deltaTime)
> end
>
> timeIncrement = 0.00348;
> resampledTimes = 729757.397002315 : timeIncrement : 729758.39780;
>
> for k = 1 : length(resampledTimes)
>     fprintf(1, 'k=%f, time=%s\n', k, datestr(resampledTimes(k)))
> end
>
> It looks like an increment of about 0.00348 corresponds to a 5-minute
> difference. So can't you just pass in your first column (times) as your x,
> your second column (the prices, with duplicates deleted) as y, and
> resampledTimes as xi into interp1 to get your new interpolated values?
> They'll be the interpolated values taken every 5 minutes.

Hey ImageAnalyst, I seem to have more success with your method than with
us's histc method, but is there a way of limiting the resampled times (e.g.
09:30-16:20)? Also, I get an error when trying to use interp1 because, even
though I have deleted the duplicated rows, there are still some duplicates
in the time column: in my data, prices sometimes change even though the
time stamp hasn't. This means that interp1 gives me this error:

prices = interp1(q1, q2, resampledtimes);
??? Error using ==> interp1 at 261
The values of X should be distinct.
From: ImageAnalyst on 16 May 2010 18:00

Yes, you can determine the starting and ending times. Just figure out what
crazy long floating point numbers correspond to the times you want to limit
it to, and pass in that subarray. Extract it out to a brand new variable if
you want, then pass it in to interp1().

If you have two different prices for the same time then you need to get rid
of one of them. You could just go through with a for loop, although there is
probably a more clever vectorized way to do it using diff() on the times
column (and then looking for zeros), such as this little example:

a = [1 2; ...
     3 4; ...
     3 6; ...
     4 5]
d = diff(a(:,1)) == 0   % true where a row's time is repeated by the next row
a(d,:) = []             % delete the earlier row of each duplicated pair
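Putting those two suggestions together, a rough end-to-end sketch for a
single day (variable names are placeholders: t and p are column vectors of
tick times as datenums and prices; for six years of data you would build the
grid per day or loop over days):

% 1) drop duplicated time stamps, keeping the last price at each stamp
keep = [diff(t) ~= 0; true];   % true where the next time stamp differs
tu   = t(keep);
pu   = p(keep);

% 2) restrict the 5-minute grid to the session window, e.g. 09:30-16:20
day = floor(tu(1));            % midnight of the (single) day
xi  = day + ((9*60+30) : 5 : (16*60+20)) / (24*60);

% 3) interpolate the prices onto the 5-minute grid
p5 = interp1(tu, pu, xi);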