From: Ayotunde on
"us " <us(a)neurol.unizh.ch> wrote in message <hsn8fh$p29$1(a)fred.mathworks.com>...
> "Ayotunde " <rhymer2k(a)yahoo.co.uk> wrote in message <hsn7a1$604$1(a)fred.mathworks.com>...
> > "us " <us(a)neurol.unizh.ch> wrote in message <hsk6lv$7lp$1(a)fred.mathworks.com>...
> > > "Ayotunde "
> > > > my question is that i am trying to replicate the results of a paper (article) that used 5-minute observations. However, as is visible from the example of the csv, my data is more frequent than every 5 minutes. How do i go about extracting data every 5 minutes from about 9.30 am, when observations start, till 4.20 pm? Each csv i have is for a whole year, as mentioned above, so i'd like to do this for the whole year.
> > > > thnx in advance
> > >
> > > a hint:
> > >
> > > help histc;
> > >
> > > us
> >

Thanks. I still don't know what I am doing, and as I have exams I think I will give MATLAB a rest till I have more time. I'm panicking about my exams already and this just adds to my worries. Thanks again.
> > Thank you for your hint, i still don't see how histc will help me though. i haven't come across an example that uses time for the "edges". Also, i've tried but i cannot work out how to use histc to create a new array with data from every 5 minutes.
>
> well then... a small example...
>
> one of the solutions
>
> % the data
> % - date strings...
> d0=datenum(now);
> ds=datestr(datenum(d0)+(0:10)/24); % <- D0 + 1hr * (0:10)
> % the engine
> t0=datenum(d0); % <- D0
> ts=datenum(t0+(0:10)/(.5*24)); % <- D0 + 2hr * (0:10) [there will be slack!]
> td=datenum(ds); % <- DS converted to DATENUMs
> [tx,tn]=histc(td,ts);
> % the result
> disp([tx.';tn.']);
> %{
> 2 2 2 2 2 0 0 0 0 0 0 % <- #obs of TD in TS
> 0 1 1 2 2 3 3 4 4 5 5 % <- index into TS
> %}
>
> us
From: dpb on
Ayotunde wrote:
....

> ... in the data i have prices seem to
> change once in while even though time hasnt changed. ...

I didn't notice that in the sampled data, but that raises even more
questions about the legitimacy of the kind of analysis one might be
proposing on such a time series...

--
From: Ayotunde on
dpb <none(a)non.net> wrote in message <hspsct$b4f$1(a)news.eternal-september.org>...
> Ayotunde wrote:
> ...
>
> > ... in the data i have prices seem to
> > change once in while even though time hasnt changed. ...
>
> I didn't notice that in the sampled data, but that raises even more
> questions about the legitimacy of the kind of analysis one might be
> proposing on such a time series...
>
> --

Basically, in the paper i've tried to replicate they have used 5-minute observations on the S&P index (cash data) from April 1997 till Oct 2002. The analysis they have done includes daily returns and calculating various test statistics, with the aim of checking for volatility in the data and looking for "jumps". The paper doesn't mention where they got their data, so i asked my lecturer for data, only to be given almost 4 GB of csv files which i had no idea what to do with. He mentioned converting the tick-by-tick data he had given me using historical tick data, but even that sounds very vague to me. I think i am just going to tell him what i really think about him on Monday. cheers
From: Steven Lord on

"Ayotunde " <rhymer2k(a)yahoo.co.uk> wrote in message
news:hso228$qrm$1(a)fred.mathworks.com...
> ImageAnalyst <imageanalyst(a)mailinator.com> wrote in message
> <1f9cefc9-e938-4fad-9af1-9cc61730e170(a)d12g2000vbr.googlegroups.com>...
>> I'd probably first remove duplicated rows. Then I'd need to figure
>> out what fraction corresponds to 5 minutes. Then I'd probably use
>> interp1() to resample at exactly 5 minute intervals. Sound reasonable?
> i assume you were responding to my post, and yes, this does sound
> reasonable. Following what you say, i can remove duplicated rows using
> unique(data,'rows'), but figuring out what fraction corresponds to 5
> minutes is not straightforward for me because daily observations start at
> 9.31 am and don't end till a random time around 4.20 pm. i guess i can use
> those times as 'edges' for histc, but it's all conjecture to me, as all this
> is akin to putting me in the deep end of a pool when i cannot swim. I
> appreciate this is probably easy stuff so please go easy on me

What I'd probably do is determine how many minutes after midnight 9:31 AM
and 4:20 PM are, then use COLON and step by 5's.

numMinSinceMidnight = @(h, m) 60*h+m;
observationTimes = numMinSinceMidnight(9, 31):5:numMinSinceMidnight(16, 20)

Note that since 4:20 PM is not exactly a multiple of 5 minutes after 9:31
AM, you may need to adjust the final entry (or add an extra entry to the
end) of observationTimes.
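
A minimal sketch of that adjustment, assuming you always want the closing time kept even when it falls off the 5-minute grid (that choice is an assumption here, not something the paper is known to do):

```matlab
% Minutes after midnight for the first and last observation of the day.
numMinSinceMidnight = @(h, m) 60*h + m;
firstObs = numMinSinceMidnight(9, 31);    % 571
lastObs  = numMinSinceMidnight(16, 20);   % 980

% COLON stops at the last grid point that does not exceed lastObs.
observationTimes = firstObs:5:lastObs;

% Append the closing time if the 5-minute grid does not land on it.
if observationTimes(end) ~= lastObs
    observationTimes(end+1) = lastObs;
end
```

Dropping the appended entry instead would leave a strictly regular grid, which may suit return calculations better.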

If you now need serial date numbers or the date strings (I'm assuming these
are for today:)

startOfDay = datenum([2010 05 17 0 0 0]);
observationSDN = zeros(size(observationTimes));
for k = 1:numel(observationTimes)
    observationSDN(k) = addtodate(startOfDay, observationTimes(k), 'minute');
end
observationDatestrings = datestr(observationSDN)
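
Once a grid like this exists, one possible way to pull a 5-minute series out of the tick data is to take the last observed price at or before each grid time, using HISTC as hinted earlier in the thread. This is only a rough sketch: tickTimes, tickPrices and gridTimes are hypothetical names with made-up values (in minutes after midnight), and both the previous-tick rule and keeping the last of several same-time prices are assumptions that may not match the paper.

```matlab
% Hypothetical tick data for one day (made-up values).
tickTimes  = [571 572.5 574 577 583 583 590];   % sorted; repeats allowed
tickPrices = [100 101   102 103 104 105 106];

gridTimes = 571:5:591;                           % 571 576 581 586 591

% Collapse repeated time stamps, remembering the LAST tick at each time.
[uTimes, iLast] = unique(tickTimes, 'last');

% Use the tick times as bin edges; Inf closes the final bin so that grid
% times after the last tick still fall into it.
[~, bin] = histc(gridTimes, [uTimes Inf]);

% bin(k) is the position in uTimes of the last tick at or before
% gridTimes(k); a 0 would mean the grid time precedes every tick.
gridPrices = tickPrices(iLast(bin));
```

For a whole year, the same steps could be run one day at a time, with a check for bin == 0 on days where the first tick arrives after the first grid time.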

--
Steve Lord
slord(a)mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ


From: dpb on
Ayotunde wrote:
> dpb <none(a)non.net> wrote in message
> <hspsct$b4f$1(a)news.eternal-september.org>...
>> Ayotunde wrote:
>> ...
>>
>> > ... in the data i have prices seem to
>> > change once in while even though time hasnt changed. ...
>>
>> I didn't notice that in the sampled data, but that raises even
>> more questions about the legitimacy of the kind of analysis one might
>> be proposing on such a time series...
>>
>> --
>
> Basically, in the paper i've tried to replicate they have used 5-minute
> observations on the S&P index (cash data) from April 1997 till Oct 2002.
> The analysis they have done includes daily returns and calculating various
> test statistics, with the aim of checking for volatility in the data and
> looking for "jumps". The paper doesn't mention where they got their data,
> so i asked my lecturer for data, only to be given almost 4 GB of csv files
> which i had no idea what to do with. He mentioned converting the
> tick-by-tick data he had given me using historical tick data, but even that
> sounds very vague to me. I think i am just going to tell him what i really
> think about him on Monday. cheers

Chuckle...let us know how that goes over, ok? :)

I've seen lots of analyses of stock data that essentially apply
continuous-variable models to discontinuous data -- the idea of
interpolating to fill in missing 5-minute samples is one example, and the
multiple samples at the same time stamp is another. I don't know enough
about how the index data is reported to know what it means, but it seems
worth wondering about what it is that's being attempted.

--