From: Luna Moon on
On Jul 21, 6:36 pm, dpb <n...(a)non.net> wrote:
> Luna Moon wrote:
>
> ...
>
> > The key part is how to do back-fill fast!
>
> How big of a series is this and what's the typical sparseness?
>
> Wouldn't seem it should be particularly time consuming but an example
> might help visualize.
>
> --

Very large, millions of rows... and I have lots of such time series...

So let's just focus on how to write such a backfill function better...
From: dpb on
Luna Moon wrote:
> Very large, millions of rows... and I have lots of such time series...
>
> So let's just focus on how to write such a backfill function better...

How about revising the algorithm?

Or, perhaps mex the function you have.

I'll consider it overnight; nothing pops to mind automagic...
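For reference, the loop that a MEX file would contain is a single pass that remembers the last non-NaN value. Below is a minimal sketch of it, written in Python rather than C purely for illustration. `ffill_loop` is a hypothetical name, not code from this thread. It is mainly useful as a baseline for checking faster versions:

```python
import math


def ffill_loop(values):
    """Carry the last non-NaN value forward over NaN gaps (hypothetical helper).

    One O(n) pass, the same logic a MEX file would implement.
    NaNs before the first real value stay NaN: there is nothing to copy.
    """
    out = list(values)
    last = math.nan            # no valid value seen yet
    for i, v in enumerate(out):
        if math.isnan(v):
            out[i] = last      # still NaN until a value has been seen
        else:
            last = v           # remember the most recent real value
    return out
```

In plain Python the per-element overhead makes this slow on millions of rows. In C, the same loop is the "trivial MEX" mentioned later in the thread.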

--
From: Luna Moon on
Now also posting to comp.dsp to see if the experts there can help.

The bottleneck is the "backfill" part.

Is there a "filter" way of doing the "backfill" fast?

Thanks a lot!

On Jul 21, 3:53 pm, Luna Moon <lunamoonm...(a)gmail.com> wrote:
> How to align two time series fast?
>
> Hi all,
>
> I have two time series, both are in the following format:
>
> Date       Data
> 1/1/2010    5.3
> 1/2/2010    4.4
> ...
>
> Let's label the first time series MyDates1, MyData1 and the second
> time series MyDates2, MyData2,
>
> where MyDates1 and MyData1 have the same number of rows and MyDates2
> and MyData2 have the same number of rows,
>
> and where MyDates1 and MyDates2 are in fact in datenum format.
>
> The sets MyDates1 and MyDates2 are very different.
>
> How can I align time series two so that it lines up with time series
> one?
>
> That's to say, we want to modify MyDates2 and MyData2 to make them in
> line with MyDates1 and MyData1.
>
> Actions:
>
> (1) If a date is in MyDates1 but not in MyDates2, then insert that
> date into MyDates2 and put an "NaN" into corresponding location in
> MyData2.
>
> (2) If a date is in MyDates2 but not in MyDates1, then delete that
> date from MyDates2 and delete the data in the corresponding location
> in MyData2.
>
> (3) The 2nd time series now may look like the following:
>
> Date         Data
> 1/1/2010     NaN
> 1/2/2010     NaN
> 1/5/2010     2.3
> 1/6/2010     NaN
> 1/7/2010     NaN
> 1/8/2010     3.1
> ...
>
> Then we need to backfill the holes ("NaN"s) in this 2nd time series.
>
> For example, the above data, after backfill, become:
>
> Date         Data
> 1/1/2010     NaN
> 1/2/2010     NaN
> 1/5/2010     2.3
> 1/6/2010     2.3
> 1/7/2010     2.3
> 1/8/2010     3.1
> ...
>
> Note that the first few missing values ("NaN"s) cannot be
> backfilled, because there is no earlier value to copy into them...
>
> The output is the modified MyData2, because the modified MyDates2
> should be exactly the same as MyDates1, which is used as the reference.
>
> MyData2 should now have the same number of rows as MyDates1, MyData1,
> and MyDates2 (modified).
>
> I currently do this using the MATLAB Financial Toolbox,
>
> but it's very slow.
>
> Any thoughts on how I can do it fast?
>
> Thanks a lot!
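Below is a minimal NumPy sketch of the whole operation described above. It rests on several assumptions. Both date vectors hold whole-day datenum-style numbers, so exact equality is safe, and MyDates1 is sorted ascending. `align_and_fill` is a hypothetical name.

What the post calls "backfill" is what pandas calls a forward fill (`ffill`): each gap takes the most recent earlier value. In MATLAB, the alignment step corresponds to the two-output form `[tf, loc] = ismember(MyDates1, MyDates2)`. The fill here uses a running maximum of "index of the last valid entry", so it needs neither a loop nor repeated passes:

```python
import numpy as np


def align_and_fill(dates1, dates2, data2):
    """Align (dates2, data2) to reference dates1, then carry values forward.

    Steps (1)/(2): reference dates missing from dates2 become NaN, and dates
    found only in dates2 are dropped. Step (3): each NaN takes the most recent
    earlier value; NaNs before the first value stay NaN.
    Assumes dates1 is sorted ascending and dates are exact day numbers.
    """
    dates1 = np.asarray(dates1, dtype=float)
    dates2 = np.asarray(dates2, dtype=float)
    data2 = np.asarray(data2, dtype=float)

    aligned = np.full(dates1.shape, np.nan)
    if dates2.size:
        order = np.argsort(dates2, kind="stable")
        d2, v2 = dates2[order], data2[order]
        pos = np.searchsorted(d2, dates1)      # insertion point of each reference date
        pos_c = np.minimum(pos, d2.size - 1)   # clip so indexing stays in range
        hit = d2[pos_c] == dates1              # exact date match only
        aligned[hit] = v2[pos_c[hit]]

    # Vectorized forward fill: for each position, the index of the last non-NaN entry so far.
    valid = ~np.isnan(aligned)
    last = np.where(valid, np.arange(aligned.size), -1)
    if last.size:
        last = np.maximum.accumulate(last)
    filled = np.where(last >= 0, aligned[np.maximum(last, 0)], np.nan)
    return aligned, filled
```

The call below uses day numbers 1, 2, 5, 6, 7, 8 to stand in for the example dates 1/1 through 1/8, with made-up values for the series-two-only dates 3 and 9: `align_and_fill([1, 2, 5, 6, 7, 8], [3, 5, 8, 9], [9.9, 2.3, 3.1, 7.0])`. It gives `aligned = [NaN, NaN, 2.3, NaN, NaN, 3.1]` and `filled = [NaN, NaN, 2.3, 2.3, 2.3, 3.1]`, matching the tables above.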

From: Steve Amphlett on
Luna Moon <lunamoonmoon(a)gmail.com> wrote in message <be8c1193-d5a2-445f-8c88-02248117568b(a)e5g2000yqn.googlegroups.com>...
>
> There must be a "filter" way of doing "backfill" fast?

A MEX file would be trivial. Here is a more traditional MATLAB approach. It doesn't handle the ends properly, though (a series that starts or ends with NaN); that is left as an exercise for the OP.

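x=[1;1;2;1;2;3;NaN;NaN;3;2;NaN;NaN;NaN;4;1;5;2;NaN;3];
% Assumes x neither starts nor ends with NaN; otherwise idx1 and idx2
% differ in length and the second assignment below fails (the "ends" caveat).

y=zeros(size(x));
idx=isnan(x);            % true where a value is missing

idx1=find(diff(idx)>0);  % last valid index before each NaN run
idx2=find(diff(idx)<0);  % last NaN index of each run

y(idx1+1)=x(idx1);       % switch the carried value on at the run start
y(idx2+1)=-x(idx1);      % switch it off again just after the run
y=cumsum(y);             % carried value inside runs, 0 elsewhere
y(~idx)=x(~idx);         % restore the original values

[x y]                    % compare input and filled output side by side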
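The same diff/cumsum idea can be translated to NumPy with the ends handled, which was the part left as an exercise. Below is a sketch under that reading. A leading NaN run keeps its NaNs because there is no earlier value, and a trailing run simply gets no "switch-off" entry. `ffill_cumsum` is a hypothetical name; this is a translation, not Steve's code:

```python
import numpy as np


def ffill_cumsum(x):
    """Diff/cumsum forward fill with both ends handled (hypothetical helper).

    A NaN run at the start has no earlier value, so it stays NaN.
    A NaN run at the end has no closing edge, so nothing is subtracted for it.
    """
    x = np.asarray(x, dtype=float)
    gap = np.isnan(x)
    if not (~gap).any():                  # empty or all-NaN: nothing to carry
        return x.copy()

    step = np.diff(gap.astype(int))
    starts = np.flatnonzero(step > 0)     # last valid index before each NaN run
    ends = np.flatnonzero(step < 0)       # last NaN index of each run
    if gap[0]:
        ends = ends[1:]                   # leading run has no start to pair with

    y = np.zeros_like(x)
    y[starts + 1] = x[starts]             # switch the carried value on
    y[ends + 1] = -x[starts[:ends.size]]  # switch it off after each closed run
    y = np.cumsum(y)                      # carried value inside runs, 0 elsewhere
    y[~gap] = x[~gap]                     # original values where they exist
    y[:np.argmax(~gap)] = np.nan          # leading NaNs stay NaN
    return y
```

Each run adds a value and later subtracts the same value, so between runs the running sum is back at zero and the runs do not contaminate each other.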
From: dpb on
dpb wrote:
> How about revising the algorithm?
>
> Or, perhaps mex the function you have.
>
> I'll consider it overnight; nothing pops to mind automagic...

OK, what about

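idx = ~isnan(d) & isnan([d(2:end); -1])   % d a column vector; -1 is a non-NaN pad

which is a logical array marking the locations that hold a value followed by a NaN.

The next location in the array is to be replaced with the value at this
location, i.e. d(find(idx)+1) = d(idx);

Iterate this until no element of idx is true (~any(idx)).

It is still iterative, but the number of passes is only the length of the
longest run of NaNs, which is perhaps different from what you're currently doing...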
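The shift-and-repeat idea can be sketched in NumPy as follows. `ffill_shift` is a hypothetical name, and the pass counter is added only to make the cost visible. Each pass fills one NaN per gap, so the loop runs as many times as the longest gap that follows a value. Leading NaNs are never touched:

```python
import numpy as np


def ffill_shift(d):
    """Repeatedly copy each value into a NaN directly after it (hypothetical helper).

    Returns the filled copy and the number of passes, which equals the length
    of the longest NaN run that has a value before it.
    """
    d = np.array(d, dtype=float)  # work on a copy
    passes = 0
    while True:
        nan = np.isnan(d)
        # value here AND NaN at the next position; the last element never qualifies
        idx = ~nan & np.append(nan[1:], False)
        if not idx.any():
            return d, passes
        src = np.flatnonzero(idx)
        d[src + 1] = d[src]       # shift each run's carried value one step right
        passes += 1
```

With short gaps this is a handful of vectorized passes. A series with one very long gap degrades toward the element-by-element loop.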

--