From: Pradipto on
Hi,

There are two solutions for simple versions of the rolling regression problem available on Matlab Central.

Suppose n = rolling window size, and suppose the problem is to solve a rolling regression y = a + b*x.

1. If x is a constant vector in each rolling window, i.e. size(x,1) = n but size(y,1) > n, then the algorithm described here:

http://www.mathworks.com/matlabcentral/newsreader/view_thread/49181

using the filter function works pretty well, because it takes advantage of the fact that x (in the example on that webpage x was -20:-1:0) does not change in any rolling window.
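To make point 1 concrete, here is a small sketch of that idea (my own reconstruction, not the exact code from that thread; the function name is just illustrative, n is the window length and x is the fixed length-n vector):

function [a,b]=rollregress_fixedx(x,y,n)
% Rolling y = a + b*x when the same x vector is reused in every window:
% the slope is then a fixed weighted moving sum of y, which filter
% computes in a single pass.
x=x(:); y=y(:);
w=(x-mean(x))/sum((x-mean(x)).^2);  % fixed regression weights
b=filter(flipud(w),1,y);            % slope of the window ending at each index
my=filter(ones(n,1)/n,1,y);         % rolling mean of y
a=my-b*mean(x);                     % intercept = mean(y) - b*mean(x)
b=b(n:end); a=a(n:end);             % drop the first n-1 incomplete windows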

2. Suppose, however, that x is not a constant vector, but instead size(x,1) = size(y,1) > n, and in each rolling window we want to regress y(j-n+1:j) against x(j-n+1:j). Then the algorithm described here:

http://www.mathworks.com/matlabcentral/newsreader/view_thread/249905

using the cumsum function provides a very neat solution. However, this again takes advantage of the fact that when x has only one column, i.e. size(x,2) = 1, we know the closed-form analytic solution, i.e. beta (the slope) = cov(x,y)/var(x), etc.
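To make point 2 concrete, here is a sketch of that closed-form cumsum idea (again my own reconstruction, not the code from the thread; x and y are assumed to be column vectors and m is the window length):

function [a,b]=rollslope1(x,y,m)
% Rolling y = a + b*x for a single-column x, using cumulative sums so that
% each window's slope b = cov(x,y)/var(x) comes out without a loop over windows.
x=x(:); y=y(:);
n=numel(x);
p=(m+1:n+1)'; q=(1:n-m+1)';
csx =[0; cumsum(x)];    sx =csx(p) -csx(q);   % rolling sum of x
csy =[0; cumsum(y)];    sy =csy(p) -csy(q);   % rolling sum of y
csxy=[0; cumsum(x.*y)]; sxy=csxy(p)-csxy(q);  % rolling sum of x.*y
csx2=[0; cumsum(x.^2)]; sx2=csx2(p)-csx2(q);  % rolling sum of x.^2
b=(m*sxy-sx.*sy)./(m*sx2-sx.^2);              % slope
a=(sy-b.*sx)/m;                               % intercept = mean(y) - b*mean(x)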

3. However, consider the more complicated case where x has more than one column, say nc = size(x,2) > 1 (with size(x,1) = size(y,1), etc.), and I want to write code where nc is flexible.

Here are three alternative versions that I have:

-------- ver 1: totally non-vectorized ------------------

function [a,b]=rollregress(x,y,m)
% a = intercept (the beta w.r.t. a dummy column of ones)
% b = slopes (the beta for each column of x)
[nr,nc]=size(x);
x=[ones(nr,1) x];   % prepend the constant column
nc=nc+1;
b=NaN(nr-m+1,nc);
for k=m:nr
    b(k-m+1,:)=regress(y(k-m+1:k),x(k-m+1:k,:));  % Statistics Toolbox regress
end
a=b(:,1);      % intercept
b=b(:,2:end);  % slopes

------------------------------------------------------------------
ver 1: time taken for an input of size nr = 5114, nc = 2 is 0.551 seconds (averaged over 100 trials).

-------- ver 2: partially vectorized ----------------------

function [a,b]=rollregress(x,y,m)
% a = intercept (the beta w.r.t. a dummy column of ones)
% b = slopes (the beta for each column of x)

[nr,nc]=size(x);
x=[ones(nr,1) x];   % prepend the constant column
nc=nc+1;
p=m+1:nr+1; q=1:nr-m+1;

% rolling sums of y'*x
sxy=[zeros(1,nc); cumsum(x.*y(:,ones(nc,1)),1)];
sxy=sxy(p,:)-sxy(q,:);
% rolling sums of x'*x
% --- this loop is not bad either, because typically nc is small
sx2=NaN(nc,nr-m+1,nc);
parfor j=1:nc
    sx2r=[zeros(1,nc); cumsum(x.*x(:,j*ones(nc,1)),1)];
    sx2r=sx2r(p,:)-sx2r(q,:);
    sx2(j,:,:)=sx2r;
end
% --- this is the part that needs to be vectorized
b=NaN(nr-m+1,nc);
parfor k=1:nr-m+1
    cxy=sxy(k,:);              % rolling y'*x for window k (1 x nc)
    cx2=squeeze(sx2(:,k,:));   % rolling x'*x for window k (nc x nc)
    b(k,:)=cxy/cx2;            % solve b*(x'*x) = y'*x
end
a=b(:,1);      % intercept
b=b(:,2:end);  % slopes

------------------------------------------------------------------
ver 2: time taken for an input of size nr = 5114, nc = 2 is 0.456 seconds (averaged over 100 trials) with the parfor loops and a pool of 1 worker, and 0.409 seconds (averaged over 100 trials) with plain for loops instead of the parfor loops. If I open a pool of 4 workers (I have a 4-core PC) using matlabpool('open',4), the time drops to 0.300 seconds.

-------- ver 3: absolutely dumb version & also non-vectorized --

function [a,b]=rollregress(x,y,m)
% a = intercept (the beta w.r.t. a dummy column of ones)
% b = slopes (the beta for each column of x)
[nr,nc]=size(x);
x=[ones(nr,1) x];   % prepend the constant column
nc=nc+1;
b=NaN(nr-m+1,nc);
for k=1:nr-m+1
    yt=y(k:k+m-1); xt=x(k:k+m-1,:);
    b(k,:)=(yt'*xt)/(xt'*xt);   % solve the normal equations b*(x'*x) = y'*x
end
a=b(:,1);      % intercept
b=b(:,2:end);  % slopes

------------------------------------------------------------------

ver 3: time taken for an input of size nr = 5114, nc = 2 is 0.219 seconds (averaged over 100 trials).
------------------------------------------------------------------
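In case the (yt'*xt)/(xt'*xt) step in ver 2 and ver 3 looks cryptic: it solves the normal equations b*(xt'*xt) = yt'*xt, i.e. it gives the same coefficients as a backslash fit, as this tiny illustrative check shows (random data, the sizes have no special meaning):

Xw=[ones(5,1) randn(5,2)]; yw=randn(5,1);
b1=(yw'*Xw)/(Xw'*Xw);   % mrdivide form used above
b2=(Xw\yw)';            % ordinary least-squares solve, transposed to a row
% b1 and b2 agree up to rounding error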

Ver 1 & 3 are also unaffected by the number of workers opened using matlabpool.

In summary, the dumbest, non-vectorized version (ver 3) is the fastest one.
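
For reference, this is roughly the kind of harness behind the timings above (the window length m = 60 is just an illustrative choice, and rollregress is whichever of the versions above is on the path):

nr=5114; nc=2; m=60;            % m is an assumed window length
x=randn(nr,nc); y=randn(nr,1);
tic
for trial=1:100
    [a,b]=rollregress(x,y,m);
end
t=toc/100                       % average seconds per call
% spot-check the last window against a plain backslash fit
beta=[ones(m,1) x(nr-m+1:nr,:)]\y(nr-m+1:nr);
max(abs(beta'-[a(end) b(end,:)]))   % should be at rounding-error level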

Can anyone provide a faster version, or maybe a fully vectorized version?

Thanks,
P.