From: chris on
In summary, the problem is forecasting timeseries using PCA.

I have four key matrices (R, X, Z, Q). I shall describe them in detail here:

R: Raw time series data

This is a matrix of timeseries data of size (n x t), where n is the number of series (columns) and t is the number of time stamps. Here n = 3,500 and t = 1M.

The timeseries are all very similar, with high correlation (40-80%) and similar moments (mean, variance, kurtosis, skewness). The data is near stationary and very, very noisy. It is all historical data, all of the same size, and roughly Gaussian in distribution. In the example below, I take R to be the difference in the “crime rate”, i.e. the crime level on day T minus the crime level on day T-1.


X: Matrix of signals

This is a matrix of signals that forecast the timeseries data in matrix R. The size of matrix X is (n x t x m), with n and t as above and m = 25: each raw data series has m signals that can be used to forecast its future value. The signals in X are zero mean, univariate, and capped at plus-minus one. On the whole the signals (m = 1, 2, 3, ..., 25) have a low correlation to each other, and their distributions are roughly uniform (though they may tend towards Gaussian).

We know that these signals are good forecasters of the timeseries in R, because on the whole, each one of them has a positive expectation:

e1 = E[X(n, t-1, m) x R(n, t)] // ie lagged signal forecasts unlagged data point.
e1 > 0
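A minimal sketch of how e1 could be estimated per signal, on toy data. The sizes, the 0.1 coefficient and the way R is generated are assumptions made purely for illustration; they are not the real data.

```matlab
% Toy sizes (assumptions for illustration; the real data has n = 3,500 and t = 1M)
n = 5; t = 5000; m = 3;
rng(0);                                   % reproducible random numbers
X = 2 * rand(n, t, m) - 1;                % signals: zero mean, capped at +/-1
R = zeros(n, t);
% Build R so that each lagged signal has a weak positive link to it, plus heavy noise
R(:, 2:end) = 0.1 * sum(X(:, 1:end-1, :), 3) + randn(n, t - 1);

% e1(k) = average over all series and times of X(n, t-1, k) * R(n, t)
e1 = zeros(1, m);
for k = 1:m
    prod_k = X(:, 1:end-1, k) .* R(:, 2:end);   % lagged signal times next data point
    e1(k) = mean(prod_k(:));
end
disp(e1)
```

Averaging over both n and t gives one number per signal; averaging over n only would instead give a timeseries of e1, which is closer to the "works in some periods, not in others" question.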

Each of the m signals relates to some "real-life" variable.

For example, if n indexes 3,500 different towns in the UK and R is the crime rate over time in each town, examples of the m signals (predictors) would be poverty, drug use, education levels, etc. Whilst it is hard to predict the crime rate in one town given the limited data, once we collect data from 3,500 towns the true patterns should be able to emerge.

The problem is that while all m signals have e1>0, they don’t always work. There are periods in time where they work and periods where they don’t. All e1>0 says is that on average they work.

I know that I can improve on this result, and on the forecast accuracy, by using PCA. In summary, I believe this can be done by weighting the original signals according to whether you believe them or not, and to what extent (e.g. “I always believe my poverty component is a predictor of crime”).


Q: Matrix of PCA component timeseries

I know that if I decompose R into a matrix of PCA component timeseries, then the sources of variance that are generated will very likely have the same real-life meanings as my signals in X (i.e. poverty, education, etc.).

First question: how does one decompose R into Q, using PCA, to generate timeseries of the components? I am unclear whether this would be done for each of the n variables, or whether you would just come up with some average set of component timeseries.

When I decompose one column of R (i.e. R(n=1, t)) I see that the eigenvalues fall away steeply, suggesting that 1PC describes ~80% of the variance. The first four PCs describe ~95% of the variance.
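One common way to get component timeseries is to treat each time stamp as an observation and each of the n series as a variable, then project the centred data onto the eigenvectors of the n x n covariance matrix. The sketch below assumes that set-up (it is not necessarily the decomposition used above) and uses toy data with one shared driver.

```matlab
n = 6; t = 1000;                               % toy sizes (assumption)
rng(1);
common = randn(1, t);                          % one shared driver, so the series are correlated
R = repmat(common, n, 1) + 0.5 * randn(n, t);  % n x t, as in the post

Rt = R';                                       % t x n: rows are time stamps, columns are series
Rc = Rt - repmat(mean(Rt, 1), t, 1);           % remove each series' mean
[V, D] = eig(cov(Rc));                         % eigenvectors/eigenvalues of the n x n covariance
[latent, order] = sort(diag(D), 'descend');    % sort explicitly into descending order
V = V(:, order);

Q = Rc * V;                                    % t x n: column k is the k-th component timeseries
explained = 100 * latent / sum(latent);        % percentage of variance per component
disp(explained')
```

In this formulation the explained percentages are single numbers for the whole sample and the component ordering is fixed once; the values in each column of Q vary over time, but the shares do not.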

This gives results which, thought of graphically, look like timeseries of random variables that sum to 100% at any point in time. Another point I'm not clear on is that it seems impossible to assign one timeseries the name 1PC (1st component), as that component is only 1PC when it's at its max value?

It is at this stage that I want to assign each component timeseries a corresponding signal timeseries (e.g. at point T=t, 1PC is signal number 4). I am guessing the right way to do this is by regression?
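As a rough illustration of the matching step, the sketch below pairs each component timeseries with the signal it is most correlated with. It uses plain correlation rather than a full regression, and the data, the mixing matrix `mix` and the noise level are all hypothetical.

```matlab
t = 2000; m = 3;                        % toy sizes (assumption)
rng(2);
S = 2 * rand(t, m) - 1;                 % m signal timeseries for one series (hypothetical)
mix = [0 1 0; 0 0 1; 1 0 0];            % component 1 is driven by signal 2, 2 by 3, 3 by 1
C = S * mix' + 0.3 * randn(t, m);       % stand-in component timeseries

match = zeros(1, m);
for k = 1:m
    rho = zeros(1, m);
    for j = 1:m
        cc = corrcoef(C(:, k), S(:, j));   % 2 x 2 correlation matrix
        rho(j) = abs(cc(1, 2));            % use |corr|: the sign of a component is arbitrary
    end
    [~, match(k)] = max(rho);              % signal most correlated with component k
end
disp(match)
```

Doing this inside a moving window, instead of over the full sample, would give a match that can change over time.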

Hence I am now at the stage where I have m timeseries (for each column of data) that vary over time, always sum to 100%, and represent how much of the data's variance each signal can explain over time (i.e. how good each signal is over time).

I am now in a position where I can use this matrix Q to modulate my matrix X.


Z: Matrix of modulated signals

By applying matrix Q to X, I generate Z. I apply Q by normalizing it first, so that it is zero mean over time, the components sum to 1, and they can never be bigger than +/-1. If, for example, one component equals plus one, then all the forecasting power would be in the relevant signal. We are saying that at that point in time, all the data's variance can be explained using one signal. All the other signals are turned off.
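The normalization could be read in several ways. The sketch below takes one interpretation as an assumption: remove each weight's mean over time, then rescale so that the absolute weights sum to 1 at every time stamp. The names `Qraw`, `Xs` and `W` are hypothetical.

```matlab
t = 500; m = 4;                                  % toy sizes (assumption)
rng(3);
Qraw = rand(t, m);                               % stand-in for the shares over time
Xs = 2 * rand(t, m) - 1;                         % signals for one series, capped at +/-1

W = Qraw - repmat(mean(Qraw, 1), t, 1);          % zero mean over time for each signal
W = W ./ repmat(sum(abs(W), 2), 1, m);           % absolute weights sum to 1 at each time stamp
% Note: after the rescaling the time-mean of W is only approximately zero.

Z = W .* Xs;                                     % modulated signals, same size as Xs
```

With this choice each weight lies in [-1, 1], a weight of +1 means all other signals are switched off at that time stamp, and a negative weight inverts the signal.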

We may find that one signal is always on and is positive. For the above example, poverty might always be related to crime (common sense).

We may also find that sometimes a signal is given a negative weight (i.e. the signal is essentially inverted). For example, normally drugs are related to crime, but around New Year's Eve lots of "non-criminal types" take hashish for fun, and actually less crime is committed as a result. Hence at this point in time, drug use does not forecast crime, but forecasts a decrease in crime.

We are able to prove this approach works when we look at

e2 = E[Z(n, t-1, m) x R(n, t)]
e2 > e1

Sorry about the length of this post. I have covered a lot of points, probably too many for one reply. I just wanted to state the entire problem in one go.

In summary, I cannot believe I am the first person to think of this: it seems (unless I am mistaken) a pretty vanilla-sounding approach.

From: chris on
Let me summarize that rather long post. A first, simple question is: has anyone applied PCA to a timeseries to work out how the components vary over time? For example, if you are only interested in the first 25 components, this would result in 25 timeseries, each one being a component varying over time.

It would be nice to be able to say that 1PC equals poverty, say (as in the example used), and observe how much of the variance of the crime rate poverty explains over time. However, is that correct? What if 1PC on day 1 was poverty, and then on day 2 1PC was drugs? Then just generating a 1PC timeseries would make it meaningless to try to associate it with a "real-life explanation".
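One simple way to see how much variance the first component explains over time is to repeat the PCA in a moving window. The sketch below assumes toy data in which a strong common factor appears half-way through; the window length `win` is an arbitrary choice.

```matlab
n = 4; t = 600; win = 100;                 % toy sizes and window length (assumptions)
rng(4);
R = randn(n, t);                           % independent noise to start with
% From time 301 onwards, add a strong factor shared by all series
R(:, 301:end) = R(:, 301:end) + 2 * repmat(randn(1, 300), n, 1);

nWin = t - win + 1;
share1 = zeros(nWin, 1);
for s = 1:nWin
    block = R(:, s:s + win - 1)';          % win x n: observations in the current window
    ev = sort(eig(cov(block)), 'descend'); % eigenvalues of the windowed covariance
    share1(s) = 100 * ev(1) / sum(ev);     % share of variance of the largest component
end
disp([share1(1), share1(end)])
```

Note that "largest component" is re-decided in every window, so the underlying direction can switch from one window to the next; this is exactly the labelling problem described above, and comparing eigenvectors between consecutive windows is one way to detect such a switch.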

If anyone has thought about problems similar to this before, I would be very keen to hear from you!

thanks
From: chris on
OK. No response so far :(

A simpler question is: has anyone applied PCA to any timeseries problems at all? Keen to hear your experiences if you think it's a useful tool.
From: Guido on
"chris " <hchris01(a)students.NOSPAM.bbk.ac.uk> wrote in message <hljtmc$eo2$1(a)fred.mathworks.com>...
> ok. No response so far:(
>
> A simpler question is: has anyone applied PCA to any timeseries problems at all? Keen to hear your experiences if you think it's a useful tool.
Dear Chris,
One possible way to deal with time-changing PCA shares in a time series context would be to perform PCACOV with level breaks (ref.: Perron & Bai):

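T = 100; N = 3;
x = rand(T, N);                  % T observations of N random series
bcmat = tril(ones(T, T), -1);    % column i is a level-break dummy: 0 up to row i, 1 afterwards
EXPLAINED = zeros(T, N + 1);     % preallocate: N series plus one break dummy
% Perform PCA on the cross-product of [x, break dummy] for each break date i
for i = 1:T
    X = [x, bcmat(:, i)];
    % X'*X is the uncentred cross-product matrix, used here in place of cov(X)
    [~, ~, EXPLAINED(i, :)] = pcacov(X' * X);
end
%
% Column k of EXPLAINED is the changing share of the k-th principal component
% (pcacov orders components by decreasing variance; they are not tied to x1, x2, ...).
% Horizontal sum of EXPLAINED, for each i, is 100%.
% NOTICE: if you use bcmat in regression make endpoints trimming!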
Hope it helps and let me know, Guido
From: chris on
Thanks for your reply.

You state: "One possible way to deal with time-changing PCA shares in a time series context would be to perform PCACOV with level breaks (ref.: Perron & Bai)".

I had a quick look on Google, and these guys (Perron & Bai) have done lots on this type of stuff. Was there a particular paper you were recommending, and if so, do you have the URL?

Thanks