From: Greg Heath on 24 Sep 2006 17:23

Gian Piero Bandieramonte wrote:

> I have tried processing my network inputs and targets using
> prestd, prepca, trastd, trapca and poststd. By doing this, my network
> generalization performance hugely decreases to dangerous levels.

It shouldn't.

> I have used them the way the MATLAB help tells me to use them. So I
> had to stick with using princomp to process my data so as to reduce
> its dimensionality (the net generalization using princomp is approx.
> 1000 times better than using prepca, trapca, ...).

You are doing something wrong. It should be exactly the same.

> My design set, consisting of a training set of 252 rows, has been
> processed by princomp to reduce the dimensionality of the input from
> 37 to 22 (I'm being conservative for now). Then I simulate my net
> with the same training set, getting an excellent fit. Then I simulate
> with a new test set of 5000 rows, getting somewhat good
> generalization. So obviously I preprocessed this test set with
> princomp before simulating.

No, you should not. Use the transformation matrix of the training set.

> But there is a problem here: I'm not supposed to preprocess my test
> set using the whole batch at one time; I'm supposed to preprocess
> each row of the test set individually.

If you use the transformation matrix it should make no difference.

> But I don't know how to do this with princomp.

See below.

> If I use this function with only one row, a MATLAB error appears. If
> I use two rows, it creates a new 2x37 matrix with zeros from column 2
> to 37, but if I process the whole batch, the transformed matrix
> doesn't have zeros at M(1:2,2:37). The output changes depending on
> the number of rows. So I need a consistent way of processing each row
> individually, maybe using some transformation matrix or vector, but I
> don't know how to find it, and it is not available the way you said
> in:
>
> > I use eigs(corrcoef(X)) for PCA (instead of princomp), so
> > I don't know if the transformation matrix is available to you
> > without solving T*X = PC using T = X/PC.
>
> I tried calculating T this way but it doesn't properly transform my
> rows either way. I'm also a bit confused about this. In the case of
> using prepca and trapca I do have a transformation matrix, because it
> is one of the outputs of prepca. This sort of transformation matrix
> is what I need. But since I'm not using prepca, but princomp, what
> can I do?

Solve T*X = PC for T.

Hope this helps.

Greg
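[Editorial note: Greg's advice, to fix the transformation from the training set and reuse it, can be sketched outside MATLAB. The following NumPy sketch is illustrative only (the names `mu`, `coeff`, `scores` are mine, and an SVD of the centered data stands in for princomp's internals); it shows that once the training mean and loadings are fixed, transforming test rows one at a time matches the batch transform exactly.]

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(252, 5))          # training set (rows = samples)

# PCA on the training set, as princomp does: center, then decompose.
mu = X.mean(axis=0)
Z = X - mu
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
coeff = Vt.T                           # analogous to princomp's PC output
scores = Z @ coeff                     # analogous to princomp's SCORE

# Transform new data with the TRAINING mean and loadings; never rerun
# PCA on the test set itself.
Y = rng.normal(size=(10, 5))
batch = (Y - mu) @ coeff[:, :3]        # keep the first 3 components

# Row-by-row transformation gives the same result, so single rows are
# no problem once the transformation is fixed.
rows = np.vstack([(y - mu) @ coeff[:, :3] for y in Y])
print(np.allclose(batch, rows))        # True
```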
From: Gian Piero Bandieramonte on 26 Sep 2006 15:37

> You are doing something wrong. It should be exactly the same.

You mean the generalization properties should be exactly the same, and
the outputs returned by the network should be exactly the same in both
cases? I don't see why it should be the same: try applying PCA to an
ill-conditioned matrix (condition number >> 100) using princomp and
using prestd-prepca; the transformed matrices in the two cases have
different values, so the training of the network wouldn't be the same,
meaning the simulated outputs wouldn't be the same either. So I don't
see why it should be exactly the same.

(I have a 5000x37 matrix X. I do

[PC, SCORE, LATENT, TSQUARE] = princomp(X);

and take, say, the first 3 columns of SCORE. Then I do

[pn,meanp,stdp] = prestd(X');
[ptrans,transMat] = prepca(pn,y);
% y is a value such that ptrans has 3 variables.

I transpose SCORE, then compare ptrans against SCORE(1:3,:), and it
isn't the same.)

> Solve T*X = PC for T.

Do you mean I'm supposed to do T = X/PC, X being my training input, so
that when I have my test input Y, to transform it I must do

PC_test = T*Y ?

If I do that the matrix dimensions don't agree. I am still not clear
on how to transform Y. What I am doing right now is the following:

[pc, score, latent, tsquare] = princomp(X);
% X is my training input
score = score(:,1:c);
% c is the max number of columns such that score is not rank
% deficient
% score is the new reduced training input matrix

Then I use the pc returned by princomp to transform Y (the test input)
the following way:

SCORE = Y*pc;
SCORE = SCORE(:,1:c);
% SCORE is the new reduced test input matrix

That's the way I do it. If I'm wrong, please correct me.

Thanks...
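[Editorial note: one subtle problem in the recipe above is that `SCORE = Y*pc` applies the training loadings to *uncentered* test data, while princomp computed the training scores from centered data. A NumPy sketch (variable names mine, SVD standing in for princomp) shows the two transforms differ by a constant row offset of the projected training mean.]

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 6)) + 3.0     # training input with nonzero mean
Y = rng.normal(size=(8, 6)) + 3.0      # test input

mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
pc = Vt.T                              # plays the role of princomp's pc
c = 3

wrong = Y @ pc[:, :c]                  # Y*pc, as in the post: no centering
right = (Y - mu) @ pc[:, :c]           # center with the TRAINING mean first

# Forgetting to subtract the training mean shifts every test-set score
# by the same constant vector mu @ pc[:, :c].
offset = mu @ pc[:, :c]
print(np.allclose(wrong - right, offset))   # True
```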
From: Greg Heath on 27 Sep 2006 13:07

Gian Piero Bandieramonte wrote:

> > You are doing something wrong. It should be exactly the same.
>
> You mean the generalization properties should be exactly the same,
> and the outputs returned by the network should be exactly the same
> in both cases? I don't see why it should be the same: try applying
> PCA to an ill-conditioned matrix (condition number >> 100) using
> princomp and using prestd-prepca; the transformed matrices in the
> two cases have different values,

That's because

1. PRINCOMP automatically centers the data but does not scale it to
   have unit variance.
2. The columns (or rows) of the transformation matrices are
   eigenvectors of the covariance matrix. Therefore each column (row)
   can differ by a sign.

Consider the following and compare the eigenvector matrices E, V, E2,
E3 and E4:

clear, close all, clc

load hald
X = hald
[n p] = size(X)
rankX = rank(X)
condX = cond(X)
M = repmat(mean(X),n,1)
Z = (X-M)/sqrt(n-1)

covX = cov(X)
err1 = max(abs(covX-Z'*Z))

[E L] = eigs(covX)
err2 = max(abs(eye(p)-E*E'))
err3 = max(abs(eye(p)-E'*E))
err4 = max(abs(covX-E*L*E'))

[U S V] = svd(Z,0)

err5 = max(abs(L-S.^2))
err6 = max(abs(eye(p)-V'*V))
err7 = max(abs(eye(p)-V*V'))
err8 = max(abs(eye(p)-U'*U))
err9 = max(abs(Z-U*S*V'))

[E2,Xt2,L2] = princomp(X)
err10 = max(abs(Xt2-(X-M)*E2))

[E3,Xt3,L3] = princomp(X')
M1 = repmat(mean(X'),p,1)
err11 = max(abs(Xt3-(X'-M1)*E3))

[Xt4,E4] = prepca((X-M)',0)
err12 = max(abs(Xt4-E4*(X-M)'))

Hope this helps.

Greg

> so the training of the network wouldn't be the same, meaning the
> simulated outputs wouldn't be the same either. So I don't see why it
> should be exactly the same.
>
> (I have a 5000x37 matrix X. I do
> [PC, SCORE, LATENT, TSQUARE] = princomp(X);
> and take, say, the first 3 columns of SCORE. Then I do
> [pn,meanp,stdp] = prestd(X');
> [ptrans,transMat] = prepca(pn,y);
> % y is a value such that ptrans has 3 variables.
> I transpose SCORE, then compare ptrans against SCORE(1:3,:), and it
> isn't the same.)
>
> > Solve T*X = PC for T.
>
> Do you mean I'm supposed to do T = X/PC, X being my training input,
> so that when I have my test input Y, to transform it I must do
> PC_test = T*Y ?
> If I do that the matrix dimensions don't agree. I am still not clear
> on how to transform Y. What I am doing right now is the following:
>
> [pc, score, latent, tsquare] = princomp(X);
> % X is my training input
> score = score(:,1:c);
> % c is the max number of columns such that score is not rank
> % deficient
> % score is the new reduced training input matrix
>
> Then I use the pc returned by princomp to transform Y (the test
> input) the following way:
>
> SCORE = Y*pc;
> SCORE = SCORE(:,1:c);
> % SCORE is the new reduced test input matrix
>
> That's the way I do it. If I'm wrong, please correct me.
>
> Thanks...
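[Editorial note: Greg's point 1, that princomp only centers while the prestd-prepca route also scales to unit variance, is what makes the two transformed data sets differ. A NumPy sketch (names and synthetic data mine) of the same contrast:]

```python
import numpy as np

rng = np.random.default_rng(1)
# Columns with deliberately unequal variances.
X = rng.normal(size=(100, 4)) * np.array([1.0, 10.0, 0.1, 5.0])

mu = X.mean(axis=0)
sd = X.std(axis=0, ddof=1)

def pca_scores(Z):
    """PCA scores of centered data Z via SVD (signs aside)."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T

# princomp-style: center only.
scores_centered = pca_scores(X - mu)

# prestd + prepca style: center AND scale to unit variance first.
scores_standardized = pca_scores((X - mu) / sd)

# When the variables have different variances, the two transformed
# data sets genuinely differ (not merely by sign flips), which is
# consistent with what Gian Piero observed.
print(np.allclose(np.abs(scores_centered), np.abs(scores_standardized)))
```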
From: Gian Piero Bandieramonte on 27 Sep 2006 16:24

> Consider the following and compare the eigenvector matrices E, V,
> E2, E3 and E4:
>
> clear, close all, clc
>
> load hald
> X = hald
> [n p] = size(X)
> rankX = rank(X)
> condX = cond(X)
> M = repmat(mean(X),n,1)
> Z = (X-M)/sqrt(n-1)
>
> covX = cov(X)
> err1 = max(abs(covX-Z'*Z))
>
> [E L] = eigs(covX)
> err2 = max(abs(eye(p)-E*E'))
> err3 = max(abs(eye(p)-E'*E))
> err4 = max(abs(covX-E*L*E'))
>
> [U S V] = svd(Z,0)
>
> err5 = max(abs(L-S.^2))
> err6 = max(abs(eye(p)-V'*V))
> err7 = max(abs(eye(p)-V*V'))
> err8 = max(abs(eye(p)-U'*U))
> err9 = max(abs(Z-U*S*V'))
>
> [E2,Xt2,L2] = princomp(X)
> err10 = max(abs(Xt2-(X-M)*E2))
>
> [E3,Xt3,L3] = princomp(X')
> M1 = repmat(mean(X'),p,1)
> err11 = max(abs(Xt3-(X'-M1)*E3))
>
> [Xt4,E4] = prepca((X-M)',0)
> err12 = max(abs(Xt4-E4*(X-M)'))

You apply princomp to X and prepca to (X-M); I don't see how I can
compare them if the inputs to the functions are different in each
case. Supposedly, if you do

[Xt4,E4] = prepca(X',0)

then E2 and E4 should be the same on their diagonals (except for some
signs). But even so, I thought E2 and E4 should be the same not only
on their diagonals but in all their values; I'm a bit confused. I
can't see exactly what you are trying to tell me with your code; could
you please walk me through the important points?
From: Greg Heath on 28 Sep 2006 02:08
Gian Piero Bandieramonte wrote:

> > Consider the following and compare the eigenvector matrices E, V,
> > E2, E3 and E4:
> >
> > clear, close all, clc
> >
> > load hald
> > X = hald
> > [n p] = size(X)
> > rankX = rank(X)
> > condX = cond(X)
> > M = repmat(mean(X),n,1)
> > Z = (X-M)/sqrt(n-1)
> >
> > covX = cov(X)
> > err1 = max(abs(covX-Z'*Z))
> >
> > [E L] = eigs(covX)
> > err2 = max(abs(eye(p)-E*E'))
> > err3 = max(abs(eye(p)-E'*E))
> > err4 = max(abs(covX-E*L*E'))
> >
> > [U S V] = svd(Z,0)
> >
> > err5 = max(abs(L-S.^2))
> > err6 = max(abs(eye(p)-V'*V))
> > err7 = max(abs(eye(p)-V*V'))
> > err8 = max(abs(eye(p)-U'*U))
> > err9 = max(abs(Z-U*S*V'))
> >
> > [E2,Xt2,L2] = princomp(X)
> > err10 = max(abs(Xt2-(X-M)*E2))
> >
> > [E3,Xt3,L3] = princomp(X')
> > M1 = repmat(mean(X'),p,1)
> > err11 = max(abs(Xt3-(X'-M1)*E3))
> >
> > [Xt4,E4] = prepca((X-M)',0)
> > err12 = max(abs(Xt4-E4*(X-M)'))
>
> You apply princomp to X and prepca to (X-M); I don't see how I can
> compare them if the inputs to the functions are different in each
> case.

Read the documentation. PRINCOMP automatically centers the data.
Therefore, within the function, the equivalent of X = X-M is
performed.

> Supposedly, if you do
> [Xt4,E4] = prepca(X',0)
> then E2 and E4 should be the same on their diagonals (except for
> some signs).

What are you talking about? You didn't even run the code that I
painstakingly wrote.

> But even so, I thought E2 and E4 should be the same not only on
> their diagonals but in all their values; I'm a bit confused. I can't
> see exactly what you are trying to tell me with your code; could you
> please walk me through the important points?

1. Run the damn code.
2. Compare E, V, E2, E3 and E4.
3. If you don't get it then, ask questions.

Greg
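[Editorial note: the comparison Greg asks for, E, V, E2, E3 and E4 agreeing only up to column signs, follows from the sign ambiguity of eigenvectors. A NumPy sketch of his point 2, with `eigh` and `svd` standing in for MATLAB's eigs and svd (names mine):]

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
C = np.cov(X, rowvar=False)            # covariance matrix of the columns

# Route 1: symmetric eigendecomposition of the covariance matrix
# (plays the role of [E L] = eigs(covX)).
w, E = np.linalg.eigh(C)
E = E[:, ::-1]                         # reorder to descending eigenvalues

# Route 2: SVD of the centered, 1/sqrt(n-1)-scaled data matrix
# (plays the role of [U S V] = svd(Z,0); V holds the eigenvectors).
Z = (X - X.mean(axis=0)) / np.sqrt(len(X) - 1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
V = Vt.T

# Column by column, the two routes agree only up to a sign flip, which
# is why the eigenvector matrices must be compared allowing for signs.
same_up_to_sign = all(
    np.allclose(E[:, j], V[:, j]) or np.allclose(E[:, j], -V[:, j])
    for j in range(4)
)
print(same_up_to_sign)                 # True
```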