PCA [Matlab]

From: Greg Heath on 28 Sep 2006 02:54

Gian Piero Bandieramonte wrote:
> > Consider the following and compare the eigenvector matrices
> > E,V,E2,E3 and E4:
> >
> > clear, close all, clc
> >
> > load hald
> > X = hald
> > [n p] = size(X)
> > rankX = rank(X)
> > condX = cond(X)
> > M = repmat(mean(X),n,1)
> > Z = (X-M)/sqrt(n-1)
> >
> > covX = cov(X)
> > err1 = max(abs(covX-Z'*Z))
> >
> > [E L] = eigs(covX)
> > err2 = max(abs(eye(p)-E*E'))
> > err3 = max(abs(eye(p)-E'*E))
> > err4 = max(abs(covX-E*L*E'))
> >
> > [U S V] = svd(Z,0)
> >
> > err5 = max(abs(L-S.^2))
> > err6 = max(abs(eye(p)-V'*V))
> > err7 = max(abs(eye(p)-V*V'))
> > err8 = max(abs(eye(p)-U'*U))
> > err9 = max(abs(Z-U*S*V'))
> >
> > [E2,Xt2,L2] = princomp(X)
> > err10 = max(abs(Xt2-(X-M)*E2))
> >
> > [E3,Xt3,L3] = princomp(X')
> > M1 = repmat(mean(X'),p,1)
> > err11 = max(abs(Xt3-(X'-M1)*E3))
> >
> > [Xt4,E4] = prepca((X-M)',0)
> > err12 = max(abs(Xt4-E4*(X-M)'))
>
> You apply princomp to X and prepca to (X-M). I don't see how I can
> compare them if the inputs to the functions are different in each
> case.

Reread my comment 1.

> I would expect that if you do
> [Xt4,E4] = prepca((X)',0)
> then E2 and E4 should be the same on their diagonals (except for some
> signs).

No.

The data has to be centered for PCA.

Reread the documentation

help princomp
help prepca

Notice that PRINCOMP automatically centers the data
and PREPCA does not.

Therefore the data has to be centered before using PREPCA.

Consequently, the easiest way to compare the two is to use X
in PRINCOMP and (X-M) in PREPCA.
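
A minimal sketch of why the centering matters, using only base MATLAB
functions rather than PRINCOMP or PREPCA themselves. The toy matrix X is
made up for illustration, and the sketch assumes rows are observations and
columns are variables.

```matlab
% Toy data (hypothetical): 5 observations (rows) of 3 variables (columns).
X = [2 4 1; 3 7 0; 5 8 2; 6 9 5; 9 15 4];
[n, p] = size(X);
M = repmat(mean(X), n, 1);            % column means replicated to n rows

% Eigenvectors of the covariance matrix (cov centers internally),
% sorted by decreasing eigenvalue.
[E, L] = eig(cov(X));
[lam, idx] = sort(diag(L), 'descend');
E = E(:, idx);

% Right singular vectors of the centered data.
[Uc, Sc, Vc] = svd((X - M) / sqrt(n-1), 0);

% Right singular vectors of the raw, uncentered data.
[Ur, Sr, Vr] = svd(X / sqrt(n-1), 0);

% Compare directions while ignoring sign: abs(E'*V) is the identity
% when every column matches up to a factor of +/-1.
errCentered = max(max(abs(abs(E'*Vc) - eye(p))))
errRaw      = max(max(abs(abs(E'*Vr) - eye(p))))
```

errCentered should be at rounding level, while errRaw should not: without
centering, the decomposition describes X'*X rather than the covariance.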

Notice that if PRESTD is used to center the data before using
PREPCA, the data is scaled in addition to being centered.

Therefore, to compare the two in this scenario, use Xn (the PRESTD
output) in PREPCA and X./repmat(std(X),n,1) in PRINCOMP.

> Even so, I thought E2 and E4 should be the same not only on their
> diagonals but in all their values. I'm a bit confused.

Reread my comment 2.

E,V,E2,E3 and E4 are equivalent eigenvector matrices, not
eigenvalue matrices.
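
A small sketch of what "equivalent" means for eigenvector matrices: each
column is only determined up to a factor of +/-1. The matrix C below is a
made-up symmetric matrix standing in for a covariance matrix, and the sign
convention used (largest-magnitude entry of each column made positive) is
just one possible choice, not something PRINCOMP or PREPCA is known to do.

```matlab
% Toy symmetric matrix (hypothetical), standing in for a covariance matrix.
C = [4 2 0; 2 3 1; 0 1 2];
[E, L] = eig(C);

% Flip the sign of the first and third eigenvectors.
F = E * diag([-1 1 -1]);

% Both E and F diagonalize C, so both are valid eigenvector matrices.
errE = max(max(abs(C - E*L*E')))
errF = max(max(abs(C - F*L*F')))

% Remove the ambiguity before comparing: make the largest-magnitude
% entry of every column positive.
[dummy, r] = max(abs(E));                       % row of largest entry per column
sE = sign(E(sub2ind(size(E), r, 1:3)));
[dummy, r] = max(abs(F));
sF = sign(F(sub2ind(size(F), r, 1:3)));
En = E * diag(sE);                              % scale each column by its sign
Fn = F * diag(sF);
errSign = max(max(abs(En - Fn)))
```

After the sign normalization the two matrices coincide, which is the sense
in which eigenvector matrices from different routines can be compared.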

> I can't see exactly what you are trying to tell me with your code.
> Could you please explain the important points?

Why don't you run the code, look at the printout, then tell me.

Hope this helps.

Greg

From: Gian Piero Bandieramonte on 29 Sep 2006 17:25

> No.
>
> The data has to be centered for PCA.
>
> Reread the documentation
>
> help princomp
> help prepca
>
> Notice that PRINCOMP automatically centers the data
> and PREPCA does not.
>
> Therefore the data has to be centered before using PREPCA.
>
> Consequently, the easiest way to compare the two is to use X
> in PRINCOMP and (X-M) in PREPCA.
>
> Notice that if PRESTD is used to center the data before using
> PREPCA, the data is scaled in addition to being centered.
>
> Therefore, to compare the two in this scenario, use Xn in PREPCA
> and X./repmat(std(X),n,1) in PRINCOMP.

Does that mean that if I use the function prestd, then I must run
princomp with X./repmat(std(X),n,1) and prepca with Xn, and if I don't
use prestd then I must run princomp with X and prepca with (X-M)?
So, supposing prestd is not used: if I trained a neural network with
an input processed by princomp using X, and I want to compare its
generalization properties with another neural network that uses prepca
to preprocess my input, must I preprocess the input X by subtracting M
from it? (According to you, the generalization properties must be the
same. Maybe that's why I got different generalization properties:
I made the mistake of using prepca without subtracting M from my input.)

I also see that E and E2 are the same in all their values; only some
signs change. But E and E2 differ from E4 in all their values except
their diagonals, which differ only in some signs. E was obtained by
calculating eigs(covX), and E4 is more different from E than E2 is.
So I wonder why E4 isn't equal to E in all its values, as E2 is
(except for the signs). Will this affect the way prepca processes
the input? How is it that E2 and E4 are different, yet the processed
data is still the same?

I looked at the data processed by princomp and by prepca, and they are
exactly the same except for some signs, evidently because the signs
of E2 and E4 differ. So I got different signs, and this alters my
training. Which of the two batches of data should I use? I suppose
it's the same thing....

Thanks.

From: Greg Heath on 1 Oct 2006 22:40

Gian Piero Bandieramonte wrote:
> > No.
> >
> > The data has to be centered for PCA.
> >
> > Reread the documentation
> >
> > help princomp
> > help prepca
> >
> > Notice that PRINCOMP automatically centers the data
> > and PREPCA does not.
> >
> > Therefore the data has to be centered before using PREPCA.
> >
> > Consequently, the easiest way to compare the two is to use X
> > in PRINCOMP and (X-M) in PREPCA.
> >
> > Notice that if PRESTD is used to center the data before using
> > PREPCA, the data is scaled in addition to being centered.
> >
> > Therefore, to compare the two in this scenario, use Xn in PREPCA
> > and X./repmat(std(X),n,1) in PRINCOMP.
>
> Does that mean that if I use the function prestd, then I must run
> princomp with X./repmat(std(X),n,1) and prepca with Xn,

No, you can use the fact that

PRINCOMP(X./repmat(std(X),n,1)) = PRINCOMP(Xn)

> and if I don't use
> prestd then I must run princomp with X and prepca with (X-M)?

Yes.

> So, supposing prestd is not used: if I trained a neural network with
> an input processed by princomp using X, and I want to compare its
> generalization properties with another neural network that uses prepca
> to preprocess my input, must I preprocess the input X by subtracting M
> from it? (According to you, the generalization properties must be the
> same. Maybe that's why I got different generalization properties:
> I made the mistake of using prepca without subtracting M from my input.)

help prepca ==> PREPCA *requires* that the input be centered.
It is not an option.

> I also see that E and E2 are the same in all their values; only some
> signs change. But E and E2 differ from E4 in all their values except
> their diagonals, which differ only in some signs. E was obtained by
> calculating eigs(covX), and E4 is more different from E than E2 is.
> So I wonder why E4 isn't equal to E in all its values, as E2 is
> (except for the signs). Will this affect the way prepca processes
> the input? How is it that E2 and E4 are different, yet the processed
> data is still the same?

Hint: Compare the expressions for err10 and err12. Then
compare E2 and E4 again.

> I looked at the data processed by princomp and by prepca, and they are
> exactly the same except for some signs, evidently because the signs
> of E2 and E4 differ. So I got different signs, and this alters my
> training. Which of the two batches of data should I use? I suppose
> it's the same thing....

Yes. It doesn't matter as long as you understand what you are doing.

Hope this helps.

Greg.

From: Gian Piero Bandieramonte on 2 Oct 2006 13:46

> No, you can use the fact that
> PRINCOMP(X./repmat(std(X),n,1)) = PRINCOMP(Xn)

I do
Xn=prestd(X);
[En,Xtn,Ln] = princomp(Xn);
[Egg,Xtgg,Lgg] = princomp(X./repmat(std(X),n,1));

and En is not equal to Egg...

I see that princomp automatically centers the data. But is it better
to run princomp with the data additionally scaled?

> Hint: Compare the expressions for err10 and err12. Then
> compare E2 and E4 again.

err10 and err12 are vectors of zeros... That doesn't tell me very
much about the difference between E2 and E4 with respect to E.

From: Greg Heath on 3 Oct 2006 02:41

Gian Piero Bandieramonte wrote:
> > No, you can use the fact that
> > PRINCOMP(X./repmat(std(X),n,1)) = PRINCOMP(Xn)
>
> I do
> Xn=prestd(X);
> [En,Xtn,Ln] = princomp(Xn);
> [Egg,Xtgg,Lgg] = princomp(X./repmat(std(X),n,1));
>
> and En is not equal to Egg...

Strange. I'll look into it.
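
One possible cause, offered as an assumption rather than a confirmed
diagnosis: if PRESTD follows the Neural Network Toolbox convention of one
variable per row and one sample per column (the same convention behind the
transpose in prepca((X-M)',0)), then prestd(X) standardizes along the wrong
dimension for an n-by-p data matrix, and prestd(X')' would be the call that
matches X./repmat(std(X),n,1) after centering. The sketch below mimics both
orientations with base functions only; the toy matrix is made up.

```matlab
% Toy data (hypothetical): 5 observations (rows) of 3 variables (columns).
X = [2 4 1; 3 7 0; 5 8 2; 6 9 5; 9 15 4];
[n, p] = size(X);

% Column-wise standardization: each variable gets zero mean and unit std.
Zcol = (X - repmat(mean(X), n, 1)) ./ repmat(std(X), n, 1);

% Row-wise standardization: what a routine expecting variables in rows
% would compute if it were handed X instead of X'.
Zrow = (X - repmat(mean(X, 2), 1, p)) ./ repmat(std(X, 0, 2), 1, p);

% Handing over X' and transposing the result back recovers Zcol.
Xt = X';
Zfix = ((Xt - repmat(mean(Xt, 2), 1, n)) ./ repmat(std(Xt, 0, 2), 1, n))';

errWrong = max(max(abs(Zcol - Zrow)))   % clearly nonzero
errFixed = max(max(abs(Zcol - Zfix)))   % rounding level
```

If that convention applies, the two PRINCOMP calls were fed differently
scaled data, which would explain why En and Egg disagree.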

> I see that princomp automatically centers the data. But is it better
> to run princomp with the data additionally scaled?

I always standardize my data. See my post on pretraining advice.

> > Hint: Compare the expressions for err10 and err12. Then
> > compare E2 and E4 again.
>
> err10 and err12 are vectors of zeros... That doesn't tell me very
> much about the difference between E2 and E4 with respect to E.

That's because you looked at the values.

I said look at the expressions.
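
To spell the hint out: err10 uses Xt2 = (X-M)*E2, while err12 uses
Xt4 = E4*(X-M)'. Transposing the first gives Xt2' = E2'*(X-M)', so the two
expressions agree when E4 plays the role of E2'. The sketch below
illustrates this with base functions; E2 and E4 here are stand-ins built
from svd on a made-up matrix, not actual PRINCOMP or PREPCA outputs.

```matlab
% Toy data (hypothetical): 5 observations (rows) of 3 variables (columns).
X = [2 4 1; 3 7 0; 5 8 2; 6 9 5; 9 15 4];
[n, p] = size(X);
Xc = X - repmat(mean(X), n, 1);          % centered data

[U, S, V] = svd(Xc / sqrt(n-1), 0);

% Row convention (as in the err10 expression): scores = centered data * E2.
E2  = V;
Xt2 = Xc * E2;                           % n-by-p, one observation per row

% Column convention (as in the err12 expression): scores = E4 * centered data'.
E4  = V';
Xt4 = E4 * Xc';                          % p-by-n, one observation per column

errScores = max(max(abs(Xt4 - Xt2')))    % same scores, transposed layout
errDiag   = max(abs(diag(E2) - diag(E4)))% diagonals of a matrix and its transpose agree
errFull   = max(max(abs(E2 - E4)))       % off-diagonal entries generally differ
```

This matches the observation that the two matrices agree on the diagonal
but not elsewhere, while the processed data is the same up to layout and
sign.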

Hope this helps.

Greg
