From: Gian Piero Bandieramonte on 30 Aug 2006 09:56

Something about the function princomp is unclear to me. I have a 252*37 matrix and want to reduce it, because 37 columns (variables) is too many. So I want to know how to use the four outputs of princomp in a way that lets me obtain the same matrix in a more compact form, still having the same meaning.

I did something and want to know if it's correct; otherwise please correct me. The 2nd output of princomp returns the same matrix, but with some transformation applied to it. So I used the 3rd output of princomp to see the variances, and saw that from entry 23 to the end the variances are zero. So what I did was take the first 22 columns of the matrix returned by princomp in its 2nd output and use that matrix instead of my original one. I am training neural networks, so I would train my network with this reduced matrix instead of the original one.

So, will this reduced matrix mean exactly the same as the original one, but in a more compact way? Would the training process be distorted by using this reduced matrix instead of the original one? And if I want to simulate my network with a batch of inputs, do I have to apply the same process to that batch (transform the 37-variable batch to 22 variables)?
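A minimal sketch of the procedure described above, assuming the Statistics Toolbox princomp and a 252-by-37 data matrix X (the variable names and the tolerance are illustrative, not from the post):

    % X is the 252-by-37 data matrix.
    [coeff, score, latent] = princomp(X);  % score = centered X * coeff
    tol = 1e-10 * latent(1);               % treat tiny variances as zero
    r = find(latent > tol, 1, 'last');     % here r = 22 per the post
    Xreduced = score(:, 1:r);              % 252-by-22 reduced matrix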
From: Greg Heath on 30 Aug 2006 12:42

Gian Piero Bandieramonte wrote:
> There is something that is unclear for me about the function
> princomp. I have a 252*37 matrix, and want to reduce it, because 37
> columns or variables is too many. So I want to know how to use the
> four outputs of princomp in a way that I can obtain the same matrix,
> but in a more compact way, still having the same meaning.
>
> I did something, and want to know if it's correct, otherwise please
> correct me. The 2nd output of princomp returns the same matrix, but
> with some transformation applied to it. So I used the 3rd output of
> princomp to see the variances, and saw that from entry 23 to the end,
> the variances are zero.

Probably too conservative. There are several common methods based on fractions of either the total variance, TV = sum(V), or the maximum variance, Vmax = max(V):

1. Choose a large fraction, say 0.99. Then discard p-r components so that
   sum(V(1:r-1)) < 0.99*TV <= sum(V(1:r)).
2. Choose a small fraction, say 0.01. Then discard components with
   a. variances < 0.01*Vmax, or
   b. variances < 0.01*TV.

I use method 1 (see the sketch after this post).

> So what I did is to get the first 22 columns
> of the matrix returned by princomp in its 2nd output and use this
> matrix instead of my original matrix. I am training neural networks,
> so I would train my network with this reduced matrix instead of using
> the original one. So, will this reduced matrix mean exactly the same
> as the original one, but in a more compact way?

It depends what you mean by "mean" (a picture of Bill Clinton just flashed before my eyes).

The reduced matrix accounts for, say, 99% of the variation in the original data. This is usually good for regression. However, PCA may not be good for classification.

> Would the training
> process be distorted if using this reduced matrix instead of the
> original one?

If you are starting out with only 99% of the input variance, you have to expect some distortion of the output. Usually it is insignificant. You can always check by repeating with more components and a higher percentage of the variance.

> If I want to simulate my network with a batch of
> inputs, do I have to do the same process to this batch (transform the
> 37-variable batch to 22-variable)?

You have to use the transformation matrix from the 1st batch to transform the second batch (instead of performing PCA on the second batch).

I use eigs(corrcoef(X)) for PCA (instead of princomp), so I don't know if the transformation matrix is available to you without solving X*T = PC using T = X\PC.

Hope this helps.

Greg
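A sketch of method 1 in princomp terms (the 0.99 threshold and variable names are illustrative):

    % V is the sorted variance vector (princomp's 3rd output).
    [coeff, score, V] = princomp(X);
    TV = sum(V);                          % total variance
    r = find(cumsum(V) >= 0.99*TV, 1);    % smallest r with >= 99% of TV
    Xreduced = score(:, 1:r);             % keep the first r components

With princomp the transformation matrix Greg asks about is in fact returned directly: the 1st output, coeff, maps column-centered data onto the principal components, so no back-solve is needed.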
From: Gian Piero Bandieramonte on 30 Aug 2006 13:21

> It depends what you mean by "mean" (a picture of Bill Clinton just
> flashed before my eyes).
>
> The reduced matrix accounts for, say, 99% of the variation in the
> original data. This is usually good for regression. However, PCA may
> not be good for classification.

My problem is one of classification. I'm using PCA, and you said PCA may not be good for classification problems. What do you mean by "may not be good"?

> You have to use the transformation matrix from the 1st batch to
> transform the second batch (instead of performing PCA on the second
> batch).
>
> I use eigs(corrcoef(X)) for PCA (instead of princomp), so I don't know
> if the transformation matrix is available to you without solving
> X*T = PC using T = X\PC.

Instead of using the transformation matrix, I applied PCA the same way to the second batch and simulated my network. Does this make my simulation go wrong?

Thanks for your help...
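For concreteness, the procedure Greg recommends looks roughly like this with princomp (a sketch; Xtrain, Xtest, and the choice of 22 components are assumed, and the column means used for centering must be reused along with coeff):

    % Fit PCA once, on the training batch only.
    mu = mean(Xtrain);
    [coeff, scoreTrain] = princomp(Xtrain);
    Ztrain = scoreTrain(:, 1:22);            % train the network on this

    % New batch: reuse mu and coeff; do NOT call princomp again.
    Xc = Xtest - repmat(mu, size(Xtest,1), 1);
    Ztest = Xc * coeff(:, 1:22);             % simulate the network on this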
From: Greg Heath on 31 Aug 2006 09:00
Gian Piero Bandieramonte wrote:
> > It depends what you mean by "mean" (a picture of Bill Clinton just
> > flashed before my eyes).
> >
> > The reduced matrix accounts for, say, 99% of the variation in the
> > original data. This is usually good for regression. However, PCA may
> > not be good for classification.
>
> My problem is one of classification. I'm using PCA, and you said PCA
> may not be good for classification problems. What do you mean by "may
> not be good"?

Consider two thin 3-D disk-shaped distributions parallel to the x-y plane with radii a = 25, thicknesses t = 1, and edges separated by a distance d = 1 in the z direction:

Class 1: x^2 + y^2 <= 625, 0.5 <= z <= 1.5
Class 2: x^2 + y^2 <= 625, -1.5 <= z <= -0.5

The variance is

V = [ a^2/(8*pi), a^2/(8*pi), ((t+d/2)^3 - (d/2)^3)/(3*t) ]
  = [ 24.9, 24.9, 1.08 ]

With a variance threshold of ~98%, dominant PCA will choose the directions of maximum spread, x and y, which are orthogonal to the direction of class separation, z. In this scenario you have to keep *all* of the PCs in order to prevent a 100% overlap of the classes.

PCA is typically used when the dimensionality is high. For a higher-dimensional disk the same conclusion would result for a lower a/t ratio.

> > You have to use the transformation matrix from the 1st batch to
> > transform the second batch (instead of performing PCA on the second
> > batch).
> >
> > I use eigs(corrcoef(X)) for PCA (instead of princomp), so I don't
> > know if the transformation matrix is available to you without
> > solving X*T = PC using T = X\PC.
>
> Instead of using the transformation matrix, I applied PCA the same way
> to the second batch and simulated my network. Does this make my
> simulation go wrong?

My advice is equivalent to adding a linear first hidden layer with weight matrix T determined from training data, then freezing the net before testing. Your method is equivalent to changing the first hidden layer (only!) every time a new set of data is to be classified. Which sounds more reasonable to you?

Hope this helps.

Greg
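A quick numerical check of the two-disk example (a sketch; uniform sampling over the disks is assumed, so the absolute x-y variances come out larger than the figures quoted above, but the ordering that drives the conclusion is preserved):

    % Sample n points per class from the two disks.
    n = 1000;  a = 25;
    r  = a*sqrt(rand(2*n,1));                 % uniform over the disk
    th = 2*pi*rand(2*n,1);
    x = r.*cos(th);  y = r.*sin(th);
    z = [0.5 + rand(n,1); -1.5 + rand(n,1)];  % class 1, then class 2
    X = [x, y, z];
    [coeff, score, latent] = princomp(X);
    fracs = latent/sum(latent)                % z carries under 1% of the
                                              % variance, yet it is the only
                                              % separating direction

Any variance threshold below ~99% drops the third component here, and with it all of the class information, which is exactly the failure mode Greg describes.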