From: Gian Piero Bandieramonte on 1 Sep 2006 14:58

> Consider two thin 3-D disk shaped distributions parallel to the x-y
> plane with radii a = 25, thicknesses t = 1, and edges separated
> a distance d = 1 in the z direction.
>
> Class 1; x^2+y^2 <= 625, 0.5 <= z <= 1.5
> Class 2; x^2+y^2 <= 625, -1.5 <= z <= -0.5
>
> The variance is
>
> V = [ a^2/(8*pi), a^2/(8*pi), ((t+d/2)^3-(d/2)^3)/(3*t) ]
>   = [ 24.9, 24.9, 1.08 ]
>
> With a variance threshold of ~98%, dominant PCA will choose the
> directions of maximum spread, x and y, which are orthogonal to the
> direction of class separation, z.
>
> In this scenario you have to keep *all* of the PCs in order
> to prevent a 100% overlap of the classes.
>
> PCA is typically used when the dimensionality is high. For a
> higher-dimensional disk the same conclusion would result for
> a lower a/t ratio.

I see that this scenario is what makes the use of PCA on
classification problems inappropriate. But is it only this scenario
that makes the use of PCA with classification problems inappropriate?
I have 37-dimensional data, not 3-D data, and it is really tough to
see whether this scenario is happening to me. I don't know if there
are MATLAB tools or some theory to assist me with this....

Greg said that if my test set is sufficiently large, then I could
apply PCA with no correctness problems. My test data has been large
enough, so there is no problem with it. But now another problem
arises: if I now want to simulate my network with new data with no
known outputs, and this data has only one row, then what should I do?
Evidently this data is too small to apply PCA (and a MATLAB error
appears in princomp when finding the PCs of data with only one row),
so should I apply some transformation matrix or something? Where do I
obtain that matrix?
From: Greg Heath on 2 Sep 2006 14:56

Greg Heath wrote:
> Gian Piero Bandieramonte wrote:
> > > It depends what you mean by "mean" (A picture of Bill Clinton
> > > just flashed before my eyes).
> > >
> > > The reduced matrix accounts for, say, 99% of the variation
> > > in the original data. This is usually good for regression.
> > > However, PCA may not be good for classification.
> >
> > My problem is one of classification. I'm using PCA, and you said
> > PCA may not be good for classification problems. What do you mean
> > by "may not be good"?
>
> Consider two thin 3-D disk shaped distributions parallel to the x-y
> plane with radii a = 25, thicknesses t = 1, and edges separated
> a distance d = 1 in the z direction.
>
> Class 1; x^2+y^2 <= 625, 0.5 <= z <= 1.5
> Class 2; x^2+y^2 <= 625, -1.5 <= z <= -0.5
>
> The variance is
>
> V = [ a^2/(8*pi), a^2/(8*pi), ((t+d/2)^3-(d/2)^3)/(3*t) ]
>   = [ 24.9, 24.9, 1.08 ]
>
> With a variance threshold of ~98%,

WHOOPS!

V = [ a^2/4, a^2/4, ((t+d/2)^3-(d/2)^3)/(3*t) ]
  = [ 156.25, 156.25, 1.08 ] for a = 25
  = [ 25, 25, 1.08 ] for a = 10

Hope this helps.

Greg

> dominant PCA will choose the
> directions of maximum spread, x and y, which are orthogonal to the
> direction of class separation, z.
>
> In this scenario you have to keep *all* of the PCs in order
> to prevent a 100% overlap of the classes.
>
> PCA is typically used when the dimensionality is high. For a
> higher-dimensional disk the same conclusion would result for
> a lower a/t ratio.
>
> > > You have to use the transformation matrix from the 1st batch
> > > to transform the second batch (instead of performing PCA on the
> > > second batch).
> > >
> > > I use eigs(corcoeff(X)) for PCA (instead of princomp), so
> > > I don't know if the transformation matrix is available to you
> > > without solving T*X = PC using T = X/PC.
> >
> > Instead of using the transformation matrix, I applied PCA the same
> > way to the second batch and simulated my network. Does this fact
> > make my simulation go wrong?
>
> My advice is equivalent to adding a linear first hidden layer with
> weight matrix T determined from training data. Then, freezing the
> net before testing.
>
> Your method is equivalent to changing the first hidden layer (only!)
> every time a new set of data is to be classified.
>
> Which sounds more reasonable to you?
>
> Hope this helps.
>
> Greg
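[Editor's note: Greg's corrected variances are easy to check numerically. The following is a small Python/numpy sketch, not part of the thread's MATLAB code, that samples the two disks and confirms both V = [156.25, 156.25, 1.08] and the 98%-threshold behavior.]

```python
import numpy as np

rng = np.random.default_rng(0)

def disk_samples(n, a=25.0, z_lo=0.5, z_hi=1.5):
    """Uniform samples in a thin disk: x^2 + y^2 <= a^2, z in [z_lo, z_hi]."""
    r = a * np.sqrt(rng.uniform(size=n))       # sqrt gives uniform area density
    th = rng.uniform(0, 2 * np.pi, size=n)
    z = rng.uniform(z_lo, z_hi, size=n)
    return np.column_stack([r * np.cos(th), r * np.sin(th), z])

X = np.vstack([disk_samples(5000),                          # class 1
               disk_samples(5000, z_lo=-1.5, z_hi=-0.5)])   # class 2

# Per-axis variance: x and y dominate; z, the separating direction, is tiny.
print(X.var(axis=0))            # approximately [156.25, 156.25, 1.08]

# Dominant PCA with a 98% variance threshold keeps only the x/y directions.
evals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # descending
frac = np.cumsum(evals) / evals.sum()
keep = np.searchsorted(frac, 0.98) + 1
print(keep)                     # 2 of 3 PCs: z is discarded, classes overlap
```

With a = 25 the first two PCs already carry ~99.7% of the variance, so any threshold up to that level throws away exactly the direction that separates the classes.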
From: Greg Heath on 2 Sep 2006 15:35

Gian Piero Bandieramonte wrote:
> > Consider two thin 3-D disk shaped distributions parallel to the x-y
> > plane with radii a = 25, thicknesses t = 1, and edges separated
> > a distance d = 1 in the z direction.
> >
> > Class 1; x^2+y^2 <= 625, 0.5 <= z <= 1.5
> > Class 2; x^2+y^2 <= 625, -1.5 <= z <= -0.5
> >
> > The variance is
> >
> > V = [ a^2/(8*pi), a^2/(8*pi), ((t+d/2)^3-(d/2)^3)/(3*t) ]
> >   = [ 24.9, 24.9, 1.08 ]
> >
> > With a variance threshold of ~98%, dominant PCA will choose the
> > directions of maximum spread, x and y, which are orthogonal to the
> > direction of class separation, z.
> >
> > In this scenario you have to keep *all* of the PCs in order
> > to prevent a 100% overlap of the classes.
> >
> > PCA is typically used when the dimensionality is high. For a
> > higher-dimensional disk the same conclusion would result for
> > a lower a/t ratio.
>
> I see that this scenario is what makes the use of PCA on
> classification problems inappropriate.

This type of scenario is what makes the use of PCA inappropriate for
*some* classification problems.

When the total covariance matrix is split into the sum of
"within-class" and "between-class" components, dominant PCA
dimensionality reduction tends to be inappropriate for classification
when the "within-class" component is dominant.

Classical Linear Discriminant Analysis (LDA) is based on finding the
directions that maximize the ratio of the "between-class" and
"within-class" variance components.

> But is it only this scenario that makes the use of PCA with
> classification problems inappropriate? I have 37-dimensional data,
> not 3-D data, and it is really tough to see whether this scenario is
> happening to me. I don't know if there are MATLAB tools or some
> theory to assist me with this....

You still haven't indicated the type of classifier you are using.

> Greg said that if my test set is sufficiently large, then I could
> apply PCA with no correctness problems. My test data has been large
> enough, so there is no problem with it.

How large is it?

> But now another problem arises: if I now want to simulate my network

Is this a neural network model?

> with new data with no known outputs, and this data has only one row,
> then what should I do? Evidently this data is too small to apply PCA
> (and a MATLAB error appears in princomp when finding the PCs of data
> with only one row), so should I apply some transformation matrix or
> something? Where do I obtain that matrix?

That is exactly what I've been talking about!

This is an insufficiently large test set for your method to deal with.

Reread my original post. Also read my post on pretraining advice.

Hope this helps.

Greg
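[Editor's note: the within/between-class decomposition Greg describes can be illustrated on his own two-disk example. This is a hedged Python/numpy sketch, not thread code: the Fisher/LDA direction Sw^-1 (m1 - m2) recovers the separating z axis, while the dominant principal component stays in the x-y plane.]

```python
import numpy as np

rng = np.random.default_rng(1)

def disk(n, z_lo, z_hi, a=25.0):
    """Uniform samples in a thin disk of radius a at height [z_lo, z_hi]."""
    r = a * np.sqrt(rng.uniform(size=n))
    th = rng.uniform(0, 2 * np.pi, size=n)
    return np.column_stack([r * np.cos(th), r * np.sin(th),
                            rng.uniform(z_lo, z_hi, size=n)])

X1, X2 = disk(4000, 0.5, 1.5), disk(4000, -1.5, -0.5)

# Within-class scatter is huge in x/y; between-class scatter lies along z.
Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
w = np.linalg.solve(Sw, X1.mean(axis=0) - X2.mean(axis=0))  # Fisher direction
w /= np.linalg.norm(w)
print(np.abs(w))               # approximately [0, 0, 1]: LDA finds the z axis

# The dominant PCA direction, by contrast, lies in the x-y plane.
evals, evecs = np.linalg.eigh(np.cov(np.vstack([X1, X2]), rowvar=False))
top_pc = evecs[:, -1]          # eigh sorts eigenvalues ascending
print(np.abs(top_pc[2]))       # near 0: z barely contributes to PC1
```

This is the quantitative version of Greg's point: when Sw dominates the total covariance, ranking directions by total variance (PCA) and ranking them by class separability (LDA) give opposite answers.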
From: Gian Piero Bandieramonte on 4 Sep 2006 14:00
>> But is it only this scenario that makes the use of PCA with
>> classification problems inappropriate? I have 37-dimensional data,
>> not 3-D data, and it is really tough to see whether this scenario
>> is happening to me. I don't know if there are MATLAB tools or some
>> theory to assist me with this....
>
> You still haven't indicated the type of classifier you are using.

The type of classifier I'm using is an RBF (radial basis function)
network, specifically using the function newrb.

>> Greg said that if my test set is sufficiently large, then I could
>> apply PCA with no correctness problems. My test data has been large
>> enough, so there is no problem with it.
>
> How large is it?

Well, the mean of the sizes of the test sets is approximately 3000
rows. I hope this is large enough....

>> But now another problem arises: if I now want to simulate my network
>
> Is this a neural network model?

Yes, it is a neural network model.

> That is exactly what I've been talking about!
>
> This is an insufficiently large test set for your method to deal
> with.
>
> Reread my original post. Also read my post on pretraining advice.
>
> Hope this helps.
>
> Greg

I first test my network with many test sets whose correct outputs are
known, and whose mean size is 3000 (as I mentioned earlier in this
reply). Then I simulate the network with another test set, whose size
is 1. As you said, it is too small, and I need to apply some
transform; in other replies you explained:

"You have to use the transformation matrix from the 1st batch to
transform the second batch (instead of performing PCA on the second
batch). I use eigs(corcoeff(X)) for PCA (instead of princomp), so I
don't know if the transformation matrix is available to you without
solving T*X = PC using T = X/PC."

I don't quite understand where I obtain the transformation matrix.
Does eigs(corcoeff(X)) return the transformation matrix? Is the
corcoeff function in fact MATLAB's corrcoef function for obtaining the
correlation coefficients? Is X the 1st batch? Do I multiply the
transformation matrix by the second batch to transform it?

Thanks for your help...
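[Editor's note: the mechanics being asked about can be sketched as follows. This is a hedged Python/numpy illustration, not the thread's MATLAB code; `Xtrain` is random stand-in data and `k` is a placeholder for however many PCs were kept during training. The idea, following Greg's eigs(corcoeff(X)) suggestion, is to fit the eigenvector matrix T on the training batch once, then reuse the same mean, standard deviation, and T for any new batch, even a single row.]

```python
import numpy as np

rng = np.random.default_rng(2)
Xtrain = rng.normal(size=(3000, 37))   # stand-in for the 37-dim training batch

# Fit the transform ONCE, on training data only:
mu, sd = Xtrain.mean(axis=0), Xtrain.std(axis=0)
R = np.corrcoef(Xtrain, rowvar=False)  # 37 x 37 correlation matrix
evals, T = np.linalg.eigh(R)           # columns of T = principal directions
T = T[:, ::-1]                         # sort descending by eigenvalue
k = 10                                 # placeholder: number of PCs retained

def to_pcs(Xnew):
    """Project new data (even a single row) with the frozen training transform."""
    return ((Xnew - mu) / sd) @ T[:, :k]

# A one-row batch is no problem: no PCA is recomputed, so no princomp error.
one_row = rng.normal(size=(1, 37))
print(to_pcs(one_row).shape)           # (1, 10)
```

This is the "frozen first hidden layer" Greg describes: mu, sd, and T are fixed network parameters learned from training data, and new inputs are only pushed through them, never used to refit them.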