From: Rob Campbell on 20 Jul 2010 11:30

> > So you want to know how much each of the original dimensions contributes to a given eigenvector?
>
> Not exactly. I want to know how much each of my original dimensions contributes to the complete set of dimensions. From a list of sorted(!) eigenvalues I guess I should be able to see it.

You're saying you want to know how much each of the original dimensions contributes to the complete set of eigenvectors? I don't understand what you're asking for.

I was trying to break down the question: if you can calculate the contribution of your original dimensions to one PC, then you can easily do this for any number of PCs. Isn't this what you want to know?

Can you explain what you want to know? Maybe the reason nobody can help you is that there's a better way of addressing your underlying question.
From: Rob Campbell on 20 Jul 2010 11:35

> The reduction itself is not the problem, but I could not figure out how to identify the
> original features that are actually important components and the ones that are not
> important.

I see, so this is what you want to know? Why not look at the direction of the eigenvectors? Plot the eigenvectors: the resulting "shapes" will tell you what features of your data they explain.
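A minimal sketch of what "plot the eigenvectors" could look like, assuming the Statistics Toolbox and a placeholder data matrix X (observations in rows, one column per original feature):

    X = randn(100, 3);        % placeholder data: observations in rows, one column per feature
    coeff = princomp(X);      % columns of coeff are the eigenvectors (loadings)
    bar(coeff)                % grouped bars: each group is one original feature
    xlabel('Original feature')
    ylabel('Loading')
    legend('PC 1', 'PC 2', 'PC 3')

Features with large loadings on the leading PCs are the ones those eigenvectors "explain".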
From: Philip Mewes on 20 Jul 2010 11:36

> or you may want to look at the PC
> coefficients (ditto) to try and identify a small subset of the original
> variables that accounts for a suitable proportion of the total variance.

That's exactly what I want to do, but I haven't understood yet how PCA can help me with this. If you have a look at my initial post, I introduced 3 vectors (vec1, vec2, vec3), each of them containing 100 observations. Let's say that because of computational cost I would like to drop some of these vectors. Let's also assume that I don't care how many vectors are dropped (1 or 2), but it matters how much of my initial set of variables the remaining variables represent, in percent. How would I do that? Do I need to evaluate how much each of the original dimensions contributes to a given eigenvector, like Rob proposed in a previous post?

For me it is especially important that I only have to do this PCA step once, and as a result I know which vector (vec1, vec2, vec3) I can drop in the future. I assume (and know) that the characteristics (variance, mean, std, etc.) will always be more or less the same for upcoming vectors. So for a new set of vectors which is very similar to the one I did the PCA with, I want to be able to say which of them is important and which is not. I think it is not sufficient to look at the variances alone, because they do not take into account the correlations between the vectors.
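To make the setup concrete, a hedged sketch with three stand-in vectors (vec3 is deliberately made redundant with vec1), using princomp from the Statistics Toolbox:

    vec1 = randn(100,1);                    % stand-ins for the actual feature vectors
    vec2 = randn(100,1);
    vec3 = 0.9*vec1 + 0.1*randn(100,1);     % deliberately correlated with vec1

    X = [vec1 vec2 vec3];                   % observations in rows, variables in columns
    [coeff, score, latent] = princomp(X);   % coeff: loadings, latent: eigenvalues

    explained = latent ./ sum(latent)       % fraction of total variance per PC

The near-collinearity between vec1 and vec3 should show up as a small third eigenvalue, which is exactly the kind of redundancy Philip wants to detect.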
From: Philip Mewes on 20 Jul 2010 11:47

Hi Rob, thanks for your help. I just replied to Peter's post explaining my motivation. Does that help you to understand my problem?

I can also say a bit more about the application: it's about image processing. Say you have an image and you want to classify this image into two classes: 1. the image contains a car; 2. the image contains no car. So you extract from your image a large number of features that might help you with this classification step. Some of the features might be very useful, some of them are redundant, and this is exactly what I want to find out; I think in principle PCA is the way to go. I want to find this out so that for future images I can skip some of the feature extraction steps, because I know that those features are not important. Has it become a bit clearer?
From: Rob Campbell on 20 Jul 2010 12:11

Generally people will:
1. Do PCA.
2. Choose a suitable number of dimensions based upon the eigenvalues.
3. Reconstruct the original observations using this reduced space (see help pcares).

But it seems that you want to know which of the original dimensions are relatively unimportant, so that you don't have to spend time acquiring them in the future. Right?

> Do I need to evaluate how much each of the original dimensions contributes to a
> given eigenvector, like Rob proposed in a previous post?

If you want to do what you describe, then it seems you would want to know this. I think you would want to decide how many PCs you need to describe your data, then work out how much each of the original dimensions contributes to each of these PCs. Add up these numbers and try excluding the dimensions which explain the least; a sketch of this is below.

I think you also need a statistical test to determine whether or not removing a dimension results in a significantly worse description. Do you have any key descriptive statistics which you calculate on your data? Perhaps a standard deviation, or maybe your data fall into different clusters? If so, one possibility is to calculate this statistic on the full space, then drop dimensions in a systematic manner and re-calculate. In other words, a step-wise regression.
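One way this procedure might look, again assuming the Statistics Toolbox; the variance-weighted squared loadings are my choice of contribution measure, since Rob's post does not prescribe an exact formula:

    X = randn(100, 3);                    % placeholder for your observations-by-features matrix
    [coeff, score, latent] = princomp(X);

    % Steps 1-2: PCA, then keep enough PCs to explain, say, 95% of the variance
    nPC = find(cumsum(latent) / sum(latent) >= 0.95, 1);

    % Step 3: reconstruct the observations from the reduced space
    [residuals, reconstructed] = pcares(X, nPC);

    % Contribution of each original dimension: squared loadings on the
    % retained PCs, weighted by the variance each PC explains, then summed
    contrib = (coeff(:, 1:nPC).^2) * latent(1:nPC);

    [~, order] = sort(contrib);           % smallest contribution first
    leastImportant = order(1)             % candidate dimension to drop, then re-test

Dropping order(1), recomputing whatever summary statistic matters to you, and comparing against the full space gives the step-wise test Rob describes.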