From: Gian Piero Bandieramonte on 30 Aug 2006 09:56

Something about the function princomp is unclear to me. I have a 252*37 matrix and want to reduce it, because 37 columns (variables) is too many. So I want to know how to use the four outputs of princomp in a way that lets me obtain the same matrix in a more compact form, still having the same meaning.

I did something and want to know if it's correct; otherwise please correct me. The 2nd output of princomp returns the same matrix, but with some transformation applied to it. So I used the 3rd output of princomp to see the variances, and saw that from entry 23 to the end the variances are zero. So what I did was take the first 22 columns of the matrix returned by princomp in its 2nd output and use that matrix instead of my original one. I am training neural networks, so I would train my network with this reduced matrix instead of the original one.

So, will this reduced matrix mean exactly the same as the original one, but in a more compact way? Would the training process be distorted by using this reduced matrix instead of the original one? And if I want to simulate my network with a batch of inputs, do I have to apply the same process to that batch (transform the 37-variable batch to 22 variables)?
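A minimal sketch of the procedure described above, assuming the Statistics Toolbox princomp and a 252-by-37 data matrix X (the variable names and the tolerance are illustrative, not from the post):

    % X is the 252-by-37 data matrix.
    [coeff, score, latent] = princomp(X);  % score = centered X * coeff
    tol = 1e-10 * latent(1);               % treat tiny variances as zero
    r = find(latent > tol, 1, 'last');     % here r = 22 per the post
    Xreduced = score(:, 1:r);              % 252-by-22 reduced matrix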
From: Greg Heath on 30 Aug 2006 12:42

Gian Piero Bandieramonte wrote:
> There is something that is unclear for me about the function
> princomp. I have a 252*37 matrix, and want to reduce it, because 37
> columns or variables is too many. So I want to know how to use the
> four outputs of princomp in a way that I can obtain the same matrix,
> but in a more compact way, still having the same meaning.
>
> I did something, and want to know if it's correct, otherwise please
> correct me. The 2nd output of princomp returns the same matrix, but
> with some transformation applied to it. So I used the 3rd output of
> princomp to see the variances, and saw that from entry 23 to the end,
> the variances are zero.

Probably too conservative. There are several common methods based on fractions of either the total variance, TV = sum(V), or the maximum variance, Vmax = max(V):

1. Choose a large fraction, say 0.99. Then discard p-r components so that
   sum(V(1:r-1)) < 0.99*TV <= sum(V(1:r)).
2. Choose a small fraction, say 0.01. Then discard components with
   a. variances < 0.01*Vmax, or
   b. variances < 0.01*TV.

I use method 1 (see the sketch after this post).

> So what I did is to get the first 22 columns
> of the matrix returned by princomp in its 2nd output and use this
> matrix instead of my original matrix. I am training neural networks,
> so I would train my network with this reduced matrix instead of using
> the original one. So, will this reduced matrix mean exactly the same
> as the original one, but in a more compact way?

It depends what you mean by "mean" (a picture of Bill Clinton just flashed before my eyes).

The reduced matrix accounts for, say, 99% of the variation in the original data. This is usually good for regression. However, PCA may not be good for classification.

> Would the training
> process be distorted if using this reduced matrix instead of the
> original one?

If you are starting out with only 99% of the input variance, you have to expect some distortion of the output. Usually it is insignificant. You can always check by repeating with more components and a higher percentage of the variance.

> If I want to simulate my network with a batch of
> inputs, do I have to do the same process to this batch (transform the
> 37-variable batch to 22-variable)?

You have to use the transformation matrix from the 1st batch to transform the second batch (instead of performing PCA on the second batch).

I use eigs(corrcoef(X)) for PCA (instead of princomp), so I don't know if the transformation matrix is available to you without solving X*T = PC using T = X\PC.

Hope this helps.

Greg
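A sketch of method 1 in princomp terms (the 0.99 threshold and variable names are illustrative):

    % V is the sorted variance vector (princomp's 3rd output).
    [coeff, score, V] = princomp(X);
    TV = sum(V);                          % total variance
    r = find(cumsum(V) >= 0.99*TV, 1);    % smallest r with >= 99% of TV
    Xreduced = score(:, 1:r);             % keep the first r components

With princomp the transformation matrix Greg asks about is in fact returned directly: the 1st output, coeff, maps column-centered data onto the principal components, so no back-solve is needed.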
From: Gian Piero Bandieramonte on 30 Aug 2006 13:21

> It depends what you mean by "mean" (a picture of Bill Clinton just
> flashed before my eyes).
>
> The reduced matrix accounts for, say, 99% of the variation in the
> original data. This is usually good for regression. However, PCA may
> not be good for classification.

My problem is one of classification. I'm using PCA, and you said PCA may not be good for classification problems. What do you mean by "may not be good"?

> You have to use the transformation matrix from the 1st batch to
> transform the second batch (instead of performing PCA on the second
> batch).
>
> I use eigs(corrcoef(X)) for PCA (instead of princomp), so I don't know
> if the transformation matrix is available to you without solving
> X*T = PC using T = X\PC.

Instead of using the transformation matrix, I applied PCA the same way to the second batch and simulated my network. Does this make my simulation go wrong?

Thanks for your help...
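For concreteness, the procedure Greg recommends looks roughly like this with princomp (a sketch; Xtrain, Xtest, and the choice of 22 components are assumed, and the column means used for centering must be reused along with coeff):

    % Fit PCA once, on the training batch only.
    mu = mean(Xtrain);
    [coeff, scoreTrain] = princomp(Xtrain);
    Ztrain = scoreTrain(:, 1:22);            % train the network on this

    % New batch: reuse mu and coeff; do NOT call princomp again.
    Xc = Xtest - repmat(mu, size(Xtest,1), 1);
    Ztest = Xc * coeff(:, 1:22);             % simulate the network on this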
From: Greg Heath on 31 Aug 2006 09:00
Gian Piero Bandieramonte wrote:
> > It depends what you mean by "mean" (a picture of Bill Clinton just
> > flashed before my eyes).
> >
> > The reduced matrix accounts for, say, 99% of the variation in the
> > original data. This is usually good for regression. However, PCA may
> > not be good for classification.
>
> My problem is one of classification. I'm using PCA, and you said PCA
> may not be good for classification problems. What do you mean by "may
> not be good"?

Consider two thin 3-D disk-shaped distributions parallel to the x-y plane with radii a = 25, thicknesses t = 1, and edges separated by a distance d = 1 in the z direction:

Class 1: x^2 + y^2 <= 625, 0.5 <= z <= 1.5
Class 2: x^2 + y^2 <= 625, -1.5 <= z <= -0.5

The variance is

V = [ a^2/(8*pi), a^2/(8*pi), ((t+d/2)^3 - (d/2)^3)/(3*t) ]
  = [ 24.9, 24.9, 1.08 ]

With a variance threshold of ~98%, dominant PCA will choose the directions of maximum spread, x and y, which are orthogonal to the direction of class separation, z. In this scenario you have to keep *all* of the PCs in order to prevent a 100% overlap of the classes.

PCA is typically used when the dimensionality is high. For a higher-dimensional disk the same conclusion would result for a lower a/t ratio.

> > You have to use the transformation matrix from the 1st batch to
> > transform the second batch (instead of performing PCA on the second
> > batch).
> >
> > I use eigs(corrcoef(X)) for PCA (instead of princomp), so I don't
> > know if the transformation matrix is available to you without
> > solving X*T = PC using T = X\PC.
>
> Instead of using the transformation matrix, I applied PCA the same way
> to the second batch and simulated my network. Does this make my
> simulation go wrong?

My advice is equivalent to adding a linear first hidden layer with weight matrix T determined from training data, then freezing the net before testing. Your method is equivalent to changing the first hidden layer (only!) every time a new set of data is to be classified. Which sounds more reasonable to you?

Hope this helps.

Greg
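A quick numerical check of the two-disk example (a sketch; uniform sampling over the disks is assumed, so the absolute x-y variances come out larger than the figures quoted above, but the ordering that drives the conclusion is preserved):

    % Sample n points per class from the two disks.
    n = 1000;  a = 25;
    r  = a*sqrt(rand(2*n,1));                 % uniform over the disk
    th = 2*pi*rand(2*n,1);
    x = r.*cos(th);  y = r.*sin(th);
    z = [0.5 + rand(n,1); -1.5 + rand(n,1)];  % class 1, then class 2
    X = [x, y, z];
    [coeff, score, latent] = princomp(X);
    fracs = latent/sum(latent)                % z carries under 1% of the
                                              % variance, yet it is the only
                                              % separating direction

Any variance threshold below ~99% drops the third component here, and with it all of the class information, which is exactly the failure mode Greg describes.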