From: Peter Perkins on
On 3/30/2010 6:06 PM, Walter Roberson wrote:

> Also, the FEX contribution requires the Statistics Toolbox, whereas it
> appears that pcacov() is part of basic Matlab.

Walter, PCACOV, like PRINCOMP and FACTORAN, is part of the Statistics Toolbox. SVD and EIG, which PRINCOMP and PCACOV rely on, are in core MATLAB.
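
For readers without the toolbox, here is a minimal sketch of a PCA built only on core functions (SVD, with EIG as a cross-check). It is not the PRINCOMP source; the variable names and the simulated data are assumptions made for illustration.

```matlab
% Sketch: PCA from core MATLAB only (no Statistics Toolbox).
% X is an n-by-p data matrix; the random data is purely illustrative.
X = randn(100,4) * [2 0 0 0; 1 1 0 0; 0 0 0.5 0; 0 0 0 0.1];

n  = size(X,1);
Xc = X - repmat(mean(X,1), n, 1);   % center each column
[U,S,V] = svd(Xc, 'econ');          % Xc = U*S*V'
evals   = diag(S).^2 / (n-1);       % eigenvalues of cov(X), in descending order
coeff   = V;                        % principal component coefficients
scores  = U*S;                      % component scores, equal to Xc*V

% Cross-check: the same eigenvalues from EIG on the covariance matrix
evalsEig = sort(eig(cov(X)), 'descend');
```

The signs of the columns of coeff are arbitrary, so they can differ from what another PCA routine returns.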
From: Ronald on
Peter Perkins <Peter.Perkins(a)MathRemoveThisWorks.com> wrote in message <hovog7$l4j$1(a)fred.mathworks.com>...
> On 3/30/2010 8:43 PM, Ronald wrote:
> > It is difficult to comprehend how MATLAB has overlooked a
> > primary step in the performance of Factor Analysis or PCA (they are two
> > different data extraction methods). I guess, I will write the script.
>
> Ronald, You probably know this already, but it seems that what you are looking for is this short function, where X is a data matrix:
>
> function [msa,msaOverall] = kmsa(X)
> rsq = sum(corr(X).^2,1) - 1;
> rpsq = sum(partialcorr(X).^2,1) - 1;
> msa = rsq ./ (rsq + rpsq);
> msaOverall = sum(rsq) ./ (sum(rsq + rpsq));
>
> The above requires the Statistics Toolbox, which I assume you have. Unfortunately, the above use of PARTIALCORR relies on an enhancement (computing the partial correlations within a single matrix, as opposed to controlling for variables in a second matrix) which is planned for an upcoming release. In the meantime, this
>
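> m = size(X,2); % number of variables (columns of X)
> RP = eye(m);
> for i = 1:m
>     for j = 1:i-1
>         k = setdiff(1:m,[i j]); % all variables other than i and j
>         RP(i,j) = partialcorr(X(:,i),X(:,j),X(:,k));
>         RP(j,i) = RP(i,j); % fill in the symmetric entry
>     end
> end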
>
> can be substituted.
>
> There is another version of a function to compute this statistic on the MATLAB Central File Exchange
>
> <http://www.mathworks.com/matlabcentral/fileexchange/12736-kmo>
>
> which I guess you have already seen. You have mentioned that you do not want to use a "custom script"; I'm not sure if that's because you are looking for something that is formally supported, or if you think that a "script" is qualitatively different than something that is included in the Statistics Toolbox. In fact, almost all the functions in the Statistics Toolbox are written in the MATLAB language, and so are not qualitatively different than the above code or the above link on the FEX.
>
> Hope this helps.
>
> - Peter Perkins
> The MathWorks, Inc.

Your input is greatly appreciated. MATLAB Statistics Toolbox does not follow the logical steps required by academia or by journals publishing research results for PCA or Factor Analysis. First, the data are analyzed for factorability. This does not exist in the Statistics Toolbox. If the data are factorable, a full set of descriptive statistics, plus scatter plots and histograms, is produced. Then, eigenvalues and eigenvectors are computed. A matrix of eigenvalues, percent of explained variance, and cumulative percent of explained variance is computed for the eigenvalues. The eigenvector matrix is transposed and joined with the eigenvalue, percent of explained variance, and cumulative percent matrix. Next, a Cattell's Scree Plot is run on the new matrix to verify factors or components visually (this doesn't even exist in the Statistics Toolbox). Components (or factors if factor analysis) are retained based upon the proportion of variance accounted for by the eigenvalues. Last, the components (or factors if factor analysis) that were selected in the previous step are rotated, producing a factor pattern matrix. Factor loading scores, standard factor scores, and the correlation between rotated factors and original variables are reported. The final step is to assign scores to the survey instrument so that other researchers can compare their resulting scores to those assigned to the instrument previously.

I cannot recommend MATLAB for use in PCA or Factor Analysis. It is light years behind JMP, SPSS, and SAS.

Thank you, again, for your kind assistance.
From: Peter Perkins on
On 3/31/2010 4:52 PM, Ronald wrote:
> Your input is greatly appreciated. MATLAB Statistics Toolbox does not
> follow the logical steps required by academia or by journals publishing
> research results for PCA or Factor Analysis.

Ronald, I appreciate you taking the time and effort to describe all of that. I don't think I'm going to be able to convince you, and I won't attempt to persuade beyond making this post, but I believe that all of what you've described (with the exception of the MSA) can be computed from the output of existing functions in the Statistics Toolbox.
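
As for the MSA itself, one possible sketch avoids PARTIALCORR entirely. It relies on the standard identity that the partial correlation between two variables, controlling for all the others, can be read off the scaled inverse of the correlation matrix. The simulated data and variable names are assumptions for illustration, and the sketch assumes the correlation matrix is nonsingular.

```matlab
% Sketch: Kaiser's MSA from the inverse correlation matrix (core MATLAB only).
% X is an n-by-p data matrix; the simulated data is purely illustrative.
X = randn(300,2) * randn(2,5) + randn(300,5);

R  = corrcoef(X);            % p-by-p correlation matrix
Ri = inv(R);                 % assumes R is nonsingular
d  = sqrt(diag(Ri));
P  = -Ri ./ (d*d');          % off-diagonals: partial correlations given all other variables

rsq  = sum(R.^2,1) - 1;      % squared correlations, unit diagonal removed
rpsq = sum(P.^2,1) - 1;      % diagonal of P is -1, so its square is also 1
msa  = rsq ./ (rsq + rpsq);  % per-variable measure of sampling adequacy
msaOverall = sum(rsq) / sum(rsq + rpsq);
```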

First, I think that some of the steps you've described are only appropriate for the forms of "factor/component analysis" that are based on PCA (i.e., PCA itself, or PFA). In contrast, the FACTORAN function in the Statistics Toolbox fits a factor analysis model using maximum likelihood, and some of those steps just don't apply. For example, eigenvalues don't enter into the fit in the way they do in PCA, and although you can estimate the percent of variance explained by each factor (in terms of sums of squared loadings), they are not ordered in the same way that the eigenvalues in PCA are. I know that in the social science literature, FA and PCA are regarded as very similar methods, and lines between them are often blurred. The Statistics Toolbox takes more of a statistician's view of PCA, i.e. that it is simply a coordinate rotation that can conveniently be used for dimension reduction, and no statistical model is implied. I suspect we just won't see eye to eye on that.

But there's nothing preventing you from using PRINCOMP and ROTATEFACTORS (and perhaps PCARES) to go through the steps you've described -- with the exception of the MSA, I believe the quantities you describe are all there. Eigenvalues, loadings/eigenvectors, pattern matrix, and scores are explicit outputs, and correlations are just calls to another function.
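
A rough sketch of how those pieces might be put together follows. The simulated data, the 80% retention cutoff, and the variable names are assumptions chosen for illustration, not a recommended procedure.

```matlab
% Sketch only: X is an n-by-p data matrix; data and the 80% cutoff are illustrative.
X = randn(200,3) * randn(3,6) + 0.5*randn(200,6);

Z = zscore(X);                               % standardize, so PCA acts on correlations
[coeff,scores,evals] = princomp(Z);          % newer releases offer PCA with similar outputs
pctVar   = 100 * evals / sum(evals);
varTable = [evals, pctVar, cumsum(pctVar)];  % eigenvalue, percent, cumulative percent

k = find(cumsum(pctVar) >= 80, 1);           % retain components up to 80% of the variance
k = max(k, 2);                               % a rotation needs at least two columns
loadings = coeff(:,1:k) * diag(sqrt(evals(1:k)));  % unrotated loadings
[rotLoadings,T] = rotatefactors(loadings);         % varimax by default
rotScores = zscore(scores(:,1:k)) * T;             % standardized scores, rotated
varCorr   = corr(Z, rotScores);                    % variables vs. rotated components
```

Because the default rotation is orthogonal, varCorr should match rotLoadings up to rounding, which makes a convenient sanity check.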

A scree plot of variance percentages is simply

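[loadings,scores,evals] = princomp(X);
p = numel(evals);                      % number of components
pctVar = 100 * evals / sum(evals);     % percent of variance per component
plotyy(1:p,pctVar,1:p,cumsum(pctVar))  % individual (left axis) and cumulative (right axis)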

As for FACTORAN, I believe that steps you've described that make sense for maximum likelihood FA (as opposed to, say, PFA) can similarly all be done with the outputs from FACTORAN and ROTATEFACTORS.
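
As a hedged illustration, the sketch below fits a two-factor model by maximum likelihood, rotates the loadings, and computes the percent of variance attributed to each factor from sums of squared loadings. The simulated data, the choice of two factors, and the variable names are assumptions.

```matlab
% Sketch only: simulated data with two common factors (an assumption for illustration).
n  = 500;
F0 = randn(n,2);
L0 = [0.8 0; 0.7 0; 0.6 0; 0 0.8; 0 0.7; 0 0.6];
X  = F0*L0' + randn(n,6) * diag(sqrt(1 - sum(L0.^2,2)));

m = 2;                                               % number of factors to fit
[Lambda,Psi] = factoran(X, m, 'rotate','none');      % unrotated ML loadings, specific variances
[LambdaRot,T] = rotatefactors(Lambda, 'Method','varimax');

% Percent of total standardized variance attributed to each rotated factor
pctVar = 100 * sum(LambdaRot.^2, 1) / size(X,2);
communality = sum(LambdaRot.^2, 2);                  % unchanged by an orthogonal rotation
```

Unlike PCA eigenvalues, the entries of pctVar are not guaranteed to come out in descending order.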

It is true that there is no single command that will take you through all the steps you laid out, and there are some additional lines of code required, but customers typically use MATLAB as more of a set of building blocks that they put together in ways specific to their needs. It's also true that the documentation for FACTORAN is not really geared to people who come from the social sciences. But it is compatible with maximum likelihood FA as described in what I believe is a standard reference, Harman's Modern Factor Analysis (a book that, by the way, makes no mention of Kaiser's MSA).

In any case, I appreciate your comments, and hope I can learn from them to improve what we provide.

- Peter Perkins
The MathWorks, Inc.