From: eyeballjunk junk on
Hey Peter, another question;

My intuition is that running the analysis on zscores will not change the returned eigenvectors, but will only change the eigenvalues, and thus the order in which the eigenvectors are returned.


Peter Perkins <Peter.PerkinsRemoveThis(a)mathworks.com> wrote in message <fhi17f$dhh$1(a)fred.mathworks.com>...
> yakir gagnon wrote:
> >>> and princomp(zscore( X )) is a CORRECT PCA...
> >> There is absolutely no point in doing this
> >
> > why? doing princomp(X) or princomp(zscore(X)) yields two different answers. and zscore(X) = zscore(zscore(X))
>
> Yes, princomp(X) and princomp(zscore(X)) do give different results. All
> I meant was that princomp(X./repmat(std(X,0,1),size(X,1),1)) and
> princomp(zscore(X)) will give the same results, because princomp already
> centers the data to have zero mean, and so the centering step in zscore
> is redundant. On the other hand, since it's easier to type zscore(X)
> than X./repmat(std(X,0,1),size(X,1),1), choosing the former does no harm.
>
>
> >> (as opposed to what you've
> >> called "correlation PCA"), since PRINCOMP already
> >> centers the data.
> >
> > here you say 'centre the data' which makes me confused since I thought you were talking about the zscoring (in which case I thought it was called standardizing), but I might be wrong.
>
> ZSCORE centers each column to have zero mean, and normalizes each column
> to have unit variance. "Standardized" is kind of an ambiguous term; the
> best description of what ZSCORE does is "type zscore".
>
> PRINCOMP always centers the data to have zero mean before doing
> anything. There's limited use in doing PCA on non-centered data,
> because the first component will typically describe the mean of the
> data, and that's not what most people want out of PCA (some would argue
> with that).
>
>
> > so why would I choose to do a so called "correlation PCA"? what is it good for?
>
> There are a lot of differing opinions on this. My own opinion is that
> doing PCA on unstandardized variables implies that you think that the
> scales on which the different variables are measured are somehow
> "natural" and "comparable", in the sense that variation of some absolute
> magnitude in one variable is no more or less important than the same
> amount of absolute variation in another variable. Doing PCA on
> standardized variables (scaling each column by the inverse of its sample
> std dev) implies that you think that the scales of the different
> variables are an artifact of the units in which you measured them, and
> that you need to rescale in order to make the variation in the different
> variables "comparable". The classic example is doing PCA on things like
> body measurements. Should your PCA results differ if you choose to
> measure weight in grams vs. stones? Probably they shouldn't.
>
> Whether or not you center the data before doing PCA affects these
> arguments too.
>
> I would not describe either as "correct", but would apply a method as
> appropriate to circumstances. Again some would argue with that.
>
> Hope this helps.
>
> - Peter Perkins
> The MathWorks, Inc.
From: eyeballjunk junk on
Excuse the double-post, but I accidentally sent an incomplete message.

My intuition is that running the analysis on zscores will not change the returned eigenvectors, but will only change the eigenvalues, and thus the order in which the eigenvectors are returned. Is this true?

- gD
From: Peter Perkins on
eyeballjunk junk wrote:
> Excuse the double-post, but I accidentally sent an incomplete message.
>
> My intuition is that running the analysis on zscores will not change the
> returned eigenvectors, but will only change the eigenvalues, and thus
> the order in which the eigenvectors are returned. Is this true?

If I understand your question correctly, then no. PCA on a cov matrix and on the corresponding cor matrix will in general give completely different results, because of the rescaling. For example, PCA finds "the direction of maximal variance" as the first PC, right? If the cov matrix is S1 = [1e6 1e2; 1e2 1], then the first PC is going to be almost entirely the first var. If the cov matrix is S2 = [1 .1; .1 1], i.e., the cor matrix corresponding to S1, then the first PC is going to be both vars in equal portions.
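
The S1/S2 example can be checked numerically. What follows is only a minimal sketch: it uses the core functions eig and sort directly on the two matrices rather than calling PRINCOMP on data, and the variable names (d, lambda1, pc1_cov, and so on) are illustrative, not from the thread.

```matlab
% Covariance matrix from the example and its corresponding correlation matrix.
S1 = [1e6 1e2; 1e2 1];
d  = sqrt(diag(S1));          % standard deviations of the two variables
S2 = S1 ./ (d * d');          % correlation matrix, i.e. [1 .1; .1 1]

% Eigen-decomposition of each matrix, sorted so that the direction of
% largest variance (the first PC) comes first.
[V1, D1] = eig(S1);
[lambda1, i1] = sort(diag(D1), 'descend');
V1 = V1(:, i1);

[V2, D2] = eig(S2);
[lambda2, i2] = sort(diag(D2), 'descend');
V2 = V2(:, i2);

pc1_cov = V1(:, 1);   % nearly all weight on the first variable (up to sign)
pc1_cor = V2(:, 1);   % equal weight on both variables (up to sign)
```

Comparing V1 and V2 column by column shows that the eigenvectors themselves differ, not just their eigenvalues or their order.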

The point that I may have been trying to make in that thread was that ZSCORE centers the data, but PRINCOMP does that anyway, so the centering aspect of ZSCORE is a no-op for the purposes of calling PRINCOMP.
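
The centering point can be illustrated the same way. This is a sketch under assumptions: the data are made-up random numbers, the z-scoring is written out by hand with core functions (it mirrors what ZSCORE does by default, dividing by the N-1 sample std dev), and cov stands in for the centering that PRINCOMP does internally, since cov also subtracts the column means.

```matlab
% Hypothetical data: 200 observations of 3 variables on very different
% scales, with nonzero column means.
X = bsxfun(@times, randn(200, 3), [1000 10 0.1]);
X = bsxfun(@plus, X, [5 -2 40]);

s = std(X, 0, 1);                       % sample std dev of each column (N-1)

% Scaled only, not centered.
Xscaled = bsxfun(@rdivide, X, s);

% Centered and scaled, as ZSCORE does by default.
Xz = bsxfun(@rdivide, bsxfun(@minus, X, mean(X, 1)), s);

% cov removes the column means itself, so the explicit centering step
% makes no difference to the matrix that PCA decomposes.
C_scaled = cov(Xscaled);
C_z      = cov(Xz);
C_cor    = corrcoef(X);                 % the correlation matrix of X

maxdiff_centering = max(abs(C_scaled(:) - C_z(:)));   % rounding error only
maxdiff_cor       = max(abs(C_z(:) - C_cor(:)));      % rounding error only
```

Both differences come out at the level of floating-point rounding, so a PCA of either scaled version is a PCA of the correlation matrix of X.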