proc mixed: multicolinearity [SAS]

Prev: DO While in Macro
Next: Which process lock a table on Win environment

From: Ching on 17 Jan 2010 03:13

Hi everyone.

I'm testing the possible multicollinearity among the fixed effects in
proc mixed model with the option corrb in the model statement but
wondering whether I should specify method=reml or method=ml?

Your help is greatly appreciated. Have a good week.

From: Dale McLerran on 22 Jan 2010 14:23

--- On Thu, 1/21/10, Ryan <ryan.andrew.black(a)GMAIL.COM> wrote:

> From: Ryan <ryan.andrew.black(a)GMAIL.COM>
> Subject: Re: proc mixed: multicolinearity
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Thursday, January 21, 2010, 6:17 PM
> On Jan 19, 2:32 pm, stringplaye...(a)YAHOO.COM
> (Dale McLerran) wrote:
> > --- On Sun, 1/17/10, Peter Flom <peterflomconsult...(a)MINDSPRING.COM>
> wrote:
> >
> >
> > I suppose that one could construct an example where collinearity
> > statistics should be examined at the subject level. In general,
> > I would not worry about assessing collinearity within subjects.
> >
> > Mixed model parameter estimates for the fixed effects are obtained
> > by minimizing
> >
> > (Y - X*beta)' * V^(-1) * (Y - X*beta)
> >
> > where X is the covariate matrix for all subjects. Thus, we
> > are primarily concerned with collinearity that is manifest
> > across all subjects. The variance term in the middle is
> > obtained as
> >
> > V = Z*G*Z' + R
> >
> > where Z is the within-subjects design matrix. Now, the
> > efficiency of our estimate of beta depends on having a good
> > representation of V. To the extent that collinearity does
> > not affect the estimate of a predicted value but whether
> > we are able to correctly state the effect of each variable on
> > that estimate, we don't really care whether V is attributable
> > to column Z1, Z2, ..., Zk. As long as we get the estimate of
> > the between subject variance correct, then collinearity becomes
> > an issue only for the complete design matrix X.
> >
> > I would note that the solution for beta hat in mixed models
> > is actually no different than the solution for beta hat in a
> > logistic regression model or a Poisson regression model or
> > any of the other generalized linear models solutions. In all
> > of these, we obtain beta hat as
> >
> > beta hat = (X'*W*X)^(-1) * X'*W*Y
> >
> > where W=V^(-1) in the mixed models analysis.
> >
> > So, to the extent that one can use PROC REG to assess collinearity
> > for a generalized linear model, then one could use PROC REG to
> > assess collinearity in a mixed model setting. However, the
> > above solution can be rewritten as
> >
> > beta hat = (X'*W^(1/2) * W^(1/2)*X)^(-1)
> > * X'*W^(1/2) * W^(1/2)*Y
> >
> > = (U'*U)^(-1) * U'*Z
> >
> > where W^(1/2) is symmetric so that X'*W^(1/2)=(W^(1/2)*X)'=U'.
> > If one wanted to really be diligent, then one would check
> > collinearity employing the design matrix U. But the general
> > advice to check collinearity employing the design matrix X
> > (which includes the intercept term) should provide a reasonable
> > evaluuation of collinearity.
> >
> > Dale
> >
> > ---------------------------------------
> > Dale McLerran
> > Fred Hutchinson Cancer Research Center
> > mailto: dmclerra(a)NO_SPAMfhcrc.org
> > Ph: (206) 667-2926
> > Fax: (206) 667-5977
> > ---------------------------------------- Hide quoted
> text -
> >
> > - Show quoted text -
>
> Dale,
>
> I'm not sure if you'll see this post, but if you do and have the time,
> I'd be very interested in your thoughts on to two loosely related
> questions to your post.
>
> (1) I recently computed beta hats for a random intercept model using
> the formula you presented...
>
> beta hat = ((X'*(V)^(-1)*X)^(-1)) * X'*(V)^(-1)*Y
>
> However since X was overparameterized, I took the generalized inverse
> instead of taking the true inverse. Although I obtained correct
> results, I would like to confirm that using the generalized inverse
> function is valid.
>
> (2) I noticed the beta coefficients were exactly the same as those
> obtained from the fixed effects model. This led me to wonder if this
> will always turn out to be the case in the special circumstance where
> the only random term is a random intercept and the design is fully
> balanced with no missing data.
>
> Thanks,
>
> Ryan
>

Ryan,

With respect to your first question, yes, the generalized
inverse is appropriate when X'X is not full rank. I haven't
investigated the properties of the fully balanced design,
but I believe that it is probably true that in a fully
balanced design, the point estimates for the random effect
model would be the same as the point estimates for the fixed
effect model. The code below certainly bears this out
empirically.

In this code, I generate data for 10 experiments. Each experiment
has 20 clusters, the first 10 of which are assigned to arm=0 and
the second 10 of which are assigned to arm=1. Thus, the data are
fully balanced with respect to treatment arm. I also include a
covariate X.

Subsequently, the MIXED procedure is invoked twice, once
specifying only a fixed effect model with intercept and arm.
The second invocation of the MIXED procedure adds in the
random intercept effect with clusters nested within arm. Note
that for both invocations of the MIXED procedure, the covariate
is excluded from the model. Parameter estimates for each run
are saved to an output data set and then the COMPARE procedure
is invoked to examine whether the point estimates are the same
for the fixed and random effect models. We observe that the
point estimates are (for all practical purposes, and likely
theoretically) identical.

It is worth observing that the point estimates are identical
even though we estimated the wrong model since we did not
include the covariate X. When X is added to the model, the
design is no longer fully balanced. This is seen in a second
round of invocations of the MIXED procedure followed by a
PROC COMPARE.

/* Data generation */
data test;
v_clust=4;
v_resid=1;
do rep=1 to 10;
do cluster=1 to 20;
gamma = sqrt(v_clust)*rannor(1234579);
do i=1 to 20;
arm = (i>10);
x = rannor(1234579);
y = 3 + 2*arm + 4*x + gamma + rannor(1234579);
output;
end;
end;
end;
keep rep cluster arm x y;
run;

title "Balanced design with fixed effects only";
ods output SolutionF=ParmsFixed;
proc mixed data=test;
by rep;
class arm;
model y = arm / s;
run;

title "Balanced design with random intercept effects";
ods output SolutionF=ParmsRandom;
proc mixed data=test;
by rep;
class cluster arm;
model y = arm / s;
random intercept / subject=cluster(arm);
run;

title "Compare parameters for the balanced design models";
proc compare base=ParmsFixed(keep=estimate)
compare=ParmsRandom(keep=estimate);
run;

title "Covariate added to fixed effect model";
ods output SolutionF=ParmsFixed;
proc mixed data=test;
by rep;
class arm;
model y = arm x / s;
run;

title "Covariate added to random effect model";
ods output SolutionF=ParmsRandom;
proc mixed data=test;
by rep;
class cluster arm;
model y = arm x / s;
random intercept / subject=cluster(arm);
run;

title "Compare parameters estimated with covariate in model";
proc compare base=ParmsFixed(keep=estimate)
compare=ParmsRandom(keep=estimate);
run;

This is only empirical confirmation of what you have observed.
However, it is not unreasonable to believe that for a fully
balanced design, the random effects don't influence the
parameters which enter as fixed effects. And given that we
observe this empirically for 10 trials, it would seem a pretty
safe bet that we should observe this in the future.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

From: Ryan on 22 Jan 2010 17:59

On Jan 22, 2:23 pm, stringplaye...(a)YAHOO.COM (Dale McLerran) wrote:
> --- On Thu, 1/21/10, Ryan <ryan.andrew.bl...(a)GMAIL.COM> wrote:
>
>
>
>
>
> > From: Ryan <ryan.andrew.bl...(a)GMAIL.COM>
> > Subject: Re: proc mixed: multicolinearity
> > To: SA...(a)LISTSERV.UGA.EDU
> > Date: Thursday, January 21, 2010, 6:17 PM
> > On Jan 19, 2:32 pm, stringplaye...(a)YAHOO.COM
> > (Dale McLerran) wrote:
> > > --- On Sun, 1/17/10, Peter Flom <peterflomconsult...(a)MINDSPRING.COM>
> > wrote:
>
> > > I suppose that one could construct an example where collinearity
> > > statistics should be examined at the subject level. In general,
> > > I would not worry about assessing collinearity within subjects.
>
> > > Mixed model parameter estimates for the fixed effects are obtained
> > > by minimizing
>
> > > (Y - X*beta)' * V^(-1) * (Y - X*beta)
>
> > > where X is the covariate matrix for all subjects. Thus, we
> > > are primarily concerned with collinearity that is manifest
> > > across all subjects. The variance term in the middle is
> > > obtained as
>
> > > V = Z*G*Z' + R
>
> > > where Z is the within-subjects design matrix. Now, the
> > > efficiency of our estimate of beta depends on having a good
> > > representation of V. To the extent that collinearity does
> > > not affect the estimate of a predicted value but whether
> > > we are able to correctly state the effect of each variable on
> > > that estimate, we don't really care whether V is attributable
> > > to column Z1, Z2, ..., Zk. As long as we get the estimate of
> > > the between subject variance correct, then collinearity becomes
> > > an issue only for the complete design matrix X.
>
> > > I would note that the solution for beta hat in mixed models
> > > is actually no different than the solution for beta hat in a
> > > logistic regression model or a Poisson regression model or
> > > any of the other generalized linear models solutions. In all
> > > of these, we obtain beta hat as
>
> > > beta hat = (X'*W*X)^(-1) * X'*W*Y
>
> > > where W=V^(-1) in the mixed models analysis.
>
> > > So, to the extent that one can use PROC REG to assess collinearity
> > > for a generalized linear model, then one could use PROC REG to
> > > assess collinearity in a mixed model setting. However, the
> > > above solution can be rewritten as
>
> > > beta hat = (X'*W^(1/2) * W^(1/2)*X)^(-1)
> > > * X'*W^(1/2) * W^(1/2)*Y
>
> > > = (U'*U)^(-1) * U'*Z
>
> > > where W^(1/2) is symmetric so that X'*W^(1/2)=(W^(1/2)*X)'=U'.
> > > If one wanted to really be diligent, then one would check
> > > collinearity employing the design matrix U. But the general
> > > advice to check collinearity employing the design matrix X
> > > (which includes the intercept term) should provide a reasonable
> > > evaluuation of collinearity.
>
> > > Dale
>
> > > ---------------------------------------
> > > Dale McLerran
> > > Fred Hutchinson Cancer Research Center
> > > mailto: dmclerra(a)NO_SPAMfhcrc.org
> > > Ph: (206) 667-2926
> > > Fax: (206) 667-5977
> > > ---------------------------------------- Hide quoted
> > text -
>
> > > - Show quoted text -
>
> > Dale,
>
> > I'm not sure if you'll see this post, but if you do and have the time,
> > I'd be very interested in your thoughts on to two loosely related
> > questions to your post.
>
> > (1) I recently computed beta hats for a random intercept model using
> > the formula you presented...
>
> > beta hat = ((X'*(V)^(-1)*X)^(-1)) * X'*(V)^(-1)*Y
>
> > However since X was overparameterized, I took the generalized inverse
> > instead of taking the true inverse. Although I obtained correct
> > results, I would like to confirm that using the generalized inverse
> > function is valid.
>
> > (2) I noticed the beta coefficients were exactly the same as those
> > obtained from the fixed effects model. This led me to wonder if this
> > will always turn out to be the case in the special circumstance where
> > the only random term is a random intercept and the design is fully
> > balanced with no missing data.
>
> > Thanks,
>
> > Ryan
>
> Ryan,
>
> With respect to your first question, yes, the generalized
> inverse is appropriate when X'X is not full rank. I haven't
> investigated the properties of the fully balanced design,
> but I believe that it is probably true that in a fully
> balanced design, the point estimates for the random effect
> model would be the same as the point estimates for the fixed
> effect model. The code below certainly bears this out
> empirically.
>
> In this code, I generate data for 10 experiments. Each experiment
> has 20 clusters, the first 10 of which are assigned to arm=0 and
> the second 10 of which are assigned to arm=1. Thus, the data are
> fully balanced with respect to treatment arm. I also include a
> covariate X.
>
> Subsequently, the MIXED procedure is invoked twice, once
> specifying only a fixed effect model with intercept and arm.
> The second invocation of the MIXED procedure adds in the
> random intercept effect with clusters nested within arm. Note
> that for both invocations of the MIXED procedure, the covariate
> is excluded from the model. Parameter estimates for each run
> are saved to an output data set and then the COMPARE procedure
> is invoked to examine whether the point estimates are the same
> for the fixed and random effect models. We observe that the
> point estimates are (for all practical purposes, and likely
> theoretically) identical.
>
> It is worth observing that the point estimates are identical
> even though we estimated the wrong model since we did not
> include the covariate X. When X is added to the model, the
> design is no longer fully balanced. This is seen in a second
> round of invocations of the MIXED procedure followed by a
> PROC COMPARE.
>
> /* Data generation */
> data test;
> v_clust=4;
> v_resid=1;
> do rep=1 to 10;
> do cluster=1 to 20;
> gamma = sqrt(v_clust)*rannor(1234579);
> do i=1 to 20;
> arm = (i>10);
> x = rannor(1234579);
> y = 3 + 2*arm + 4*x + gamma + rannor(1234579);
> output;
> end;
> end;
> end;
> keep rep cluster arm x y;
> run;
>
> title "Balanced design with fixed effects only";
> ods output SolutionF=ParmsFixed;
> proc mixed data=test;
> by rep;
> class arm;
> model y = arm / s;
> run;
>
> title "Balanced design with random intercept effects";
> ods output SolutionF=ParmsRandom;
> proc mixed data=test;
> by rep;
> class cluster arm;
> model y = arm / s;
> random intercept / subject=cluster(arm);
> run;
>
> title "Compare parameters for the balanced design models";
> proc compare base=ParmsFixed(keep=estimate)
> compare=ParmsRandom(keep=estimate);
> run;
>
> title "Covariate added to fixed effect model";
> ods output SolutionF=ParmsFixed;
> proc mixed data=test;
> by rep;
> class arm;
> model y = arm x / s;
> run;
>
> title "Covariate added to random effect model";
> ods output SolutionF=ParmsRandom;
> proc mixed data=test;
> by rep;
> class cluster arm;
> model y = arm x / s;
> random intercept / subject=cluster(arm);
> run;
>
> title "Compare parameters estimated with covariate in model";
> proc compare base=ParmsFixed(keep=estimate)
> compare=ParmsRandom(keep=estimate);
> run;
>
> This is only empirical confirmation of what you have observed.
> However, it is not unreasonable to believe that for a fully
> balanced design, the random effects don't influence the
> parameters which enter as fixed effects. And given that we
> observe this empirically for 10 trials, it would seem a pretty
> safe bet that we should observe this in the future.
>
> Dale
>
> ---------------------------------------
> Dale McLerran
> Fred Hutchinson Cancer Research Center
> mailto: dmclerra(a)NO_SPAMfhcrc.org
> Ph: (206) 667-2926
> Fax: (206) 667-5977
> ---------------------------------------- Hide quoted text -
>
> - Show quoted text -

Dale,

Thank you for responding to my initial question and empirically
testing the second question. I ran the code you presented and needless
to say I continue to observe what I'd expect.

Hope you have an enjoyable weekend.

Ryan

|
Pages: 1
Prev: DO While in Macro
Next: Which process lock a table on Win environment