Predicted probabilities in GLIMMIX [SAS]


From: Dale McLerran on 10 Dec 2009 23:00

--- On Thu, 12/10/09, Jeremy Miles <jeremy.miles(a)GMAIL.COM> wrote:

> From: Jeremy Miles <jeremy.miles(a)GMAIL.COM>
> Subject: Predicted probabilities in GLIMMIX
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Thursday, December 10, 2009, 10:43 AM
> Hello everyone,
>
> I'm calculating predicted probabilities for a logistic regression
> model in GLIMMIX, and there's something I don't understand (and can't
> work out). My predicted probabilities (based either on my calculation
> from the parameter estimates, or on saved predicted probabilities from
> glimmix) are all much too high (or too low).
>
> Here's a simple example to show what I mean. There are two variables
> - an id variable, and an outcome, which is dichotomous, 50% of scores
> are 0, 50% are 1.
>
> data test;
> input
> id outcome ;
> cards;
> 1 0
> 1 0
> 1 0
> 1 0
> 1 0
> 1 1
> 1 0
> 1 0
> 1 0
> 1 1
> 2 0
> 2 0
> 2 1
> 2 1
> 3 0
> 3 1
> 3 1
> 3 1
> 4 1
> 4 1
> 4 0
> 4 0
> 5 0
> 5 0
> 5 0
> 5 0
> 6 1
> 6 1
> 6 1
> 6 1
> 6 1
> 6 1
> 6 1
> 6 1
> ;
> RUN;
>
>
>
> If I run an intercept-only model with either proc logistic or proc
> glimmix, with no random effects, I get what I expect, a parameter
> estimate of zero:
>
> proc logistic data=test;
> model outcome = /;
> run;
>
> Analysis of Maximum Likelihood Estimates
>
>                          Standard        Wald
> Parameter  DF  Estimate     Error  Chi-Square  Pr > ChiSq
> Intercept   1         0    0.3430      0.0000      1.0000
>
>
> proc glimmix data=test method=quad;
> model outcome = / dist=binary solution;
> run;
>
>
> Parameter Estimates
>
>                      Standard
> Effect     Estimate     Error  DF  t Value  Pr > |t|
> Intercept  -205E-19    0.3430  33    -0.00    1.0000
>
>
> proc glimmix data=test method=quad;
> class id;
> model outcome = /dist=binary solution ;
> random intercept / subject=id;
> run;
>
>
> When I run it with the random effect, the parameter estimate isn't
> zero any more.
>
>
> Solutions for Fixed Effects
>
>                      Standard
> Effect     Estimate     Error  DF  t Value  Pr > |t|
> Intercept  -0.03590    0.8643   5    -0.04    0.9685
>
>
>
> Can anyone explain what I'm missing here?
>
> Thanks,
>
> Jeremy
>
>
> --
> Jeremy Miles
> Psychology Research Methods Wiki:
> www.researchmethodsinpsychology.com
>

Jeremy,

Consider the subject-specific intercept estimates. For your
six subjects, we have

Success   N     p   logit(p)
-------  --  ----  ---------
      2  10  0.2    -0.60206
      2   4  0.5     0.0
      3   4  0.75    0.47712
      2   4  0.5     0.0
      0   4  0.0    -inf
      8   8  1.0     inf

Averaging the logit(p) values across the four subjects for
which logit(p) is not infinity, we obtain -0.03123 - pretty
close to the value -0.03590 which is obtained for the random
effects model.
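
A minimal sketch for checking the arithmetic above, assuming the
per-subject counts in the table (the data set name subj and the
variable names are illustrative, not from the thread). It computes
the logit with both the natural log and the base-10 log, because the
tabulated values -0.60206 and 0.47712 appear to match log10 of the
odds. The base-10 average reproduces -0.03123; with the natural log,
which is the scale of the GLIMMIX logit link, the four finite logits
average to about -0.0719 instead.

```sas
/* Per-subject success counts, taken from the table above */
data subj;
   input id success n;
   p = success / n;
   /* The logit is undefined at p=0 and p=1, so leave those missing */
   if 0 < p < 1 then do;
      logit_ln    = log(p / (1 - p));     /* natural log: scale of the logit link */
      logit_log10 = log10(p / (1 - p));   /* base-10 log: matches the tabulated values */
   end;
   datalines;
1 2 10
2 2 4
3 3 4
4 2 4
5 0 4
6 8 8
;
run;

/* PROC MEANS skips missing values, so only subjects 1-4 are averaged */
proc means data=subj n mean maxdec=5;
   var logit_ln logit_log10;
run;
```

Either way the qualitative point stands: an unweighted average of
per-subject logits is small and negative rather than exactly zero.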

You'll note that logit(p) in an intercept only model is the
intercept parameter. Averaging the intercepts across
subjects is similar to the random effects model. The two
are not identical. Also, I have done nothing to take into
account the amount of information obtained for each subject.
The infinities cause some difficulties for averaging the
logits while accounting for the amount of information.

But I trust that you observe that the simple averaging that
you assumed initially is not the same as the random effect
model.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

From: Jeremy Miles on 11 Dec 2009 16:50

Thanks Dale, that makes sense.

In my real example, the predicted probabilities weren't just a little
off, they were way, way off. My outcome was smoking at different ages
for individuals, and my predicted probabilities were around 99% for
around half the people/time combinations. Now I understand why: the
problem is that the never smokers fell out of the model.

Time for a rethink ....

Thanks again,

Jeremy


From: Dale McLerran on 11 Dec 2009 19:16

Jeremy,

No, I don't think the problem is that the never smokers fall out of
the analysis. Rather, you are operating on the logit scale, and the
inverse logit of the average logit is not the same as the average of
the probabilities that give rise to the logits.
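
The gap between the two scales can be illustrated with a small
numerical sketch. The values b0 = 2 and sd = 3 are hypothetical,
chosen only to make the effect visible; they are not estimates from
either data set in this thread. The step compares the probability for
a subject whose random intercept is zero with the probability averaged
over a normal distribution of random intercepts, using a plain
Riemann sum.

```sas
data _null_;
   b0 = 2;   /* hypothetical fixed intercept on the logit scale */
   sd = 3;   /* hypothetical random-intercept standard deviation */

   /* Probability for a subject whose random intercept is exactly 0 */
   p_typical = 1 / (1 + exp(-b0));

   /* Population-averaged probability: integrate the inverse logit of */
   /* (b0 + sd*z) against the standard normal density of z            */
   p_marginal = 0;
   step = 0.01;
   do z = -8 to 8 by step;
      p_marginal = p_marginal
                 + pdf('NORMAL', z) * step / (1 + exp(-(b0 + sd*z)));
   end;

   put p_typical= p_marginal=;
run;
```

With a non-zero intercept and a large random-effect variance,
p_marginal is pulled toward 0.5 relative to p_typical, which is one
way predicted probabilities computed from the fixed effects alone can
look far too high or too low.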

Since you don't actually fit person-specific estimates, the
never smokers do stay in the analysis. If you ran your
GLIMMIX code without the never smokers, you would obtain
much different estimates than are observed with the never
smokers in the analysis. However, their contributions to
the logit are a little difficult to present when doing the
simple logit averaging which I presented yesterday.
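
One way to see how every subject contributes is to save predictions
with and without the estimated random intercepts. The sketch below
reuses the test data set and model from the original question; the
output data set name pred and the variable names p_subject and
p_fixed are illustrative. The event='1' response option is an
addition that makes the modelled level explicit; the original call
omits it, so the sign of the intercept may differ from the output
quoted earlier.

```sas
proc glimmix data=test method=quad;
   class id;
   /* event='1' is an addition: it states which level is modelled */
   model outcome(event='1') = / dist=binary solution;
   random intercept / subject=id;
   /* pred(blup ilink):   probability using each subject's predicted random intercept */
   /* pred(noblup ilink): probability from the fixed effects only                     */
   output out=pred pred(blup ilink)=p_subject pred(noblup ilink)=p_fixed;
run;

/* Compare the observed proportion with both predictions, by subject */
proc means data=pred mean maxdec=4;
   class id;
   var outcome p_subject p_fixed;
run;
```

Subjects 5 and 6, whose raw logits are infinite, should still receive
finite subject-level predictions here, since the predicted random
intercepts are shrunk toward zero.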

Dale


