From: Dale McLerran on 10 Dec 2009 23:00

--- On Thu, 12/10/09, Jeremy Miles <jeremy.miles(a)GMAIL.COM> wrote:

> From: Jeremy Miles <jeremy.miles(a)GMAIL.COM>
> Subject: Predicted probabilities in GLIMMIX
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Thursday, December 10, 2009, 10:43 AM
>
> Hello everyone,
>
> I'm calculating predicted probabilities for a logistic regression
> model in GLIMMIX, and there's something I don't understand (and can't
> work out). My predicted probabilities (based either on my calculation
> from the parameter estimates, or on saved predicted probabilities from
> GLIMMIX) are all much too high (or too low).
>
> Here's a simple example to show what I mean. There are two variables:
> an id variable and an outcome, which is dichotomous; 50% of scores
> are 0 and 50% are 1.
>
> data test;
>   input id outcome;
>   cards;
> 1 0
> 1 0
> 1 0
> 1 0
> 1 0
> 1 1
> 1 0
> 1 0
> 1 0
> 1 1
> 2 0
> 2 0
> 2 1
> 2 1
> 3 0
> 3 1
> 3 1
> 3 1
> 4 1
> 4 1
> 4 0
> 4 0
> 5 0
> 5 0
> 5 0
> 5 0
> 6 1
> 6 1
> 6 1
> 6 1
> 6 1
> 6 1
> 6 1
> 6 1
> ;
> RUN;
>
> If I run an intercept-only model with either proc logistic or proc
> glimmix, with no random effects, I get what I expect, a parameter
> estimate of zero:
>
> proc logistic data=test;
>   model outcome = /;
> run;
>
>              Analysis of Maximum Likelihood Estimates
>
>                                Standard          Wald
> Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
> Intercept     1           0      0.3430        0.0000        1.0000
>
> proc glimmix data=test method=quad;
>   model outcome = / dist=binary solution;
> run;
>
>                        Parameter Estimates
>
>                            Standard
> Effect        Estimate        Error      DF    t Value    Pr > |t|
> Intercept     -205E-19       0.3430      33      -0.00      1.0000
>
> proc glimmix data=test method=quad;
>   class id;
>   model outcome = / dist=binary solution;
>   random intercept / subject=id;
> run;
>
> When I run it with the random effect, the parameter estimate isn't
> zero any more.
>
>                    Solutions for Fixed Effects
>
>                            Standard
> Effect        Estimate        Error      DF    t Value    Pr > |t|
> Intercept     -0.03590       0.8643       5      -0.04      0.9685
>
> Can anyone explain what I'm missing here?
>
> Thanks,
>
> Jeremy
>
> --
> Jeremy Miles
> Psychology Research Methods Wiki:
> www.researchmethodsinpsychology.com

Jeremy,

Consider the subject-specific intercept estimates. For your
six subjects, we have

id   Success    N      p    logit(p)
--   -------   --   ----   ---------
 1         2   10   0.20    -0.60206
 2         2    4   0.50     0.0
 3         3    4   0.75     0.47712
 4         2    4   0.50     0.0
 5         0    4   0.00    -inf
 6         8    8   1.00     inf

Averaging the logit(p) values across the four subjects for
which logit(p) is finite, we obtain -0.03123, pretty close to
the value -0.03590 obtained for the random effects model.

You'll note that logit(p) in an intercept-only model is the
intercept parameter. Averaging the intercepts across subjects
is similar to, but not identical to, fitting the random effects
model. Also, I have done nothing to take into account the
amount of information obtained for each subject. The
infinities make it difficult to average the logits while
accounting for the amount of information.

But I trust you can see that the simple averaging you assumed
initially is not the same as the random effects model.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
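A quick arithmetic check of the numbers above, sketched in Python rather than SAS so it runs without a SAS session. The per-subject counts are tallied by hand from the TEST data step; the names `counts` and `subject_logits` are illustrative only. Besides reproducing the pooled intercept and its standard error, the sketch shows that the logit(p) column in the table matches base-10 logarithms of the odds: with the natural log that logistic regression uses, subject 1's logit is ln(0.25) = -1.386 and subject 3's is ln(3) = 1.099.

```python
import math

# Per-subject (successes, trials), tallied by hand from the TEST data step above.
counts = {1: (2, 10), 2: (2, 4), 3: (3, 4), 4: (2, 4), 5: (0, 4), 6: (8, 8)}


def subject_logits(counts, log=math.log):
    """Per-subject log-odds log(p / (1 - p)).

    Subjects with p = 0 or p = 1 have an infinite logit and are skipped,
    just as the simple average in the table has to skip them.
    """
    out = {}
    for sid, (y, n) in counts.items():
        if 0 < y < n:
            p = y / n
            out[sid] = log(p / (1 - p))
    return out


# Pooled intercept-only fit (no random effect): the logit of the overall
# proportion, and its Wald standard error 1 / sqrt(n * p * (1 - p)).
y_tot = sum(y for y, _ in counts.values())
n_tot = sum(n for _, n in counts.values())
p_hat = y_tot / n_tot
intercept = math.log(p_hat / (1 - p_hat))
se = 1 / math.sqrt(n_tot * p_hat * (1 - p_hat))

ln_logits = subject_logits(counts)                 # natural log (logistic scale)
log10_logits = subject_logits(counts, math.log10)  # base 10

mean_ln = sum(ln_logits.values()) / len(ln_logits)
mean_log10 = sum(log10_logits.values()) / len(log10_logits)

print(f"pooled: p = {p_hat}, intercept = {intercept:.4f}, SE = {se:.4f}")
print(f"mean of finite logits, natural log: {mean_ln:.5f}")
print(f"mean of finite logits, base 10:     {mean_log10:.5f}")
```

The pooled proportion is 17/34 = 0.5, so the intercept is 0 with SE 0.3430, matching both runs without a random effect. The base-10 average is -0.03123, the figure quoted above; on the natural-log scale the four finite logits average about -0.0719. Either way this simple average is only a heuristic, not the GLIMMIX estimate.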
From: Jeremy Miles on 11 Dec 2009 16:50

Thanks Dale, that makes sense.

In my real example, the predicted probabilities weren't just a little
off, they were way, way off, but now I understand why. My outcome
was smoking at different ages for individuals, and my predicted
probabilities were around 99% for around half the people/time
combinations. Now I understand that the problem is that the never
smokers fell out of the model.

Time for a rethink....

Thanks again,

Jeremy

--
Jeremy Miles
Psychology Research Methods Wiki:
www.researchmethodsinpsychology.com
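Whether all-zero subjects (such as id 5 in the TEST data, or the never smokers) really drop out of a random-intercept model can be checked from the marginal likelihood that the fit maximizes. The Python sketch below assumes the model logit P(outcome = 1 | z) = b0 + sigma*z with z ~ N(0, 1). It integrates over z with a plain trapezoid rule, which is a simpler stand-in for the adaptive Gauss-Hermite quadrature that GLIMMIX uses under METHOD=QUAD. The function name and the (b0, sigma) values are hypothetical and chosen only for illustration.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def subject_marginal_lik(y, n, b0, sigma, zmax=8.0, steps=4000):
    """Marginal likelihood of one subject with y ones among n binary rows under a
    random-intercept logistic model: logit P(outcome = 1 | z) = b0 + sigma * z, z ~ N(0, 1).

    For row-level binary data the conditional likelihood is p**y * (1 - p)**(n - y).
    The integral over z uses a plain trapezoid rule on [-zmax, zmax].
    """
    h = 2 * zmax / steps
    total = 0.0
    for i in range(steps + 1):
        z = -zmax + i * h
        p = expit(b0 + sigma * z)
        density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * p ** y * (1 - p) ** (n - y) * density
    return total * h


# id 5 in the TEST data: four rows, all zero (like a never smoker).
for b0, sigma in [(0.0, 0.0), (0.0, 0.5), (0.0, 2.0), (-1.0, 1.0), (1.0, 1.0)]:
    lik = subject_marginal_lik(0, 4, b0, sigma)
    print(f"b0 = {b0:4.1f}, sigma = {sigma:3.1f}: likelihood contribution = {lik:.4f}")
```

The all-zero subject's contribution is finite (0.0625 = 0.5^4 when b0 = 0 and sigma = 0) and changes with both b0 and sigma. Such a subject therefore does enter the fit. It is only the per-subject logit average that has to leave it out, because its logit is -inf.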
From: Dale McLerran on 11 Dec 2009 19:16

Jeremy,

No, I don't think that the never smokers fall out of the analysis.
Rather, you are operating on the logit scale, and the average of
the logits is not the same as the average of the probabilities that
give rise to those logits. Since you don't actually fit
person-specific estimates, the never smokers do stay in the
analysis. If you ran your GLIMMIX code without the never smokers,
you would obtain very different estimates from those obtained with
the never smokers in the analysis. However, their contributions to
the logit are a little difficult to present when doing the simple
logit averaging that I presented yesterday.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
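The point that the average of the logits differs from the average of the probabilities can be made concrete with a short Python sketch. It assumes a random-intercept logistic model with fixed intercept b0 and random-intercept SD sigma. Because the random intercepts have mean 0, b0 is the average subject logit, and expit(b0) is the probability for a subject whose random effect is exactly 0. The population-averaged probability is instead E[expit(b0 + sigma*Z)], computed here with a trapezoid rule. The values b0 = 2.5 and sigma = 3.0 are hypothetical and are not estimates from this thread.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_prob(b0, sigma, zmax=8.0, steps=4000):
    """Population-averaged P(outcome = 1) = E[expit(b0 + sigma * Z)], Z ~ N(0, 1).

    This is the average of the subject-level probabilities, computed with a
    plain trapezoid rule on [-zmax, zmax].
    """
    h = 2 * zmax / steps
    total = 0.0
    for i in range(steps + 1):
        z = -zmax + i * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * expit(b0 + sigma * z) * density
    return total * h


# Hypothetical values chosen for illustration; not estimates from this thread.
b0, sigma = 2.5, 3.0
conditional = expit(b0)              # inverse logit of the average logit
marginal = marginal_prob(b0, sigma)  # average of the subject-level probabilities
print(f"inverse logit of the average logit: {conditional:.3f}")
print(f"average of the probabilities:       {marginal:.3f}")
```

With b0 = 0, as in the toy TEST data, both quantities equal 0.5 by symmetry. Away from 0 the population average is pulled toward 0.5 relative to expit(b0). The inverse logit of a GLIMMIX fixed-effect intercept therefore describes a "typical" subject (random effect 0) rather than the average person/time combination. That is one plausible reason back-transformed predictions can look far more extreme than the observed proportions.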