From: Shawn Haskell on 23 Oct 2009 10:58

On Oct 23, 4:15 am, lele...(a)HOTMAIL.FR (Alexis Lelex) wrote:
> Hi,
>
> I'm working on the probability that an unemployed person switches to another
> "family" of profession within a specific interval of years.
> So I was thinking of fitting a time-to-event model using PROC LIFETEST and
> PHREG, but I suspect the results of the second will be much the same as
> those of the logistic model, won't they?

If you are interested in the risk or probability that the event will occur, then PHREG could be a way to go with time-to-event data. If you are interested in factors that affect the timing of the event, then consider LIFEREG for an accelerated failure-time model.

If I read your original post correctly, your model with covariates has a much reduced AIC; this indicates your covariates are useful. I'm not sure how you are getting an R-square value from a likelihood-based model (i.e., not least squares); I didn't think SAS did such a thing.

Not all models have to be predictive; it really depends on your objectives. "Significant" explanatory variables can represent real discovery about a system even though much unexplained variation still exists. In many cases, though, predictive power is what folks are trying to achieve, and we don't really know what you are trying to achieve.

From: Dale McLerran on 26 Oct 2009 02:02

Alexis,

You don't indicate how many observations are in your data set, but my guess is that you have a large number. The value of the AIC statistic increases with your sample size, so you really cannot interpret the AIC value directly. What is more important to consider is whether the AIC shows a significant decrease when comparing the intercept-only model with the model with covariates. Since you are estimating only 19 parameters (including the intercept) and the value of -2LL decreases by approximately 7,800, your model is doing a fairly good job of predicting the response.

I would further note that the value of R^2 for a binary response can be rather difficult to interpret. However, a max-rescaled R^2 of 0.1158 is really pretty decent. Also, an AUC value (the c statistic) of 0.715 indicates a fairly decent model.

You are correct, however, that the Hosmer-Lemeshow statistic indicates that at least one of your continuous covariates is not parameterized as well as it could be. You might want to fit your model using the GENMOD procedure and use the ASSESS statement to determine better parameterizations of the continuous predictors.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

--- On Thu, 10/22/09, Alexis Lelex <lelexos(a)HOTMAIL.FR> wrote:

> From: Alexis Lelex <lelexos(a)HOTMAIL.FR>
> Subject: Quality of logistic regression model
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Thursday, October 22, 2009, 5:52 AM
>
> Hi,
>
> This is my first post here and my English is not very good, so
> I'll do my best to make myself understood.
> I'm fitting a logistic regression on more than 120 000
> individuals, and I get some very interesting results with
> my odds ratios, and all p-values are <0.0001.
> But some figures in the SAS output make the quality of
> the model look bad:
> high AIC and SC, low R2 and Tau-a, and a Hosmer and Lemeshow test
> indicating a lack of fit...
> Here's part of my output:
>
> Model Fit Statistics
>
>                 Intercept    Intercept and
> Criterion       Only         Covariates
> AIC             91616.461    83858.175
> SC              91626.215    84043.487
> -2 Log L        91614.461    83820.175
>
> R-Square 0.0595    Max-rescaled R-Square 0.1158
>
> Association of Predicted Probabilities and Observed Responses
>
> Percent Concordant    71.2          Somers' D    0.431
> Percent Discordant    28.1          Gamma        0.434
> Percent Tied           0.7          Tau-a        0.089
> Pairs           1666460055          c            0.715
>
> Hosmer and Lemeshow Goodness-of-Fit Test
>
> Chi-Square    DF    Pr > ChiSq
> 78.2312       8     <.0001
>
> Is it possible to interpret the odds ratios even though
> there is a lack of fit and the model isn't predictive?
> In other words, what conclusions can we draw (or not) from a
> model like this one?
>
> If someone can help me with this, it would be really great!
>
> Thanks
>
> PS: by the way, this is a very good SAS forum; I learn a lot of
> things reading you all!

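As a quick check on the figures quoted above, the sketch below recomputes the fit criteria from the posted -2 Log L values. It assumes the standard definitions AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), with k = 19 for the full model as stated in the reply. The exact sample size is not given in the thread, so ln(n) is backed out from the intercept-only SC; all variable names are illustrative.

```python
import math

# Values copied from the posted output.
neg2ll_null = 91614.461   # -2 Log L, intercept only
neg2ll_full = 83820.175   # -2 Log L, intercept and covariates
sc_null = 91626.215       # SC, intercept only
k_null, k_full = 1, 19    # parameter counts (19 is taken from the reply)

# AIC = -2 Log L + 2k
aic_null = neg2ll_null + 2 * k_null
aic_full = neg2ll_full + 2 * k_full

# SC = -2 Log L + k * ln(n); back out ln(n) from the intercept-only model.
log_n = (sc_null - neg2ll_null) / k_null
n_implied = math.exp(log_n)
sc_full = neg2ll_full + k_full * log_n

# Likelihood-ratio chi-square for the covariates and its degrees of freedom.
lr_chi2 = neg2ll_null - neg2ll_full
lr_df = k_full - k_null

print(f"AIC: {aic_null:.3f} -> {aic_full:.3f}")
print(f"implied n: {n_implied:.0f}, SC (full): {sc_full:.3f}")
print(f"LR chi-square: {lr_chi2:.3f} on {lr_df} df")
```

The recomputed AIC values agree with the posted ones, the implied sample size is roughly 127,000 (consistent with "more than 120 000 individuals"), and the drop in -2 Log L is about 7,794 on 18 degrees of freedom. The recomputed SC can differ from the posted value in the second decimal because the inputs are rounded.
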
From: Alexis Lelex on 26 Oct 2009 07:23

Thanks for the answer. Yes, I'll try PROC PHREG, but first I'm looking for a way to produce the c statistic for it. I found this document, which explains pretty well how to do it:

"Fitting Cox Model Using PROC PHREG and Beyond in SAS"
Lea Liu, Sandy Forman, Bruce Barton
Maryland Medical Research Institute, Baltimore, Maryland, USA

It is possible to get an R-square value from PROC LOGISTIC in SAS, but I don't think it is the least-squares one, even if it might be similar in interpretation. I don't know; I'm quite lost on the subject...

My goal is not to achieve a model with strong predictive power, but I had thought it was not possible to interpret odds ratios without good predictive ability. I hope I was wrong, at least to some extent.

Alexis

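For the logistic model, the c statistic in the output posted at the start of the thread can be related to the concordant, discordant and tied pair percentages. This is a minimal sketch assuming the usual definitions: c = (nc + 0.5 nt) / t, Somers' D = (nc - nd) / t, and Gamma = (nc - nd) / (nc + nd). It works from the rounded percentages, so the last digit may differ from the SAS output. It covers the PROC LOGISTIC case only, not the PHREG computation described in the cited paper.

```python
# Percentages copied from the posted PROC LOGISTIC output.
pct_concordant = 71.2
pct_discordant = 28.1
pct_tied = 0.7

# Convert to proportions of all pairs with different responses.
nc = pct_concordant / 100
nd = pct_discordant / 100
nt = pct_tied / 100

# Tied pairs count as half-concordant in the c statistic.
c_stat = nc + 0.5 * nt
somers_d = nc - nd
gamma = (nc - nd) / (nc + nd)

print(f"c = {c_stat:.4f}, Somers' D = {somers_d:.3f}, Gamma = {gamma:.3f}")
```

Under these definitions c and Somers' D carry the same information, since c = (1 + D) / 2.
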
From: Alexis Lelex on 26 Oct 2009 09:23

Dale McLerran wrote:
> You don't indicate how many observations are in your data set. But my
> guess is that you have a large number of observations.
> [...]
> You are correct, however, that the Hosmer-Lemeshow statistic indicates
> that at least one of your continuous covariates is not parameterized as
> well as it could be. You might want to fit your model using the GENMOD
> procedure and use the ASSESS statement to determine better
> parameterizations of the continuous predictors.

Hi Dale,

First, thank you for answering. I have a little over 120 000 observations in my data set.

I tried the GENMOD procedure, but it seems that the ASSESS statement doesn't work in my version, 9.1.3?! That's kind of weird, because I found some papers on this statement across the web... Maybe this statement is no longer in use?

I have just one continuous predictor: age. I'll try to figure out how to parameterize it better to fit the model.

Thanks again.

Alexis
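
One common way to give age a more flexible form than a single linear term is to add a squared term or piecewise-linear (hinge) terms, then compare AIC across the candidate models. The sketch below only builds such terms. The knots at 30 and 45 years and the function name `age_terms` are hypothetical and do not come from the thread; the resulting columns would still have to be supplied to the modelling procedure as extra covariates.

```python
def age_terms(age, knots=(30, 45)):
    """Return candidate covariates for one age value (knots are hypothetical)."""
    terms = {"age": age, "age_sq": age ** 2}
    for knot in knots:
        # Hinge term: zero below the knot, linear in age above it.
        terms[f"age_gt{knot}"] = max(0.0, age - knot)
    return terms


# Example: derived covariates for a 52-year-old.
row = age_terms(52)
print(row)
```

Each hinge term lets the slope of the log-odds in age change at its knot, which is one simple way to probe the kind of lack of fit that the Hosmer and Lemeshow test flags.
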