From: Alexis Lelex on 22 Oct 2009 11:21
Hi again, I plotted the lift curve, but it looks pretty much the same as the ROC curve. I wonder whether we can still interpret the odds ratios when the model's predictive power looks poor?
From: Murphy Choy on 22 Oct 2009 11:25
Hi, If you are doing data mining, it should be fine. For research, I would recommend that you try other models.
-- Regards, Murphy Choy
From: Sigurd Hermansen on 22 Oct 2009 23:42
Alexis: I understand your English far better than you would understand my French, so don't worry about language deficits. We have a greater challenge in understanding the more cryptic language of statistics.

The relatively small contribution of the covariates to the fit of the model to the data (e.g., AIC for the intercept-only model = 91616.461; for the full model with covariates = 83858.175) would worry me as well. The p-values for individual covariates don't mean much. Predictions of an outcome in a large sample will typically explain something. How well a model predicts in samples not used to estimate its parameters has much more importance. A c statistic of 0.715 suggests some predictive value of the model within the sample, but that may not carry over to another sample. My 2008 SESUG papers on predictive modeling focus on the c statistic (area under the ROC curve) and what it fails to take into account.

The statistics that you cite indicate that your model is predicting better than chance (not a high standard). I would recommend validating the model by applying it to a new sample, if you have one.
S

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Alexis Lelex
Sent: Thursday, October 22, 2009 8:53 AM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Quality of logistic regression model

Hi, This is my first post here and my English is not very good, so I'll do my best to make myself understood. I'm fitting a logistic regression on more than 120,000 individuals, and I get some very interesting results with my odds ratios, and all p-values are <0.0001. But some figures in the SAS output make the quality of the model look bad: high AIC and SC, low R2 and Tau-a, and a Hosmer and Lemeshow test indicating lack of fit...
Here's part of my output:

Model Fit Statistics

Criterion     Intercept Only     Intercept and Covariates
AIC                91616.461                    83858.175
SC                 91626.215                    84043.487
-2 Log L           91614.461                    83820.175

R-Square 0.0595     Max-rescaled R-Square 0.1158

Association of Predicted Probabilities and Observed Responses

Percent Concordant          71.2     Somers' D     0.431
Percent Discordant          28.1     Gamma         0.434
Percent Tied                 0.7     Tau-a         0.089
Pairs                 1666460055     c             0.715

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square     DF     Pr > ChiSq
   78.2312      8         <.0001

Is it possible to interpret the odds ratios even though there is a lack of fit and the model isn't very predictive? In other words, what conclusions can we draw (or not) from a model like this one? If someone can help me with this, that would be really great! Thanks
PS: By the way, this is a very good SAS forum; I learn a lot of things reading you all!
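The figures in the output above can be cross-checked with a little arithmetic. The following is a minimal Python sketch (not SAS code, and not part of the original post). It assumes the usual textbook definitions: AIC = -2 Log L + 2k, SC = -2 Log L + k ln(n), Somers' D = (concordant - discordant) / total, c = (concordant + tied/2) / total, and the generalized R-square 1 - exp(-LR/n). All variable names are illustrative.

```python
import math

# Values copied from the output above.
neg2ll_null, neg2ll_full = 91614.461, 83820.175
aic_null, aic_full = 91616.461, 83858.175
sc_full = 84043.487
pct_conc, pct_disc, pct_tied = 71.2, 28.1, 0.7

# Likelihood-ratio chi-square: drop in -2 Log L when covariates are added.
lr_chisq = neg2ll_null - neg2ll_full

# AIC = -2 Log L + 2k, so k is the number of estimated parameters.
k_null = round((aic_null - neg2ll_null) / 2)
k_full = round((aic_full - neg2ll_full) / 2)

# SC = -2 Log L + k * ln(n), so the sample size can be backed out roughly.
n_approx = math.exp((sc_full - neg2ll_full) / k_full)

# Rank-association measures from the concordance percentages.
somers_d = (pct_conc - pct_disc) / 100
gamma = (pct_conc - pct_disc) / (pct_conc + pct_disc)
c_stat = (pct_conc + 0.5 * pct_tied) / 100

# Generalized R-square and its max-rescaled version.
r2 = 1 - math.exp(-lr_chisq / n_approx)
r2_max = 1 - math.exp(-neg2ll_null / n_approx)
r2_rescaled = r2 / r2_max


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


# Hosmer-Lemeshow statistic from the output: 78.2312 on 8 degrees of freedom.
hl_pvalue = chi2_sf_even_df(78.2312, 8)

print(lr_chisq, k_null, k_full, round(n_approx))
print(somers_d, gamma, c_stat, r2, r2_rescaled, hl_pvalue)
```

Under these assumptions the numbers hang together: the full model has 19 estimated parameters against 1 for the intercept-only model, the implied sample size is consistent with "more than 120,000 individuals", and the recomputed Somers' D, Gamma, R-Square and Max-rescaled R-Square agree with the printed values up to rounding. The large drop in -2 Log L alongside a low R-Square shows that a big sample can make covariates clearly significant while they still explain little of the outcome.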
From: Alexis Lelex on 23 Oct 2009 04:06
Hi, I'm working on job statistics, and was modelling the probability of switching to another profession "family". So I'll try another sample, but that's quite difficult because our data cover a limited study period. I just found your papers on the c statistic; thanks a lot, I'm sure they will help me improve my understanding. Alexis
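The validation step suggested earlier in the thread comes down to scoring a sample that was not used to fit the model and recomputing the c statistic there. A minimal Python sketch of that computation follows. The function name `c_statistic` and the tiny holdout data are hypothetical, made up purely for illustration, and the pairwise loop is quadratic, so it suits only small samples.

```python
def c_statistic(outcomes, scores):
    """Share of (event, non-event) pairs ranked correctly; ties count one half."""
    events = [s for y, s in zip(outcomes, scores) if y == 1]
    non_events = [s for y, s in zip(outcomes, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    # Compare every event score with every non-event score.
    concordant = sum(e > n for e in events for n in non_events)
    tied = sum(e == n for e in events for n in non_events)
    return (concordant + 0.5 * tied) / (len(events) * len(non_events))


# Hypothetical holdout sample: observed outcomes (1 = switched profession family)
# and predicted probabilities from a model fitted on other data.
holdout_outcomes = [1, 0, 1, 0, 0, 1]
holdout_scores = [0.80, 0.30, 0.40, 0.40, 0.10, 0.65]

holdout_c = c_statistic(holdout_outcomes, holdout_scores)
print(holdout_c)
```

A holdout c clearly below the in-sample 0.715 would suggest that the fitted model does not carry over well to new data.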
From: Alexis Lelex on 23 Oct 2009 04:15
Hi, I'm working on the probability that an unemployed person switches to another profession "family" within a specific interval of years. So I was thinking of using a time-to-event model, with PROC LIFETEST and PROC PHREG, but I suppose the results of the latter would be much the same as those of the logistic model, wouldn't they?