Quality of logistic regression model [SAS]

Prev: logistics model.
Next: data manipulation problem

From: Shawn Haskell on 23 Oct 2009 10:58

On Oct 23, 4:15 am, lele...(a)HOTMAIL.FR (Alexis Lelex) wrote:
> Hi,
>
> I'm working on the probability that an unemployed person switches to
> another "family" of professions within a specific interval of years.
> So I was thinking of using a time-to-event model, with PROC LIFETEST and
> PHREG, but I suspect the PHREG results will be much the same as the
> logistic ones, won't they?

If you are interested in the risk or probability that the event will
occur, then PHREG could be a way to go with time-to-event data. If
you are interested in factors that affect the timing of the event, then
consider LIFEREG for an accelerated failure-time model.

If I read your original post correctly, your model with covariates has
a much reduced AIC - this indicates your covariates are useful. Not
sure how you are getting an r-square value from a likelihood-based
model (i.e., not least squares) - didn't think SAS did such a thing.
Not all models have to be predictive - really depends on your
objectives - "significant" explanatory variables can represent real
discovery about a system even though much unexplained variation still
exists. In many cases though, predictive power is what folks are
trying to achieve - we don't really know what you are trying to
achieve.

From: Dale McLerran on 26 Oct 2009 02:02

Alexis,

You don't indicate how many observations are in your
data set. But my guess is that you have a large number
of observations. The value of the AIC statistic increases
with your sample size. You really cannot interpret the
AIC value directly.
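Dale's point about sample size follows from how PROC LOGISTIC defines these criteria: AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), with k the number of estimated parameters. -2 Log L is a sum of one term per observation, so it grows roughly in proportion to n, and AIC and SC grow with it. The Python sketch below reproduces the AIC values in Alexis's output from the -2 Log L values. It then assumes the SC formula above and inverts SC to estimate n. The function names are ours, not SAS's.

```python
import math

# -2 Log L and SC values copied from the PROC LOGISTIC output in the thread.
NEG2LL_INTERCEPT = 91614.461  # intercept only: k = 1 parameter
NEG2LL_FULL = 83820.175       # intercept + covariates: k = 19 parameters
SC_INTERCEPT = 91626.215
SC_FULL = 84043.487


def aic(neg2ll: float, k: int) -> float:
    """Akaike information criterion: -2 log L + 2k."""
    return neg2ll + 2 * k


def sc(neg2ll: float, k: int, n: float) -> float:
    """Schwarz criterion (BIC): -2 log L + k * ln(n)."""
    return neg2ll + k * math.log(n)


def implied_n(neg2ll: float, sc_value: float, k: int) -> float:
    """Solve SC = -2 log L + k * ln(n) for the sample size n."""
    return math.exp((sc_value - neg2ll) / k)


if __name__ == "__main__":
    print(round(aic(NEG2LL_INTERCEPT, 1), 3))  # 91616.461, matches the output
    print(round(aic(NEG2LL_FULL, 19), 3))      # 83858.175, matches the output
    # Sample size implied by each row of the SC output.
    print(round(implied_n(NEG2LL_INTERCEPT, SC_INTERCEPT, 1)))
    print(round(implied_n(NEG2LL_FULL, SC_FULL, 19)))
```

Both `implied_n` calls come out near 127,000, which is consistent with the more than 120,000 individuals reported in the original post. The two estimates differ by about a hundred because SAS prints SC and -2 Log L to only three decimals.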

What is more important to consider is whether the AIC
shows a substantial decrease when comparing the model
with an intercept only and the model with covariates.
Since you are estimating only 19 parameters (including
the intercept) and the value of -2LL decreases by
approximately 7800 (91614.461 - 83820.175 = 7794.3),
your model is doing a fairly good job of predicting
the response.
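The comparison between the intercept-only and full models is a likelihood-ratio test. The drop in -2 Log L is a chi-square statistic, with degrees of freedom equal to the number of added parameters (19 - 1 = 18 here). The Python sketch below computes it from the printed values. To stay within the standard library, it uses the closed-form chi-square tail probability that exists only for even degrees of freedom. That shortcut is our choice, not how SAS computes p-values.

```python
import math

NEG2LL_INTERCEPT = 91614.461  # -2 Log L, intercept only (1 parameter)
NEG2LL_FULL = 83820.175       # -2 Log L, intercept + covariates (19 parameters)
AIC_INTERCEPT = 91616.461
AIC_FULL = 83858.175


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of df.

    For df = 2m the tail probability is exp(-x/2) * sum_{i<m} (x/2)^i / i!.
    Very large x underflows to 0.0 in double precision.
    """
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(neg2ll_reduced, neg2ll_full, k_reduced, k_full):
    """Return (chi-square statistic, df, p-value) for two nested models."""
    stat = neg2ll_reduced - neg2ll_full
    df = k_full - k_reduced
    return stat, df, chi2_sf_even_df(stat, df)


if __name__ == "__main__":
    stat, df, p = likelihood_ratio_test(NEG2LL_INTERCEPT, NEG2LL_FULL, 1, 19)
    print(round(stat, 3), df, p)  # 7794.286 18 0.0 (p far below .0001)
    # The AIC drop is smaller by 2 * 18 = 36, the penalty for the added parameters.
    print(round(AIC_INTERCEPT - AIC_FULL, 3))  # 7758.286
```

The AIC and -2 Log L differences differ only by that fixed penalty. With 18 extra parameters against a chi-square of almost 7800, the covariates clearly improve the fit.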

I would further note that the value of R^2 for a binary
response can be rather difficult to interpret. However,
a max-rescaled R^2 of 0.1158 is really pretty decent.
Also, an AUC (c statistic) of 0.715 indicates a fairly
decent model.
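This is where an R-square comes from in a likelihood-based model. PROC LOGISTIC's generalized R-square is the Cox-Snell measure, 1 - exp(-LR/n), where LR is the likelihood-ratio chi-square. The max-rescaled version divides it by its largest attainable value, 1 - exp(-(-2 Log L of the intercept-only model)/n), which is Nagelkerke's adjustment.

The association measures are simple functions of the concordant, discordant and tied pairs, and c = (1 + Somers' D)/2. Tau-a is small because its denominator is all N(N-1)/2 pairs of observations. Most of those pairs share the same response, so they can be neither concordant nor discordant.

The Python sketch below reproduces the printed values. `N_OBS = 127_000` is our approximation inferred from the SC values, since the exact count is not given in the thread. Expect agreement only to about three decimals.

```python
import math

NEG2LL_INTERCEPT = 91614.461
NEG2LL_FULL = 83820.175
N_OBS = 127_000          # ASSUMPTION: approximate n implied by the SC values
N_PAIRS = 1_666_460_055  # "Pairs" in the output: events * non-events


def cox_snell_r2(neg2ll_0: float, neg2ll_1: float, n: float) -> float:
    """Generalized R^2 = 1 - (L0 / L1)^(2/n) = 1 - exp(-(LR chi-square) / n)."""
    return 1.0 - math.exp(-(neg2ll_0 - neg2ll_1) / n)


def max_rescaled_r2(neg2ll_0: float, neg2ll_1: float, n: float) -> float:
    """Cox-Snell R^2 divided by its largest attainable value, 1 - L0^(2/n)."""
    r2_max = 1.0 - math.exp(-neg2ll_0 / n)
    return cox_snell_r2(neg2ll_0, neg2ll_1, n) / r2_max


def association(pct_conc: float, pct_disc: float, pct_tied: float,
                n_pairs: float, n_obs: float) -> dict:
    """Rank-correlation measures from the percentages of concordant,
    discordant and tied event/non-event pairs."""
    d = (pct_conc - pct_disc) / 100.0  # Somers' D
    return {
        "c": (pct_conc + 0.5 * pct_tied) / 100.0,
        "somers_d": d,
        "gamma": (pct_conc - pct_disc) / (pct_conc + pct_disc),
        # Tau-a divides (concordant - discordant) by ALL N(N-1)/2 pairs.
        "tau_a": d * n_pairs / (n_obs * (n_obs - 1) / 2.0),
    }


if __name__ == "__main__":
    print(round(cox_snell_r2(NEG2LL_INTERCEPT, NEG2LL_FULL, N_OBS), 4))
    print(round(max_rescaled_r2(NEG2LL_INTERCEPT, NEG2LL_FULL, N_OBS), 4))
    print(association(71.2, 28.1, 0.7, N_PAIRS, N_OBS))
```

With these inputs the sketch gives about 0.0595 and 0.1158 for the two R-squares, Somers' D 0.431, gamma about 0.434, and Tau-a about 0.089. It gives c = 0.7155 against the printed 0.715; the difference comes from the rounded percentages.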

You are correct, however, that the Hosmer-Lemeshow
statistic indicates that at least one of your
continuous covariates is not parameterized as well
as it could be. You might want to fit your model
using the GENMOD procedure and use the ASSESS
statement to determine better parameterizations of
the continuous predictors.
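The Python sketch below shows what the Hosmer-Lemeshow test in the output is measuring. It sorts observations by predicted probability, cuts them into (usually ten) groups, and compares observed with expected event counts in each group. With g groups the reference distribution is chi-square with g - 2 df, hence the 8 DF in the output.

This is a simplified version, not SAS's exact algorithm: SAS has its own rules for forming groups when predicted values are tied. It is also not a substitute for the ASSESS-based diagnostics. With over 120,000 observations the test has the power to flag even small departures from perfect calibration.

```python
import math
from typing import List, Tuple


def hosmer_lemeshow(y: List[int], p: List[float], groups: int = 10) -> Tuple[float, int]:
    """Simplified Hosmer-Lemeshow statistic and its degrees of freedom.

    Observations are sorted by predicted probability and split into
    `groups` bins of (nearly) equal size. Each bin contributes
    (O - E)^2 / (n_g * pbar * (1 - pbar)), where O and E are the observed
    and expected numbers of events and pbar = E / n_g. Bins whose pbar is
    exactly 0 or 1 would divide by zero and are not handled here.
    """
    if len(y) != len(p) or len(y) < groups:
        raise ValueError("y and p must match and have at least `groups` items")
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        pbar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * pbar * (1.0 - pbar))
    return stat, groups - 2


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for chi-square with even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Toy data: every predicted probability is 0.5 but every outcome is 1,
    # so each of the 10 bins is off by one event.
    stat, df = hosmer_lemeshow([1] * 20, [0.5] * 20)
    print(stat, df)  # 20.0 8
    # p-value for the statistic reported in the thread (78.2312 on 8 df).
    print(chi2_sf_even_df(78.2312, 8) < 1e-4)  # True
```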

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

--- On Thu, 10/22/09, Alexis Lelex <lelexos(a)HOTMAIL.FR> wrote:

> From: Alexis Lelex <lelexos(a)HOTMAIL.FR>
> Subject: Quality of logistic regression model
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Thursday, October 22, 2009, 5:52 AM
> Hi,
>
> This is my first post here and my English is not very good... so
> I'll do my best to make myself understood.
> I'm fitting a logistic regression model on more than 120,000
> individuals, and I get some very interesting results with
> my odds ratios, and all p-values are <0.0001.
> But some figures in the SAS output suggest the model is of
> poor quality: high AIC and SC, low R2 and Tau-a, and a Hosmer
> and Lemeshow test indicating lack of fit...
> Here's part of my output:
>
> Model Fit Statistics
>
>                 Intercept      Intercept and
> Criterion       Only           Covariates
> AIC             91616.461      83858.175
> SC              91626.215      84043.487
> -2 Log L        91614.461      83820.175
>
>
> R-Square 0.0595 Max-rescaled R-Square 0.1158
>
>
> Association of Predicted Probabilities and Observed Responses
>
> Percent Concordant 71.2 Somers' D 0.431
> Percent Discordant 28.1 Gamma 0.434
> Percent Tied 0.7 Tau-a 0.089
> Pairs 1666460055 c 0.715
>
>
> Hosmer and Lemeshow Goodness-of-Fit Test
>
> Chi-Square    DF    Pr > ChiSq
>
> 78.2312        8    <.0001
>
>
> Is it possible to interpret the odds ratios, even though
> there's a lack of fit and the model isn't predictive?
> In other words, what conclusions can (or can't) we draw
> from a model like this one?
>
> If someone can help me with this one, it'll be really great!
>
> Thanks
>
> PS: by the way, very good SAS forum, I learn a lot of
> things reading you people!
>

From: Alexis Lelex on 26 Oct 2009 07:23

Thanks for the answer.

Yes, I'll try PROC PHREG, but first I'm looking for a way to produce the c
statistic for it. I found this document, which explains pretty well how to do
it: "Fitting Cox Model Using PROC PHREG and Beyond in SAS", Lea Liu, Sandy
Forman, Bruce Barton,
Maryland Medical Research Institute, Baltimore, Maryland, USA
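A widely used c statistic for a Cox model is Harrell's concordance index. Take the usable pairs, where the subject with the shorter observed time had the event. The index is the proportion of those pairs in which that subject also has the higher predicted risk, with ties in predicted risk counting one half.

The brute-force Python sketch below implements that definition. We have not checked which variant the cited paper uses. Pairs tied on time are simply skipped. The O(n^2) loop would be far too slow for 120,000 subjects without a smarter algorithm. The `risk` argument is an assumption: any score where larger means higher hazard, such as the linear predictor from the fitted model.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when time[i] < time[j] and subject i had the
    event (event[i] == 1). It is concordant when risk[i] > risk[j];
    ties in risk count 0.5. Pairs tied on time are skipped in this sketch.
    """
    if not (len(time) == len(event) == len(risk)):
        raise ValueError("time, event and risk must have the same length")
    concordant = 0.0
    usable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


if __name__ == "__main__":
    # Higher risk score -> earlier event: perfectly concordant.
    print(harrell_c([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
    # Subject 0 is censored at t=1, so it anchors no pair.
    print(harrell_c([1, 2, 3], [0, 1, 1], [3, 2, 1]))  # 1.0
```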

It's possible to get an R-square value from PROC LOGISTIC in SAS, but I don't
think it's the least-squares one, even if it might be similar in
interpretation; I don't know, I'm quite lost on the subject...

My goal is not to achieve a model with strong predictive power, but I
thought it wasn't possible to interpret odds ratio results without good
predictive power. I hope I was wrong, in a way.

Alexis

From: Alexis Lelex on 26 Oct 2009 09:23

Hi Dale,

First, thank you for answering.

I have a little more than 120,000 observations in my data set.
I tried the GENMOD procedure, but it seems that the ASSESS statement doesn't
work in my version, 9.1.3?! Kind of weird, because I found some papers on this
statement across the web... Maybe this statement is no longer in use?
I have just one continuous predictor: age. I'll try to figure out how to
parameterize it better to fit the model.
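One low-tech way to check whether age enters the logit linearly is to group age into bands and compute the empirical logit of the outcome in each band. This is an older alternative to the ASSESS statement, not what Dale proposed. If those logits, plotted against age, bend noticeably, a single linear age term is probably misparameterized. Age categories, a quadratic term or splines may then fit better.

Below is a minimal Python sketch, assuming integer ages and a 0/1 outcome. The 5-year band width, the +0.5 correction and the toy data are our illustrative choices, not anything from the thread.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def empirical_logits(ages: Sequence[int], y: Sequence[int],
                     width: int = 5) -> List[Tuple[int, int, float]]:
    """Return (band start, n, empirical logit) for each age band.

    The empirical logit log((events + 0.5) / (non-events + 0.5)) stays
    finite even when a band has no events or no non-events.
    """
    totals: Dict[int, int] = defaultdict(int)
    events: Dict[int, int] = defaultdict(int)
    for age, outcome in zip(ages, y):
        band = (age // width) * width  # lower edge of the age band
        totals[band] += 1
        events[band] += outcome
    rows = []
    for band in sorted(totals):
        e = events[band]
        ne = totals[band] - e
        rows.append((band, totals[band], math.log((e + 0.5) / (ne + 0.5))))
    return rows


if __name__ == "__main__":
    ages = [20, 21, 25, 26, 31, 33]   # toy data for illustration only
    outcome = [0, 1, 1, 1, 0, 0]
    for band, n, logit in empirical_logits(ages, outcome):
        print(f"age band starting at {band}: n={n}, empirical logit={logit:.3f}")
```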

Thanks again.

Alexis
