From: VJ on 13 Mar 2010 14:04

Hello All,

Apologies in advance for the long mail, but I figure the more information, the better the response.

I am analyzing count data with a 40% (high?) proportion of zeroes; the remaining counts range from 1 to 100 events over the observation period. The response variable is the number of events a patient experiences in a fixed observation period after receiving one of 2 treatments (parallel group). So I have just one record per individual, where the number of events is recorded from time 0 to t hours, plus a bunch of predictors for every individual.

After failing to fit the data well with a regular Poisson and a negative binomial, I moved on to a ZIP and a Poisson hurdle (logit + truncated Poisson) model. There was definite improvement in fit in terms of the drop in -2 log likelihood and AIC. Comparing the ZIP and the hurdle model (fitting intercept-only models), the intercept from the logit part of the ZIP model was not significant whereas that from the hurdle was, even though the ZIP had a lower -2LL and AIC, which I don't understand.

The logit part of the hurdle model (code adapted from a discussion post by Dale and Wensui in May 2007) gives me estimates similar to the probability of zero, P(y=0), from a logistic model, as expected. However, the zero-truncated Poisson part is not being explained by the full set of predictors in hand. I determined this by plotting the predicted lambda (lambda hat) vs. the observed seizure count for each individual; one would expect the points to fall on a straight line, which they did not. I am not sure whether this is the right diagnostic to look at.

I tested for overdispersion using the Cameron and Trivedi approach, by checking the significance of the no-intercept regression of dep on predicted:
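[Editor's note: one standard explanation for the puzzle above is that in an intercept-only hurdle model the logit part estimates the raw zero fraction, whereas in a ZIP it estimates only the *excess* zeros beyond what the Poisson already produces, so its intercept can be near zero even when the ZIP fits better overall. A minimal Python sketch of the two zero-probability structures, on toy data rather than the original seizure counts:]

```python
import math

def zip_loglik(y, p0, lam):
    """Intercept-only ZIP: P(0) = p0 + (1-p0)*exp(-lam);
    P(y>0) = (1-p0) * Poisson(y; lam)."""
    ll = 0.0
    for yi in y:
        if yi == 0:
            ll += math.log(p0 + (1 - p0) * math.exp(-lam))
        else:
            ll += (math.log(1 - p0) - lam + yi * math.log(lam)
                   - math.lgamma(yi + 1))
    return ll

def hurdle_loglik(y, p0, lam):
    """Hurdle: P(0) = p0 directly;
    P(y>0) = (1-p0) * zero-truncated Poisson(y; lam)."""
    ll = 0.0
    for yi in y:
        if yi == 0:
            ll += math.log(p0)
        else:
            ll += (math.log(1 - p0) - lam + yi * math.log(lam)
                   - math.lgamma(yi + 1)
                   - math.log(1 - math.exp(-lam)))
    return ll

# Toy data with ~40% zeros, echoing the proportion in the post.
y = [0] * 40 + [1, 2, 3, 5, 8] * 12
print(zip_loglik(y, 0.3, 3.0))
print(hurdle_loglik(y, 0.4, 3.0))
```

In the hurdle likelihood the p0 that maximizes the zero part is exactly the observed zero fraction, while the ZIP's p0 only absorbs zeros the Poisson cannot account for; this is why the two logit intercepts can tell different significance stories on the same data.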
   dep = ((observed - predicted) ** 2 - observed) / predicted;
   model dep = predicted / noint;   /* fit an OLS regression without an intercept */

The above regression implied the presence of overdispersion. So I introduced a random effect (eta1) into the Poisson part of the model, even though there is only one observation per individual. (Motivated by this SAS example: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/statug_nlmixed_sect042.htm ; surprisingly, the results in that example show that most of the parameters are not significant -- or am I interpreting this incorrectly?) Now the diagnostic plot I described above shows observed vs. predicted passing through the line of identity. Can I include this random effect in my case?

A few questions that come to mind:

1. Do I have the right approach for model diagnosis?
2. What is the interpretation of eta1 (both in the code below and in the SAS example), and how is it different from k in the negative binomial? Is it a random effect adjusting for overdispersion at the individual level?
3. What kinds of model-fit plots (diagnostics) should we look at when dealing with such models?

Finally, here is the code I used, pretty much the same as what Dale posted earlier (thanks, Dale):

proc nlmixed data=radarscap tech=newrap maxiter=2000 maxfunc=2000;
   /* Linear predictor for the zero-probability (logit) part */
   eta_0 = b0_0 + a1*trt;
   exp_eta_0 = exp(eta_0);
   p_0 = exp_eta_0 / (1 + exp_eta_0);

   /* Linear predictor for the truncated-Poisson part */
   eta_p = bp_0 + b2*fourhrs + eta1;
   mu_p = exp(eta_p);
   pY0P = exp(-mu_p);   /* Poisson probability of zero, used for truncation */

   if tsz = 0 then
      loglike = eta_0 - log(1 + exp_eta_0);
   else
      loglike = log(1 - p_0) - mu_p + tsz*eta_p - lgamma(tsz + 1)
                - log(1 - pY0P);

   model tsz ~ general(loglike);
   random eta1 ~ normal(0, sd1*sd1) subject=id;

   predict eta1 out=eta1   (rename=(pred=eta1));
   predict mu_p out=lambda (rename=(pred=lambda));
   predict p_0  out=p0     (rename=(pred=p0));
run;
quit;

Thanks all,
VJ
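[Editor's note: for readers outside SAS, the Cameron-Trivedi auxiliary regression quoted in the post above can be sketched in a few lines of Python. The variable names (observed, predicted) and the toy data are illustrative, not VJ's actual data; a significant positive slope suggests overdispersion:]

```python
import math

def ct_overdispersion(observed, predicted):
    """No-intercept OLS of dep = ((y - mu)^2 - y)/mu on mu.
    Returns the slope and its t-statistic."""
    dep = [((y - m) ** 2 - y) / m for y, m in zip(observed, predicted)]
    sxx = sum(m * m for m in predicted)
    b = sum(m * d for m, d in zip(predicted, dep)) / sxx
    resid = [d - b * m for d, m in zip(dep, predicted)]
    n = len(dep)
    s2 = sum(r * r for r in resid) / (n - 1)  # one parameter estimated
    t = b / math.sqrt(s2 / sxx)
    return b, t

# Toy counts whose variance visibly exceeds the fitted mean of 3.
obs = [0, 0, 0, 1, 2, 9, 12, 0, 7, 15, 0, 3]
mu = [3.0] * len(obs)
b, t = ct_overdispersion(obs, mu)
print(b, t)   # positive slope => evidence of overdispersion
```

Under equidispersion the slope should be near zero; here the large positive slope mirrors what VJ reports from the SAS regression.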
From: Eli Y. Kling on 15 Mar 2010 04:16

Sounds like a very interesting challenge. Your question is very well described and merits an answer; I would be very interested in what others offer as a response. As for my small contribution:

a. I think your gut feeling is correct. If you have 40% zeros, then you would expect the relevant modelling component to be significant. Since you are finding it hard to fit a meaningful model, I would suggest a back-to-basics approach:

1. Divide the data by some meaningful strata, such as time or centre, and re-analyze.
2. Look out for extreme values.
3. Try modelling just the probability of response/non-response. You might find a confounded variable or an ill-defined one.
4. Check your code step by step; you might have done something in the data management you did not intend. For instance, are all missing values replaced with a zero?
5. The last point suggests another possibility: maybe all the non-responders (zeros) have a missing value for one of the independent variables. This would cause all those records to be dropped from the analysis and result in an insignificant zero effect.

b. Another possible approach is balanced sampling, where you take equal numbers of observations from the responders and non-responders. This will most likely reduce the noise, but you must remember to recalibrate the predicted probabilities with the prior sampling ratios.
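[Editor's note: the recalibration in point (b) is usually done with a log-odds offset that shifts probabilities fitted on the balanced sample back to the population base rate. A minimal Python sketch under that assumption; the function name and inputs are illustrative, not from the thread:]

```python
import math

def recalibrate(p_sampled, pi_pop, pi_sample):
    """Shift a probability estimated on a re-balanced sample back to the
    population base rate via an offset on the log-odds scale.

    p_sampled  -- predicted probability from the model fit on the sample
    pi_pop     -- responder rate in the population
    pi_sample  -- responder rate in the balanced sample (e.g. 0.5)
    """
    offset = (math.log(pi_pop / (1 - pi_pop))
              - math.log(pi_sample / (1 - pi_sample)))
    logit = math.log(p_sampled / (1 - p_sampled)) + offset
    return 1.0 / (1.0 + math.exp(-logit))

# Model fit on a 50/50 sample; true responder rate 60% (i.e. 40% zeros,
# as in VJ's data). A "neutral" sampled prediction of 0.5 maps back to 0.6.
print(recalibrate(0.5, 0.60, 0.5))
```

Only the intercept is affected by this correction, which is why the balanced-sample coefficients for the predictors can be kept as-is.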