From: Dale McLerran on 13 Apr 2007 13:19

--- Jeff Miller <jeffmiller(a)ALPHAPOINT05.NET> wrote:

> I don't really have time to get too much into this either right
> now... and have only been skimming these posts. But, I am interested.
>
> I can say that I had problems with the quasi-Newton in SAS and wound
> up switching to R for that reason. I was able to converge
> successfully quite frequently in a Monte Carlo using the BFGS and
> Nelder-Mead (in place of SAS's quasi-Newton). Yes, it was a bit
> slower, but it was worth it. They worked fine for the ZIP, NB-ZIP,
> Hurdle, and NB-Hurdle. (Of course, NB-ZIP, as usual, had the most
> convergence problems, but not anything ridiculous (except in one
> condition, but that's a different story).)
>
> I'm far from being an optimization expert. Any comments on this that
> might persuade me to try SAS again without having to spend hours
> programming stuff that would ideally be built into the proc?
>
> Thanks,
> Jeff

Jeff,

That is an awfully broad topic to try to address. Let me respond with reference to an example I encountered just this week, where someone was having problems fitting a ZIP model with the NLMIXED procedure. The data had zero values for about 45% of the responses, and the remaining response values had a mean of about 24. The person who requested my help did not initially code the ZIP model correctly, but that was only part of the problem. The other part was parameter initialization combined with the values of the predictor variables: these data had two predictor variables, each with values as high as 90 to 100.

Now, if you do not specify initial starting values for the parameters of the model, the NLMIXED procedure uses a starting value of 1 for every parameter. If you have a log link function for the Poisson response, then you present NLMIXED with an initial model as follows:

   eta = 1 + X1 + X2;
   lambda = exp(eta);

If eta is close to 200, then exp(eta) is outrageously large. NLMIXED simply could not iterate on the Poisson component from the default initial parameters. Any starting values which produce a very large lambda would cause problems.

Actually, the same is true for the excess-zero probability part of the model. There you have

   eta_zip = 1 + X1 + X2;
   p_zip = exp(eta_zip) / (1 + exp(eta_zip));

so that if you start with very large values of eta_zip, you estimate the probability of the excess-zero component as being 1. If all observations have large p_zip, you may not be able to iterate away from your initial starting values.

So, what is the solution? A good solution would be to have a ZIP package where the parameters for the covariates are all initialized to zero. You might also separate the zero values from the non-zero values, compute the mean of the non-zero values and the probability of a zero value, and then initialize the intercept for the Poisson component as the log of that non-zero mean and the intercept for the zero-probability component as log(P(0)/(1-P(0))). Also, with a packaged solution, the user would not misspecify the model construction.

When you have a packaged program for estimating ZIP models, all this work may be done for you under the hood. The folks who put together the program have probably thought about these issues ahead of time. But these issues are often specific to the type of model being fit. So, I don't know that there is any good generalization that I - or anyone else - can write about model parameterization.
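[A minimal sketch of the initialization strategy described above, using the post's own linear predictors. The data set name MYDATA and the variables Y, X1, and X2 are hypothetical stand-ins; the intercepts are seeded from the data and all slopes start at zero:]

   proc sql noprint;
      /* Log of the mean of the non-zero responses -> Poisson intercept */
      select log(avg(y)) into :b0 from mydata where y > 0;
      /* Logit of the proportion of zeros -> zero-inflation intercept */
      select log(avg(y=0) / (1 - avg(y=0))) into :g0 from mydata;
   quit;

   proc nlmixed data=mydata;
      parms b0=&b0 b1=0 b2=0          /* Poisson component      */
            g0=&g0 g1=0 g2=0;         /* excess-zero component  */
      eta     = b0 + b1*x1 + b2*x2;
      lambda  = exp(eta);
      eta_zip = g0 + g1*x1 + g2*x2;
      p_zip   = exp(eta_zip) / (1 + exp(eta_zip));
      /* ZIP log likelihood: zeros come from either component */
      if y = 0 then ll = log(p_zip + (1 - p_zip)*exp(-lambda));
      else ll = log(1 - p_zip) - lambda + y*log(lambda) - lgamma(y + 1);
      model y ~ general(ll);
   run;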
You have to have some idea how the model behaves in order to do a good job coding a general solution package. In SAS, a general solution for something like a ZIP model would then involve writing a macro which does some preprocessing of the data. For the ZIP model, we would initialize all slope coefficients to zero, and the intercept parameters would be initialized from the data preprocessing. Only then would the macro start the ZIP estimation with the NLMIXED procedure. The ZIP model itself would be constructed after the response and predictor variables are named. If someone wrote such a macro, I think you would find that it works every bit as well as the routines written for other statistical packages.

Does this address your question?

Dale

p.s. Sorry about the early post, which occurred when my fingers slipped and hit the wrong keystroke. That is the second time in the past week that has happened to me!

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph:  (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
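[A skeleton of the kind of wrapper macro Dale describes, under stated assumptions: the macro name %FIT_ZIP and its parameters are hypothetical, and slopes are named b1..bk and g1..gk in the order the predictors are listed. It strings together the preprocessing and NLMIXED steps shown earlier:]

   %macro fit_zip(data=, y=, xvars=);
      %local i x eta eta_zip parms b0 g0;

      /* Preprocess: data-driven starting values for the two intercepts */
      proc sql noprint;
         select log(avg(&y)) into :b0 from &data where &y > 0;
         select log(avg(&y=0) / (1 - avg(&y=0))) into :g0 from &data;
      quit;

      /* Build both linear predictors, all slopes initialized to zero */
      %let eta     = b0;
      %let eta_zip = g0;
      %let parms   = b0=&b0 g0=&g0;
      %let i = 1;
      %do %while (%length(%scan(&xvars, &i)) > 0);
         %let x       = %scan(&xvars, &i);
         %let eta     = &eta + b&i*&x;
         %let eta_zip = &eta_zip + g&i*&x;
         %let parms   = &parms b&i=0 g&i=0;
         %let i       = %eval(&i + 1);
      %end;

      proc nlmixed data=&data;
         parms &parms;
         lambda = exp(&eta);
         p_zip  = exp(&eta_zip) / (1 + exp(&eta_zip));
         if &y = 0 then ll = log(p_zip + (1 - p_zip)*exp(-lambda));
         else ll = log(1 - p_zip) - lambda + &y*log(lambda) - lgamma(&y + 1);
         model &y ~ general(ll);
      run;
   %mend fit_zip;

   /* Example call against a hypothetical data set: */
   %fit_zip(data=mydata, y=y, xvars=x1 x2);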
From: Hans Skaug on 13 Apr 2007 15:17
Jeff and SAS-list,

Two years ago I made a post to the list (attached at the end) stating that I intended to compare NLMIXED and AD Model Builder. The work has not been completed, mainly because I am not a skilled SAS programmer. The comparison is an interesting one because we are not talking about plain numerical optimization: a quadrature rule is applied to the user-specified likelihood, and this opens the door to plenty of numerical difficulties. I would be grateful to receive NLMIXED code for problems that people consider difficult.

Regards,
hans

> I don't really have time to get too much into this either right
> now... and have only been skimming these posts. But, I am interested.
>
> I can say that I had problems with the quasi-Newton in SAS and wound
> up switching to R for that reason. I was able to converge
> successfully quite frequently in a Monte Carlo using the BFGS and
> Nelder-Mead (in place of SAS's quasi-Newton). Yes, it was a bit
> slower, but it was worth it. They worked fine for the ZIP, NB-ZIP,
> Hurdle, and NB-Hurdle. (Of course, NB-ZIP, as usual, had the most
> convergence problems, but not anything ridiculous (except in one
> condition, but that's a different story).)
>
> I'm far from being an optimization expert. Any comments on this that
> might persuade me to try SAS again without having to spend hours
> programming stuff that would ideally be built into the proc?
>
> Thanks,
> Jeff

2005 post:

Dear SAS-list,

I am planning to conduct a comparison of SAS PROC NLMIXED and the software ADMB-RE:

   http://otter-rsch.com/admbre/admbre.html

To my knowledge, these two software packages are the only ones that allow this flexibility in model formulation for nonlinear mixed models. Both packages use adaptive quadrature to calculate the marginal likelihood, so they should be directly comparable. My goal is to compare them with respect to 1) computational speed and 2) numerical accuracy. The plan is to publish the results in some form. I will do the ADMB-RE part, while I am searching for somebody to do the NLMIXED part. We will need to decide on which datasets to use (real and simulated). There should be no overlap with:

   http://multilevel.ioe.ac.uk/softrev/index.html
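[For readers unfamiliar with what "a quadrature rule applied to the user-specified likelihood" looks like in practice, here is a minimal random-intercept Poisson model in NLMIXED; the data set COUNTS and the variables Y, X, and ID are hypothetical. The RANDOM statement is what triggers adaptive Gaussian quadrature, and the QPOINTS= option sets the number of quadrature points per dimension:]

   proc nlmixed data=counts qpoints=20;
      parms b0=0 b1=0 s2u=1;
      /* Random intercept u is integrated out of the likelihood
         by adaptive Gaussian quadrature */
      eta    = b0 + b1*x + u;
      lambda = exp(eta);
      model y ~ poisson(lambda);
      random u ~ normal(0, s2u) subject=id;
   run;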