From: Dale McLerran on
--- On Tue, 11/10/09, Bijay Tamang <tamangbijay(a)GMAIL.COM> wrote:

> From: Bijay Tamang <tamangbijay(a)GMAIL.COM>
> Subject: Zero-inflated Poisson for correlated data
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Tuesday, November 10, 2009, 8:44 AM
> I am trying to analyze my windbreak
> root data using ZIP model.
>
> I have three windbreaks and two trenches within each
> windbreak. 1mx1m trench
> face was divided into 10 cm x 10cm grids. I counted the
> number of
> roots(Nxtot) exiting from each grid and 60 percent of my
> count is zero.
>
> Here is what I did
>
> 1. Sorted the data by windbreak (WBA) , Location (trench),
> depth
> 2. WBA is a categorical variable, so I created dummy
> variables
> 3. Created a new random variable as follows as NLMIXED
> allows only one
> random variable
>
> wba_loc_dep = COMPRESS(wba||"-"||location||"-"||depth);
> wba_loc = COMPRESS(wba||"-"||location);
>
> 1. The following code runs fine but does not have the
> random statement but
> the "inflation probability is only 0.05"
>
> proc nlmixed data=Nx3;
> parms a0=0 a11=0 a12=0 a2=0 a3=0 b0=0 b11=0 b12=0 b2=0
> b3=0;
> logitpi=a0 + a11*WB_dum1 + a12*wb_dum2 + a2*depth1 +
> a3*BD;
> pi=exp(logitpi)/(1+exp(logitpi));
> loglambda=b0 + b11*WB_dum1 + b12*wb_dum2 + b2*depth1
> + b3*BD;
> lambda=exp(loglambda);
> if nxtot=0 then
> ll=log(pi+(1-pi)*exp(-lambda));
> else ll=log(1-pi) +
> Nxtot*log(lambda)-lambda-lgamma(Nxtot+1);
> model Nxtot~general (ll);
> ESTIMATE "inflation probability" pi ;
> ESTIMATE "lambda" lambda ;
> predict pi out=ZIP_pi_preds;
> predict lambda out = ZIP_lambda_preds;
> run;
>
> 2. Now, I added the random statement in the following code
> but gives me the
> following message " The final Hessian matrix is not
> positive definite, and
> therefore the estimated
> covariance matrix is
> not full rank and may be unreliable. The
> variance of some
> parameter estimates
> is zero or some parameters are linearly related
> to other
> parameters.
>
> and "inflation probability is 168E-31"
>
> PROC SORT DATA=nx3;
> BY wba location depth pos;
> TITLE 'ZIP model with RANDOM effect for positions nested
> in
> location*windbreak*depth';
> proc nlmixed data=Nx3;
> parms a0=4.3 a11=-2.8 a12=-2.8 a2=1.4 a3=-4.3 b0=0.91
> b11=0.77 b12=1.1
> s2u=4;
> logitpi=a0 + a11*WB_dum1 + a12*wb_dum2 + a2*depth1 +
> a3*BD;
> pi=exp(logitpi)/(1+exp(logitpi));
> loglambda=b0 + b11*WB_dum1 + b12*wb_dum2 + u;
> lambda=exp(loglambda);
> if nxtot=0 then ll=log(pi+(1-pi)*exp(-lambda));
> else ll=log(1-pi) +
> Nxtot*log(lambda)-lambda-lgamma(Nxtot+1);
> model Nxtot~general (ll);
> RANDOM u ~ normal(0,s2u) SUBJECT=wba_loc_dep OUT=out_Rand;
> ESTIMATE "inflation probability" pi ;
> predict pi out=ZIP_pi_preds;
> predict lambda out = ZIP_lambda_preds;
> run;
>
> 3. If replace "SUBJECT=wba_loc_dep" in the above code
> with
> "SUBJECT=wba_loc" the code runs fine and the "inflation
> probability is
> 0.017"
>
> I believe the inflation probability should be closer to 60.
> Am I doing the
> right analysis? Is this the right way to analyze my data?
>
> thanks in Advance. Bijay
>

Bijay,

Random subjects should be independent of each other. Using
wba_loc_dep as your random subject certainly violate this
assumption. You would expect the number of roots in the
first 10 cm of trench 1/location 1 to be correlated with
the number of roots in the second 10 cm, third 10 cm, ...
of trench 1/location 1.

In truth, your random subjects should probably be specified
as location, with trench within location a random effect.
However, a case might be made for specifying the trench as
independent units. Given the limited number of locations
in your data, this is all the more reasonable.

Now, whether the trench effect on log(lambda) is the same
at all depths might be in question. It would be possible
to fit a model in which trench was the random subject and
the between-trench variance in log(lambda) depended on
depth. You would have to include 10 random effects (u1,
u2, u3, ..., u10), with each of these random effects
associated with a specific depth. You would also have
to allow for covariance between the u{i} effect estimates.
Given the small number of trenches in your investigation,
I don't think that you should try to model variation (and
covariance) in log(lambda) associated with trench depth.
You have only 6 trenches, and the covariance structure
would have 55 elements (although some might be constrained
to be the same).

As for what you regard as a low zero-inflation probability,
remember that the zero-inflation probability is the
excess zero count after the number of zero values determined
by the Poisson distribution are accounted for. That is,
the Poisson distribution itself would lead one to expect
a certain frequency of zero values. The zero-inflation
probability represents zero values above the number that
are expected under the Poisson distribution.

In short, your third model is not unreasonable and is
probably the best that you can do with the limited number
of trenches which you have for your study.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------