From: Lily on
Hi All,

I am using GENMOD for medicinal research for an M.Sc. I have
differential white blood cell count data: for example the number of
eosinophils per 100 white blood cells (WBC) observed on blood smears.
I want to know the effects of 2 medications on these counts (class
effects). Note that the number of WBCs per observation is not always
the same, so it's been suggested that I use a binomial distribution,
and my model looks something like this:
model r/n = P C P*C / dist=binomial type3 scale=d;
where r = number of eosinophils, n = number of WBCs, P = one treatment
with 3 levels, and C = another treatment with 2 levels.
The issue is this: I get a very high deviance/df, which, admittedly,
may be explained by the exclusion of very influential factors from the
model. However, negative binomial distributions frequently describe
blood cell ratios well, so I tried the following model and got a MUCH
lower deviance/df (i.e. close to 1):
model r = P C P*C / dist=nb type3 scale=d offset=n;

Can anyone tell me whether the latter model is appropriate, perhaps
with references? I COULD plot the cumulative probability from each
model and compare them to binomial and negative binomial distributions
to visualize how well they match, but I can't find the SAS code to do
this. Maybe someone has a better idea to confirm which model fits
best?

Thanks VERY much!
From: Shawn Haskell on
On Apr 13, 12:00 pm, Lily <lpa...(a)hotmail.com> wrote:
> Hi All,
>
> I am using GENMOD for medicinal research for a M.Sc.  I have
> differential white blood cell count data: for example the number of
> eosinophils per 100 white blood cells (WBC) observed on blood smears.
> I want to know the effects of 2 medications on these counts (class
> effects).  Note that the number of WBCs per observation are not always
> the same, thus it's been suggested that I use a binomial distribution,
> and my model looks something like this:
> r/n = P C P*C / dist=binomial type3 scale=d
> where r = No eosinophils, n = No WBC, P = 1 treatment with 3 levels, C
> = another treatment with 2 levels.
> The issue is this: I get very high deviance/df, which, albeit, may be
> explained by exclusion of very influential factors from the model.
> However, frequently, negative binomial distributions describe blood
> cell ratios well, so I tried the following model and got MUCH lower
> deviance/df (i.e. close to 1):  r = P C P*C / dist=nb type3 scale=d
> offset=n;
>
> Can anyone tell me whether the latter model is appropriate; perhaps
> with references?  I COULD plot the cumulative probability from each
> model and compare them to binomial and negative binomial distributions
> to visualize how well they match, but I can't find the SAS code to do
> this.  Maybe someone has a better idea to confirm which model fits
> best?
>
> Thanks VERY much!

Hi Lily, you haven't told us what your eosinophil counts look like,
but yes, the NegBin distribution is often the best choice for modeling
count data. The Poisson assumes that the variance and mean are equal,
which makes for a parsimonious model if it fits, but the variance is
often larger than the mean. That overdispersion leads naturally to the
NegBin. Given that your WBC counts differ between observations, it is
appropriate to use that count (log-transformed, I believe, because the
model uses a log link) as your offset term. You could use a literature
search engine to find many published studies that use the NegBin to
model count data; I did so once for counts of caribou. Sounds like
you're on the right track. Good luck. Shawn
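
For what it's worth, here is a rough sketch of how the log-transformed
offset could be set up. With a log link the model is
log(mean of r) = log(n) + linear predictor, so the offset variable has
to hold log(n) rather than n itself. The dataset name wbc and the
variable name log_n are my own placeholders, not anything from the
original post, and I have left out scale=d because the NegBin already
estimates its own dispersion parameter. Treat it as a starting point
rather than a finished analysis.

```sas
/* Assumption: the data live in a dataset called wbc with variables */
/* r (eosinophil count), n (WBCs counted), P and C (treatments).    */
data wbc_log;
    set wbc;
    log_n = log(n);   /* offset must be on the log scale for a log link */
run;

/* Binomial fit, as in the original post, kept for comparison. */
proc genmod data=wbc_log;
    class P C;
    model r/n = P C P*C / dist=binomial type3;
run;

/* Negative binomial fit with log(n) as the offset. */
proc genmod data=wbc_log;
    class P C;
    model r = P C P*C / dist=negbin link=log offset=log_n type3;
run;
```

Both runs print a "Criteria For Assessing Goodness Of Fit" table, so
the deviance/df and Pearson chi-square/df values can be read side by
side for the two models.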