From: Eli Y. Kling on 8 Jan 2010 18:38

I feel Fisher's exact test is appropriate but probably not powerful enough. How about a logistic regression with extra explanatory variables such as economic-demographic group? If you can lay your hands on detailed records you might use the data-mining technique of balanced sampling (50% event and 50% non-event) to deal with the rare-event modelling.

But that aside, I wonder whether, even if the difference is statistically significant, it is practically significant. You can turn the question on its head: out of the (17+26)=43 complications, 17/43=39.5% have Medicaid. For H0 (complications explain Medicaid), the one-tail P value is 0.1110 and the two-tail P value is 0.2221. In the social sciences that might be considered significant, but you have to decide.

With regards,
Eli

On 8 Jan, 20:57, robe...(a)HEALTH.OK.GOV (Robert Feyerharm) wrote:
> I'm comparing various pregnancy & delivery complication rates between the
> Medicaid and Non-Medicaid populations in my State. These rates are often
> quite small (for example, 17/20,833 vs 26/26,602).
>
> There are a number of options available to test for a statistically
> significant difference between two rates. I'm inclined to use Fisher's
> exact test in this situation since it makes no assumptions about how the
> data is distributed (normal, Poisson, etc.).
>
> Most epidemiologists use a Poisson approximation to compare rates where
> the numerator is less than 100. Is Fisher's exact test a better method?
>
> Note that for cases like mine where the denominator is large, SAS probably
> resorts to a numerical method to approximate the Fisher's exact test
> p-value (hence the Fisher's exact test p-value may not exactly be "exact").
>
> Thanks,
>
> Robert Feyerharm
> Oklahoma State Department of Health
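A minimal sketch of the logistic-regression approach Eli describes, hedged because the record layout is not shown in the thread: the dataset name (deliveries) and the variables (complication, medicaid, age_group, race) are hypothetical stand-ins for whatever the detailed records actually contain.

* Sketch only -- all dataset and variable names here are hypothetical placeholders. ;
proc logistic data=deliveries;
  class medicaid (ref='0') age_group race / param=ref;
  model complication(event='1') = medicaid age_group race;
run;

If the balanced-sampling route were taken (50% events, 50% non-events), the same model would be fit on the balanced sample, with the slope estimates kept and the intercept adjusted afterwards for the true event rate.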
From: Dale McLerran on 8 Jan 2010 19:51

--- On Fri, 1/8/10, Robert Feyerharm <robertf(a)HEALTH.OK.GOV> wrote:
> [Robert's original question quoted in full; see the first post above.]

Robert,

Let me dispose of the statement about whether Fisher's exact test is using an approximation due to the large sample size. For a 2x2 table with row totals R1 and R2, column totals C1 and C2, and cell frequencies f11, f12, f21, and f22, the Fisher exact test depends on the computation

  P = [ ( R1! * R2! * C1! * C2! ) / n! ] / ( f11! * f12! * f21! * f22! )

for different arrangements of the cell frequencies fij. Now, SAS certainly cannot compute all of these factorials directly for the sample size which you have. Note that

  X! = GAMMA(X+1)

where GAMMA(u) is the gamma function, and that

  log(X!) = LGAMMA(X+1)

where LGAMMA is the log gamma function. Taking logarithms, we have

  log(P) = log( [ ( R1! * R2! * C1! * C2! ) / n! ] / ( f11! * f12! * f21! * f22! ) )

         = log(R1!) + log(R2!) + log(C1!) + log(C2!) - log(n!)
           - log(f11!) - log(f12!) - log(f21!) - log(f22!)

         = lgamma(R1+1) + lgamma(R2+1) + lgamma(C1+1) + lgamma(C2+1) - lgamma(n+1)
           - lgamma(f11+1) - lgamma(f12+1) - lgamma(f21+1) - lgamma(f22+1)

Now, what is really important for Fisher's exact test is not the value of P (or log(P)) by itself, but the value of P (or log(P)) for the observed table compared to the other possible tables which retain the same marginal frequencies. To the extent that the computation of log(P) using the LGAMMA function retains order, the computation of Fisher's exact test is not at all affected by the sample size. I would really expect that log(P) would at least retain order across all possible tables which have the specified marginal values. Thus, the value of the Fisher exact test should not be compromised at all.

Now, as to whether the Fisher exact p-value is better than p-values based on distributional assumptions (normal, Poisson), I would think that it wouldn't much matter for the sample size that you have here. Certainly, for the values which you present in your post, the Fisher exact test, chi-square test, and Poisson model all produce nonsignificant p-values. There is some discrepancy in p-values for the three methods. However, since all of the methods indicate that the p-value is greater than 0.50, any discrepancy is of trivial importance.

You might have some other variables which you want to test (or you might not have revealed correct data). As you get closer to p=0.05, I would place money that the p-values will become more and more similar.
If you are in the uncomfortable position of having one test where p<0.05 and another test where p>0.05, the interpretation is not really any different. Using p<0.05 is a rather arbitrary choice.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
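For the counts Robert quoted (17 of 20,833 Medicaid vs 26 of 26,602 non-Medicaid), the agreement Dale describes is easy to check by running the three tests side by side. A minimal sketch, with made-up dataset and variable names:

/* 2x2 table: Fisher's exact test and the chi-square test */
data rates;
  input group $ complication count;
  datalines;
medicaid 1 17
medicaid 0 20816
nonmed   1 26
nonmed   0 26576
;
run;

proc freq data=rates;
  tables group*complication / chisq;
  exact fisher;
  weight count;
run;

/* Poisson model for the event counts, with the log of the denominator as offset */
data events;
  input group $ events denom;
  logdenom = log(denom);
  datalines;
medicaid 17 20833
nonmed   26 26602
;
run;

proc genmod data=events;
  class group;
  model events = group / dist=poisson link=log offset=logdenom;
run;

All three p-values should come out greater than 0.5 for these counts, which is the point Dale makes above.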
From: Ryan on 9 Jan 2010 08:29

On Jan 8, 7:51 pm, stringplaye...(a)YAHOO.COM (Dale McLerran) wrote:
> [Dale's reply of 8 Jan quoted in full; see above.]

Dale,

I computed P*log(P) for a 2X2 table with the following cell frequencies:

f11=8
f12=14
f21=75
f22=32

and obtained the value

P*log(P) = -0.015585312549042

Here are the formulas I used to compute P*log(P) in another stats program:

---------------

log_p = lngamma(22+1) + lngamma(107+1) + lngamma(83+1) + lngamma(46+1)
      - lngamma(129+1)
      - lngamma(8+1) - lngamma(14+1) - lngamma(75+1) - lngamma(32+1)

p_log_p = exp(log_p)*log_p

---------------

If you have the time, could you please tell me what I did incorrectly and exactly what p_log_p represents?

Thanks,

Ryan
From: xlr82sas on 9 Jan 2010 18:51

On Jan 9, 5:29 am, Ryan <ryan.andrew.bl...(a)gmail.com> wrote:
> [Ryan's question quoted in full; see above.]

Hi,

Dale's formula agrees exactly with PROC FREQ and does represent the two-tail Fisher exact test.

data _null_;
  f11=8;
  f12=14;
  f21=75;
  f22=32;
  n=f11+f12+f21+f22;
  r1=f11+f12;
  r2=f21+f22;
  c1=f11+f21;
  c2=f12+f22;
  logP = lgamma(r1+1) + lgamma(r2+1) + lgamma(c1+1) + lgamma(c2+1)
       - lgamma(n+1)
       - lgamma(f11+1) - lgamma(f12+1) - lgamma(f21+1) - lgamma(f22+1);
  p=exp(logP);
  put p=;
  prd=p*logP;
  put prd=;
run;

P=0.0026221258
PRD=-0.015585313

%macro sigcid(pegevn=4,pegtot=467,pboevn=0,pbotot=461);

data sigcid;
  trt='pbo'; evn=1; tot=&pbotot - &pboevn; output;
  trt='pbo'; evn=0; tot=&pboevn;           output;
  trt='peg'; evn=1; tot=&pegtot - &pegevn; output;
  trt='peg'; evn=0; tot=&pegevn;           output;
run;

ods output FishersExact=pvalues;
proc freq data=sigcid;
  tables trt*evn / list chisq riskdiffc exact relrisk;
  weight tot;
  output out=rsk (keep=_rdif2_ l_rdif2 u_rdif2) chisq riskdiffc exact relrisk;
run;

data rsk_ci(keep=dif);
  merge rsk
        pvalues(firstobs=2 obs=2)
        pvalues(firstobs=5 obs=5 rename=nvalue1=Two);
  dif='Approx CI '!!put(_rdif2_*100, 7.3) || " (" || put(l_rdif2*100, 7.3) || ", " ||
      put(u_rdif2*100, 7.3) || ") Peg>Pbo Exact Pvalue="!!put(nvalue1,9.5) !!
      ' Two Tail Pvalue=' !! put(Two,9.5);
run;

proc print data=rsk_ci;
run;

%mend sigcid;

%sigcid(pegevn=75,pegtot=107,pboevn=8,pbotot=22);

DIF

Approx CI  33.730 (  9.096,  58.363) Peg>Pbo Exact Pvalue=  0.00333 Two Tail Pvalue=  0.00262

The two-tail value is exactly what Dale was computing. The 2 x 2 test is a combinatorial problem; higher-dimension contingency tables are more problematic. By the way, you need SAS 9.2 to get the exact confidence interval of the risk difference.
There is a lot more on this on my site:

http://homepage.mac.com/magdelina/.Public/utl.html

/* T000102 FISHER EXACT TESTS FOR CONTINGENCY TABLES
/* T000103 DATASTEP INTERACTIVE METHOD FOR BINOMIAL CONFIDENCE INTERVAL WORKS WITH 0 RESPONDERS MAYO CLINIC
/* T000111 APPROX CONFIDENCE INTERVAL ON RISK DIFFERENCES USING PROC FREQ - NEED 9.2 FOR EXACT CONFIDENCE INTERVALS
/* T000112 APPROX CONFIDENCE INTERVAL AND EXACT P-VALUE RISK DIFFERENCES USING PROC FREQ - NEED 9.2 FOR EXACT CONFIDENCE INTERVALS
/* T000113 APPROX CONFIDENCE INTERVAL AND EXACT P-VALUE RISK DIFFERENCES USING PROC FREQ - AND DALES SAS-L ANALYSIS
/* T000117 EXACT CONFIDENCE INTERVALS USING THE EXACT OPTION IN SAS PROC FREQ DOES NOT WORK WITH 0 RESPONDERS
/* T000119 CALCULATE CONFIDENCE INTERVALS FOR THE BINOMIAL PROPORTION WITH 0 RESPONDERS
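On the SAS 9.2 point: if I remember the 9.2 syntax correctly, the exact confidence limits for the risk difference come from adding RISKDIFF to the EXACT statement. A sketch against the sigcid dataset that the macro above builds (treat this as the shape of the call, not verified output):

proc freq data=sigcid;
  tables trt*evn / riskdiff;   /* approximate risk-difference limits */
  exact riskdiff;              /* exact limits, available in 9.2 */
  weight tot;
run;

The exact limits can be slow to compute as the cell counts grow, so the approximate interval produced by the macro remains useful as a first pass.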
From: Dale McLerran on 9 Jan 2010 19:25
--- On Sat, 1/9/10, Ryan <ryan.andrew.black(a)GMAIL.COM> wrote:
> [Ryan's question quoted in full; see above.]

Ryan,

When I wrote "but the value of P (log(P))", you apparently interpreted that to mean that we would compute P*log(P). Previously in the same sentence, I had written "what is really important for Fisher's exact test is not the value of P (or log(P))". So, when I wrote "P (log(P))" in the same sentence, I meant for that to be interpreted as "P (or log(P))". But I was not really clear on that.

Now, this P that we compute according to the formula specified above is NOT the Fisher's exact test p-value. Rather, it is the probability of the particular table with observed f11, f12, f21, and f22 among all tables which have the observed marginal frequencies R1, R2, C1, and C2 (the hypergeometric probability conditional on the margins). Given those same fixed marginals, there could be other values f11~, f12~, f21~, and f22~ which could have been observed. The Fisher's exact test p-value is obtained by computing P for the observed table as well as P~ (where P~ is the value of P computed for f11~, f12~, f21~, and f22~) for all possible tables having the observed marginal frequencies. We then compare P against the distribution of P~.

Note, though, that P and log(P) have a monotonic relationship, so we could also compare log(P) against the distribution of log(P~). For that matter, P*log(P) and P have a monotonic relationship (monotone decreasing for probabilities below 1/e, which covers every table here). So, the statistic which you have computed above could be employed to construct the Fisher's exact test p-value if you compare the observed table value of P*log(P) against the distribution of P~*log(P~).

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
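Dale's description translates directly into a small data step: enumerate every 2x2 table with the margins of Ryan's example, compute each table's probability through LGAMMA, and accumulate the probabilities of all tables no more probable than the observed one. This is only a sketch (the 8/14/75/32 counts are taken from Ryan's post); consistent with Dale's point that P itself is not the p-value, the two-sided result will be larger than the single-table probability P=0.00262 computed earlier in the thread.

data _null_;
  /* observed table from Ryan's example */
  f11 = 8;  f12 = 14;  f21 = 75;  f22 = 32;
  r1 = f11 + f12;  r2 = f21 + f22;
  c1 = f11 + f21;  c2 = f12 + f22;
  n  = r1 + r2;

  /* log hypergeometric probability of the observed table */
  logp_obs = lgamma(r1+1) + lgamma(r2+1) + lgamma(c1+1) + lgamma(c2+1)
           - lgamma(n+1)
           - lgamma(f11+1) - lgamma(f12+1) - lgamma(f21+1) - lgamma(f22+1);

  /* enumerate all tables with the same margins; sum P~ over tables with P~ <= P(observed) */
  p_two_sided = 0;
  do a = max(0, c1 - r2) to min(r1, c1);
     b = r1 - a;  c = c1 - a;  d = r2 - c;
     logp = lgamma(r1+1) + lgamma(r2+1) + lgamma(c1+1) + lgamma(c2+1)
          - lgamma(n+1)
          - lgamma(a+1) - lgamma(b+1) - lgamma(c+1) - lgamma(d+1);
     if logp <= logp_obs + 1e-8 then p_two_sided = p_two_sided + exp(logp);
  end;
  put p_two_sided=;   /* two-sided Fisher exact p-value for this table */
run;

A one-sided p-value would come from summing only the tables with a <= f11 (or a >= f11) rather than using the probability comparison.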